bonsai-team / matam

Mapping-Assisted Targeted-Assembly for Metagenomics
GNU Affero General Public License v3.0

Code deduplication #25

Open loic-couderc opened 7 years ago

loic-couderc commented 7 years ago

The read_fasta_file_handle function is duplicated in many locations:

scripts/compute_assembly_stats.py
scripts/compute_lca_from_tab.py
scripts/compute_pairwise_distance_matrix.py
scripts/compute_ref_coverage_histogram.py
scripts/exonerate_to_sam.py
scripts/extract_taxo_from_fasta.py
scripts/fasta_clean_name.py
scripts/fasta_get_lengths.py
scripts/fasta_length_filter.py
scripts/fasta_name_filter.py
scripts/filter_sam_by_coverage.py
scripts/filter_sam_by_pid.py
scripts/get_HMP_OTU_psn.py
scripts/matam_assembly.py
scripts/remove_redundant_sequences.py
scripts/replace_Ns_by_As.py
scripts/replace_Ns_by_rand_nu.py
scripts/sort_fasta_by_length.py
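
One way to confirm the list above, or to recheck it after refactoring, is to scan the scripts for files that still define the function. This is only a sketch: `find_definitions` is a hypothetical helper, not part of matam, and it relies only on the standard-library `ast` and `pathlib` modules.

```python
import ast
from pathlib import Path


def find_definitions(root, func_name):
    """Return sorted paths of .py files under root that define func_name.

    Hypothetical helper for illustration; not part of the matam code base.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be read or parsed
        # ast.walk visits nested nodes too, so definitions at any depth count
        if any(isinstance(node, ast.FunctionDef) and node.name == func_name
               for node in ast.walk(tree)):
            hits.append(str(path))
    return hits
```

Calling `find_definitions("scripts", "read_fasta_file_handle")` from the repository root should return a single file once the deduplication is complete.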

Issue #8 introduces a new module (scripts/fasta_utils.py). Use it to hold a single shared copy of the function and remove the duplicates.
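
As a rough sketch of what the shared copy in scripts/fasta_utils.py could look like: the signature and the `(header, sequence)` return shape below are assumptions made for illustration, and the existing implementations in the scripts may differ.

```python
def read_fasta_file_handle(fasta_file_handle):
    """Yield (header, sequence) tuples from an open FASTA file handle.

    Assumed behavior, for illustration only: the header is returned
    without the leading '>', and multi-line sequences are joined.
    """
    header, seq_parts = None, []
    for line in fasta_file_handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line[1:], []
        elif header is not None:
            seq_parts.append(line)
    # Emit the last record
    if header is not None:
        yield header, "".join(seq_parts)
```

Each script would then drop its local definition and use `from fasta_utils import read_fasta_file_handle` instead, assuming the scripts directory is on the import path.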

scripts/compute_abundance.py and scripts/krona.py currently import the function from fasta_clean_name. Replace those imports with an import from fasta_utils.

Some other functions may need to be deduplicated as well (read_fastq_file_handle, format_seq...).