BinPro / CONCOCT

Clustering cONtigs with COverage and ComposiTion

Using cut_up_fasta.py results in output file with many small nodes #192

Closed franciscozorrilla closed 6 years ago

franciscozorrilla commented 6 years ago

When I run the provided code:

cd $CONCOCT_EXAMPLE
python $CONCOCT/scripts/cut_up_fasta.py -c 10000 -o 0 -m contigs/velvet_71.fa > contigs/velvet_71_c10K.fa

I obtain the file velvet_71_c10K.fa, but it contains some very small nodes:

>NODE_10_length_186_cov_4.327957
GCCGACAGGCAAAACATCCATTTCTTTTTCATCTTATTTAAATTGGGTTTATAATCCATCGCTTTATTATTTATAGAGTATCCGCACATTTTCGCTGTTGCACCATATCCGGGGAGTCTGCCCTTTTTTCAGCCAAATCTCTTCCAACTTGGGATTACCCATAAAGGAATTGAGCGATTCCAGCATCGGACACCCCGACACATCGACAATCCGCAGATCGAGGTTGTTCCATTGCGATGACATGTCGCTATAAACg
>NODE_11_length_153_cov_12.470589
GTAACCTCACGGTCGATCATCGGCGGGGTGACTTTCGCATAGATTTGAGTGGTGGTGATATTTTTATGTCCCAGCATCTTCGAAAGCGTCTCCAGACTCACGCCGTTCGACAGGCAGATCGTCGTGGCGTACGTATGCCGGGCAAGGTGGAACGAGAGTTGTTTGTCAATGCCGCACTCCCGGGCGATGTTTTTCAGGCTGTCGTCGATCGAATCGAGTGTAG
>NODE_12_length_555_cov_14.255856
ACCGCTACTATAAATCCATCGACCATCTGCGGACTTTCATGCGTAAGGAGTATAACGTGAGCGATATGCCGTTGGCGGAGTTGGAACAGTCGTTCATCGAGCAATACCACGTCTACCTTAAATCCGATCTGGGGCTCAAGCCTACGACCGTCAGCGGTTATCTCAAATGCCTGAAATACGTTGTCAAAATCGCGTTCAACAACGGCTGGATGCCTCGCAACCCCTTTTCCCTCTATCAATATACGGCTCCGAATCCGGAACGTAGTTTTTTAACGGAAGATGAACTCCGGCGTATGATGACTACCGAGCTGCGGTATAAGCGTCAGGACTATAACCGCGATATGTTCCTGTTCTCCTGCTTTACGGGCATCTGCTATGCGGATATGGCCTCGCTGACCTATGACCGGATCGAGCAGGATGCGCAGGGCGAGTGGTGGATCAGCGGCAACCGCCAGAAGACCGAAACCAAATACGTTGTCAAGTTGCTGCCTTATGCGTTGTTCATCCTGAACAAGTATCGGGGTCTGACCGGCGACGGACGTGTTTTTGCCATGTCTACACTCGATTCGATCGACGACAGCCTGAAAAACATCGCCCGGGAGTGCGGCATTGACAAACAACTCTC

Is this supposed to happen? I understood that I should be obtaining nodes of approximately 10kb.

alneberg commented 6 years ago

Hi @franciscozorrilla,

The cut_up_fasta.py script does not filter out any short contigs, and it certainly does not merge short contigs into longer ones. It only cuts up the long ones (> 20Kb) into pieces of 10Kb plus one final piece of 10-20Kb. The short ones you see are the same contigs that the velvet assembly output, and I think that is perfectly fine.

franciscozorrilla commented 6 years ago

Hi @alneberg, thank you for your quick response.

Ah, I understand now. I think it was this sentence from the readthedocs page that confused me about the presence of small chunks: "The final chunk is appended to the one before it if it is < 10 Kb to prevent generating small contigs". But now I see that small chunks will only be appended if they belong to the same contig! Thanks again.
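
For readers who land on this thread, the behavior discussed above can be illustrated with a minimal Python sketch. It is not the actual cut_up_fasta.py source. The function name cut_up is hypothetical, and the sketch assumes a 10 Kb chunk size, no overlap (-o 0), and merging of the last piece (-m). It also assumes that a contig of exactly twice the chunk size is split into two equal pieces, which the thread does not state.

```python
def cut_up(seq, chunk_size=10000):
    """Split ONE contig into chunks (hypothetical sketch, not CONCOCT's code).

    Contigs shorter than 2 * chunk_size are returned unchanged, so short
    contigs from the assembly pass straight through. Longer contigs are cut
    into chunk_size pieces, and the remainder is appended to the last piece
    of the same contig, never to another contig.
    """
    if len(seq) < 2 * chunk_size:
        # Too short to cut: keep the contig exactly as it is.
        return [seq]
    n_full = len(seq) // chunk_size
    # All full-size pieces except the last one.
    chunks = [seq[i * chunk_size:(i + 1) * chunk_size] for i in range(n_full - 1)]
    # The last piece absorbs the remainder, so it is chunk_size to 2*chunk_size - 1 long.
    chunks.append(seq[(n_full - 1) * chunk_size:])
    return chunks


if __name__ == "__main__":
    # Synthetic contigs of various lengths (made-up data for illustration).
    for length in (256, 19999, 25000, 30000):
        pieces = cut_up("A" * length)
        print(length, [len(p) for p in pieces])
```

Under these assumptions, a 256 bp contig comes out as a single 256 bp record, while a 25 Kb contig comes out as one 10 Kb piece and one 15 Kb piece.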