ctSkennerton / crass

The CRISPR assembler
http://ctskennerton.github.io/crass
GNU General Public License v3.0

number of reads is different in output compared to internal representations #15

Closed: ctSkennerton closed this issue 13 years ago

ctSkennerton commented 13 years ago

The internal read counts for each group (group : reads) are:

1 : 313
3 : 428
4 : 138
5 : 102
6 : 229
8 : 447
10 : 274
11 : 421
13 : 13
14 : 24
15 : 364
18 : 14
22 : 301
24 : 32
27 : 173
31 : 4
32 : 25
43 : 85
44 : 84
45 : 64
49 : 16
50 : 7
56 : 10
57 : 29
71 : 129
72 : 18
74 : 36
94 : 15
112 : 9
117 : 118
126 : 11
152 : 3
161 : 17
173 : 3
215 : 4
270 : 11
302 : 11
305 : 5
453 : 2
483 : 3
501 : 3
574 : 7
714 : 4
1629 : 86
1630 : 6
1631 : 53

And here are the read counts from the output FASTA files:

$ grep -c '>' crass_test_out/*.fa
crass_test_out/Group_10_GTTTTCCCCGCGCCAGCGGGGATAGGCCC.fa:258
crass_test_out/Group_117_GTTTCAATTCCAAAAGGTGCGATTAAAAG.fa:81
crass_test_out/Group_11_GTCAGTAACCAGCCCTGAAAACAAAGGGATTGAGAC.fa:344
crass_test_out/Group_126_TCGTCAGCCTCGTCAGCCTCGTCAGCCTCGTC.fa:7
crass_test_out/Group_13_CGTCTCATCCTGCTCTTCGGGGCGGGCTTCGTTGAAGG.fa:7
crass_test_out/Group_14_TGCCCTGATTTAAAAGGGATTAAGAC.fa:11
crass_test_out/Group_15_CGGACCATCCCCACGGGGGTGGGGAAAAC.fa:279
crass_test_out/Group_161_GTCATTTCGTCCCGCAGGACAAACTCGCGCCGGAC.fa:8
crass_test_out/Group_1629_GTCTCCCCCGCGCACGCGGGGATCGACCC.fa:51
crass_test_out/Group_1631_GTCTCCCCCGCGCACGCGGGGATCGACCT.fa:25
crass_test_out/Group_18_CACCAATTCTTGATCTTTGGTTGGGTTGGCACGGCAAATTGTCA.fa:6
crass_test_out/Group_1_GTCGCAATCCTCGCTATAATGGAATTGGGTGATTTAC.fa:212
crass_test_out/Group_22_GAGCAGCAAGAGCGGATCACCGAGCAGCAAGAGCGGATCAC.fa:269
crass_test_out/Group_24_GCATCAATCCACGACCCGCATCGAAGGGTACTGAAAC.fa:14
crass_test_out/Group_270_GAATTTTCTTATGATGTAGACGGGAACTTTTTAGAGTAGGTAG.fa:7
crass_test_out/Group_27_ATTCTCCCGGCTTATTTAGTCGGGAGTGGATTGAAAC.fa:97
crass_test_out/Group_302_ATTTCCAAAAACATCGACTCAAAGTGAGTACTGAAAC.fa:8
crass_test_out/Group_32_CGGAGACGGAGACGGAGACGGAGACGGAGACGGAG.fa:8
crass_test_out/Group_3_GTCCGTCAGACCTGCCCGTTGATAGGGCTTTGTGAC.fa:396
crass_test_out/Group_43_GCCGCAGTCACCGCGATTCCGAAGAGCTTGTGGCGG.fa:51
crass_test_out/Group_44_GCATCGCCCGGCCTCACGGTCGGGCGTGGATTGAAAC.fa:49
crass_test_out/Group_45_GAGTGTAGCTATCCGGGGTGAGAGAGGGAGCTACAAC.fa:46
crass_test_out/Group_49_GTGCTCAACGCCTTCCGGCATCCCCGCCGATCACCACC.fa:0
crass_test_out/Group_4_GTTTCAATCCGCGCCGCCCTCACGGGCGGCGAC.fa:80
crass_test_out/Group_56_ATCATAGCATAACCGTGTCGGATGGGAAGCTATGAC.fa:6
crass_test_out/Group_57_AATGTAAAAGAACAATACTTGATGCCAAATTACAAC.fa:22
crass_test_out/Group_5_GTTTCAATTCCTCAATGGTACGATTATTAC.fa:68
crass_test_out/Group_6_CGGTTCATCCCCGCGGCTGCGGGGAACGC.fa:184
crass_test_out/Group_71_GTTACGAATCCCCATGCGGGGTTATGAG.fa:100
crass_test_out/Group_72_TTCAATCCTCGCTCCCCGTCGCCGGGGAGCGTGT.fa:5
crass_test_out/Group_74_ATTCACTGCCGGATAGGCAGCTCAGAAA.fa:26
crass_test_out/Group_8_GTCAGTAACCAGCCCTGAAAACAAAGGGATTGAGAC.fa:344
crass_test_out/Group_94_GTTTCAGTATCCGCAACCGGATCGAATTTTTTGTGAC.fa:5
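Matching the two listings by eye is error-prone. Below is a minimal Python sketch that pairs them up by group ID. It assumes the `Group_<id>_<repeat>.fa` file naming seen in the grep output above; `parse_internal`, `parse_grep` and `compare` are hypothetical helpers written for this comparison, not part of crass. Groups that have no output file at all are reported with `None`.

```python
import re

# Assumed file naming, inferred from the grep listing: Group_<id>_<repeat>.fa:<count>
GROUP_RE = re.compile(r"Group_(\d+)_[A-Za-z]+\.fa:(\d+)$")


def parse_internal(text):
    """Parse 'id : count' pairs (e.g. '1 : 313 3 : 428') into {id: count}."""
    return {int(g): int(n) for g, n in re.findall(r"(\d+)\s*:\s*(\d+)", text)}


def parse_grep(lines):
    """Parse `grep -c '>'` output lines into {group_id: header_count}.

    grep -c counts lines containing '>', which equals the number of FASTA
    records as long as '>' only appears on header lines.
    """
    counts = {}
    for line in lines:
        m = GROUP_RE.search(line.strip())
        if m:
            counts[int(m.group(1))] = int(m.group(2))
    return counts


def compare(internal, output):
    """Yield (group, internal_count, output_count) for groups that disagree.

    output_count is None when the group has no FASTA file.
    """
    for group in sorted(internal):
        out = output.get(group)
        if out != internal[group]:
            yield group, internal[group], out


if __name__ == "__main__":
    # A few entries copied from the listings above.
    internal = parse_internal("1 : 313 3 : 428 31 : 4 49 : 16")
    output = parse_grep([
        "crass_test_out/Group_1_GTCGCAATCCTCGCTATAATGGAATTGGGTGATTTAC.fa:212",
        "crass_test_out/Group_3_GTCCGTCAGACCTGCCCGTTGATAGGGCTTTGTGAC.fa:396",
        "crass_test_out/Group_49_GTGCTCAACGCCTTCCGGCATCCCCGCCGATCACCACC.fa:0",
    ])
    for group, expected, got in compare(internal, output):
        print(f"group {group}: {expected} internal, {got} in output")
```

Fed the full listings, this gives a per-group list of shortfalls and of groups that produced no file, rather than two unsorted columns.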

ctSkennerton commented 13 years ago

I think this may be caused by the print-reads function in NodeManager, which has an option to show or hide detached nodes. If a read contains only detached nodes, it is not printed under the default settings. This raises a related issue: NodeManagers should be checked, and deleted, if they have no attached nodes or only a small number of them.
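The hypothesis can be illustrated with a simplified Python model. Everything in it is hypothetical: crass's real NodeManager has its own API, and the `attached` flag, the `show_detached` parameter, the `min_attached` threshold and the keying of managers by group ID are stand-ins chosen for this sketch. It shows why a read whose nodes are all detached drops out of the default output, and how managers with too few attached nodes could be pruned.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a graph node; `attached` mirrors the
    # attached/detached state described in the issue.
    name: str
    attached: bool = True


@dataclass
class Read:
    name: str
    nodes: list


@dataclass
class Manager:
    """Hypothetical, simplified stand-in for a NodeManager."""
    reads: list = field(default_factory=list)

    def printable_reads(self, show_detached=False):
        # By default a read is printed only if at least one of its nodes
        # is attached; reads made up solely of detached nodes are skipped.
        return [r for r in self.reads
                if show_detached or any(n.attached for n in r.nodes)]

    def attached_count(self):
        # Number of distinct attached nodes (by name) across all reads.
        return len({n.name for r in self.reads for n in r.nodes if n.attached})


def prune_managers(managers, min_attached=1):
    """Keep only managers with at least `min_attached` attached nodes.

    The threshold is an assumed tuning parameter, not a crass setting.
    """
    return {gid: m for gid, m in managers.items()
            if m.attached_count() >= min_attached}


if __name__ == "__main__":
    a, b = Node("A"), Node("B", attached=False)
    m = Manager([Read("r1", [a, b]), Read("r2", [b])])
    print([r.name for r in m.printable_reads()])         # ['r1']
    print(len(m.printable_reads(show_detached=True)))    # 2
    print(list(prune_managers({1: m, 2: Manager([Read("r3", [b])])})))  # [1]
```

In this model, read `r2` is counted internally but never written out, which is the kind of gap seen in the counts above; pruning then drops the manager whose only node is detached.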