biomedicalinformaticsgroup / Sargasso

Sargasso disambiguates mixed-species high-throughput sequencing data.
http://biomedicalinformaticsgroup.github.io/Sargasso/

Check before a version update #68

Closed hxin closed 6 years ago

hxin commented 6 years ago

This issue tracks performance changes since version 1.1.2, in terms of accuracy and runtime.

I used the hmr dataset and ran sargasso with --best/--conservative using the following three versions:

f2043d05d3683fe1962e57d9e363ba8717e1ed64 | beforeXin
44798524024ff64b843e7299b36cfa541b01b327 | master
462e27bea81f0b18967e7d41b619cb7ef6d5565a | masterWithIntron

version            best      conservative
beforeXin          7:06:19   5:40:16
master             7:07:10   5:50:39
masterWithIntron   7:07:58   6:00:58
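
To put the runtime differences in the table above into numbers, here is a minimal sketch. It assumes the values are wall-clock times in h:mm:ss (the issue does not state the unit), and the names `to_seconds`, `RUNTIMES` and `slowdown` are made up for this illustration; they are not part of Sargasso.

```python
# Runtimes copied from the table above; assumed to be h:mm:ss wall-clock times.
RUNTIMES = {
    "beforeXin": {"best": "7:06:19", "conservative": "5:40:16"},
    "master": {"best": "7:07:10", "conservative": "5:50:39"},
    "masterWithIntron": {"best": "7:07:58", "conservative": "6:00:58"},
}


def to_seconds(hms):
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def slowdown(version, mode, baseline="beforeXin"):
    """Return (extra seconds, percent slower) of `version` relative to `baseline`."""
    base = to_seconds(RUNTIMES[baseline][mode])
    current = to_seconds(RUNTIMES[version][mode])
    extra = current - base
    return extra, 100.0 * extra / base


if __name__ == "__main__":
    for version in ("master", "masterWithIntron"):
        for mode in ("best", "conservative"):
            extra, percent = slowdown(version, mode)
            print(f"{version:17s} {mode:13s} +{extra} s ({percent:.2f}%)")
```

Under that assumption, the extra time with --best is well under two minutes for both newer versions, while --conservative is slower by several minutes.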

master-beforexin: master-beforeXin.txt
master-masterwithintron: master-masterWithIntron.txt
masterwithintron-beforexin: masterWithIntron-beforeXin.txt

hxin commented 6 years ago

The results seem fine to me. The new code is slightly slower than before: under 2 minutes slower with --best, and about 10 to 21 minutes slower with --conservative. We might also want to dig out some of the FPs to see why they are incorrectly assigned. Please let me know what you think @lweasel

lweasel commented 6 years ago

Yes, all looks good to me. Agree that it is worth having a look at some of the FPs.

hxin commented 6 years ago

After looking at some FPs, we concluded that the misassignments were caused by the CIGAR strings: in these cases, the alignment to the incorrect species has a better CIGAR pattern than the alignment to the correct species. There is not much we can do about this at the moment. Future releases of better reference genomes may solve this problem.
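
To illustrate how a "better" CIGAR pattern can pull a read towards the wrong species, here is a small sketch. It is not Sargasso's actual assignment logic: the penalty (counting inserted, deleted and clipped bases, and ignoring N so that spliced alignments are not penalised) is an assumption made for this example, and `parse_cigar`, `cigar_penalty` and `preferred_species` are hypothetical names.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '45M2I53M' into (length, op) pairs."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Reject strings that contain anything other than valid operations.
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def cigar_penalty(cigar):
    """Illustrative penalty: number of inserted, deleted or clipped bases.

    N (skipped region, e.g. an intron) is deliberately not penalised.
    """
    return sum(length for length, op in parse_cigar(cigar) if op in "IDSH")


def preferred_species(cigars):
    """Return the species whose alignment has the lowest penalty, or None on a tie.

    `cigars` maps a species name to the CIGAR string of the read's alignment.
    """
    penalties = {species: cigar_penalty(c) for species, c in cigars.items()}
    lowest = min(penalties.values())
    winners = [s for s, p in penalties.items() if p == lowest]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    # Hypothetical read that truly comes from mouse but aligns more cleanly to rat.
    print(preferred_species({"mouse": "45M2I53M", "rat": "100M"}))
```

With this kind of comparison, a read whose true origin is mouse is still assigned to rat whenever the rat alignment has the cleaner pattern, which is the sort of false positive described above.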