SciLifeLab / facs

Fast and Accurate Classification of Sequences using Bloom filters
http://facs.scilifelab.se/

Adding an ipython notebook to explore the benchmarks data #97

Closed (brainstorm closed this 10 years ago)

brainstorm commented 10 years ago

While the FACS metrics look okay, there's a significant issue with fastq_screen: the mean of contam_fqscr in frame.describe() is 19... It should never exceed 1, since it is a contamination rate!

That huge deviation seems to come exclusively from dm3; the data for the other organisms looks correct :/
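A range check run before frame.describe() would flag rows like these right away. Below is a minimal stdlib sketch, assuming each benchmark row is a dict with an `organism` key and a `contam_fqscr` rate. `out_of_range` and `mean_by_organism` are hypothetical helpers, not functions from performance.py; with pandas the same check would be a boolean mask over the column.

```python
from collections import defaultdict
from statistics import mean


def out_of_range(rows, column="contam_fqscr", low=0.0, high=1.0):
    """Return the rows whose rate falls outside [low, high]."""
    return [row for row in rows if not low <= row[column] <= high]


def mean_by_organism(rows, column="contam_fqscr"):
    """Average the rate per organism, to see which reference skews the global mean."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["organism"]].append(row[column])
    return {organism: mean(values) for organism, values in groups.items()}


if __name__ == "__main__":
    # Toy rows for illustration only: 56.05 mirrors the dm3 value quoted in
    # this thread, the phiX values are made up.
    rows = [
        {"organism": "phiX", "contam_fqscr": 0.98},
        {"organism": "phiX", "contam_fqscr": 0.96},
        {"organism": "dm3", "contam_fqscr": 56.05},
    ]
    for row in out_of_range(rows):
        print("out of range:", row)
    print(mean_by_organism(rows))
```

Grouping by organism is what isolates a single reference (here dm3) as the source of a global mean above 1.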

Please have a look at my IPython notebook below for further details on my data analysis (plots at the end).

http://nbviewer.ipython.org/github/brainstorm/facs/blob/d844a6ab/facs/utils/benchmarks_facs.ipynb

Next step: figure out why fastq_screen reports 56.05 as a contamination rate for dm3 :-/

@arvestad, @guillermo-carrasco, @tzcoolman #reviewthis

henrikstranneheim commented 10 years ago

Looks great! I feel a bit out of touch with the project, but now I'm back full time, so it will be fun to check out the latest updates.

Cheers,
Henrik


On 12 Jan 2014, at 17:13, Roman Valls Guimerà <notifications@github.com> wrote:


You can merge this Pull Request by running:

git pull https://github.com/brainstorm/facs master

Or view, comment on, or merge it at:

https://github.com/SciLifeLab/facs/pull/97

Commit Summary

Adding an ipython notebook to explore the benchmarking/accuracy data interactively, @arvestad, @guillermo-carrasco, @tzcoolman #reviewthis

File Changes

A facs/utils/benchmarks_facs.ipynb (274)
M facs/utils/performance.py (40)

Patch Links

https://github.com/SciLifeLab/facs/pull/97.patch
https://github.com/SciLifeLab/facs/pull/97.diff

guillermo-carrasco commented 10 years ago

Hi @brainstorm,

Awesome work! The IPython notebook looks OK, though there are some results I don't understand. One is the one you mention, for which I can't find an easy explanation; I'll take a more thorough look. The other is that in some cases the contamination rates from fastq_screen and FACS are completely opposite: e.g., in the first table you show, in the second row (labeled 14), the contamination for FACS is 1.0 and for fastq_screen it is 0.0. How can that be? Am I reading it wrong? I guess the correct one should be fastq_screen, as you're querying against phiX on a phiX sample.
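Rows where the two tools flatly disagree can be listed programmatically instead of by eye. Here is a sketch, assuming the FACS rate lives in a column called `contam_facs` (a hypothetical name; only `contam_fqscr` is mentioned in this thread) and treating a gap above 0.5 as suspicious, which is an arbitrary threshold. `disagreements` is likewise a hypothetical helper.

```python
def disagreements(rows, facs_col="contam_facs", fqscr_col="contam_fqscr", threshold=0.5):
    """Return (row index, FACS rate, fastq_screen rate) where the two differ by more than threshold."""
    flagged = []
    for index, row in enumerate(rows):
        facs, fqscr = row[facs_col], row[fqscr_col]
        if abs(facs - fqscr) > threshold:
            flagged.append((index, facs, fqscr))
    return flagged


if __name__ == "__main__":
    # Toy rows; the first mimics the phiX-on-phiX case (1.0 vs 0.0) described above.
    rows = [
        {"contam_facs": 1.0, "contam_fqscr": 0.0},
        {"contam_facs": 0.97, "contam_fqscr": 0.95},
    ]
    print(disagreements(rows))
```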

brainstorm commented 10 years ago

@guillermo-carrasco, thanks! :)

Yes, I was a bit puzzled by those results too... We should determine whether it is a problem with the tests or with fastq_screen itself.

Could you run a quick test for me at SciLifeLab? Specifically:

1) Run fastq_screen manually with the .conf parameters from the production fastq_screen at SciLifeLab, using the phiX sample/reference.
2) Run fastq_screen manually with the parameters the test suite generates, using the same phiX sample/reference.
3) Merge this pull request anyway ;)
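To compare the runs from steps 1 and 2 side by side, something like the following could work. It assumes fastq_screen's summary is a tab-separated table whose header row begins with `Library`, with `#`-prefixed comment lines; that layout should be checked against the installed fastq_screen version, since it may differ between releases. `parse_screen` and `compare_runs` are hypothetical helpers.

```python
def parse_screen(text):
    """Parse a fastq_screen-style summary table into {library: {column: float}}.

    Assumed layout: tab-separated, header row starting with "Library",
    comment lines starting with "#". Lines that don't match the header
    width (e.g. trailing summary lines) are skipped.
    """
    header = None
    results = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[0] == "Library":
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue
        results[fields[0]] = {name: float(value) for name, value in zip(header[1:], fields[1:])}
    return results


def compare_runs(production, testsuite, tolerance=0.01):
    """List (library, column, production value, testsuite value) wherever the runs disagree."""
    differences = []
    for library in sorted(set(production) | set(testsuite)):
        prod_row = production.get(library, {})
        test_row = testsuite.get(library, {})
        for column in sorted(set(prod_row) | set(test_row)):
            a, b = prod_row.get(column), test_row.get(column)
            # A column missing from one run counts as a difference too.
            if a is None or b is None or abs(a - b) > tolerance:
                differences.append((library, column, a, b))
    return differences
```

An empty result from `compare_runs` would point away from the test suite's parameters and toward how the results are parsed afterwards.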

Thanks!

brainstorm commented 10 years ago

@guillermo-carrasco I think I know what might be happening here... (see "organism" record):

http://facs.iriscouch.com/_utils/document.html?fastq_screen/23ac213295941c15785deca5e90044af

%Unmapped                     4.44
%One_hit_one_library          60
%One_hit_multiple_libraries   0
%Multiple_hits_one_library    35.56
Library                       dm3

I think the code is not parsing and aggregating the %*library attributes properly... can someone double-check this?
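The record above stores percentages on a 0 to 100 scale (its % columns add up to 100), which could explain a value like 56.05 sitting next to FACS fractions. Here is a sketch of turning one library's record into a 0 to 1 rate. It assumes the rate should be the share of reads that hit only this library (the `%*_one_library` columns). Whether multi-library hits should also count is a judgment call. `library_fraction` and `percentages_add_up` are hypothetical helpers, not what performance.py currently does.

```python
import math
from fnmatch import fnmatchcase


def library_fraction(record, pattern="%*_one_library"):
    """Sum the percentage columns matching `pattern` and return them as a 0-1 fraction."""
    # Assumption: only the *_one_library columns count towards this library's rate.
    percent = sum(float(value) for key, value in record.items() if fnmatchcase(key, pattern))
    return percent / 100.0


def percentages_add_up(record, tolerance=0.5):
    """Sanity check: one library's % columns should add up to roughly 100."""
    total = sum(float(value) for key, value in record.items() if key.startswith("%"))
    return math.isclose(total, 100.0, abs_tol=tolerance)


if __name__ == "__main__":
    # The dm3 record quoted above.
    record = {
        "%Unmapped": 4.44,
        "%One_hit_one_library": 60,
        "%One_hit_multiple_libraries": 0,
        "%Multiple_hits_one_library": 35.56,
        "Library": "dm3",
    }
    print(library_fraction(record), percentages_add_up(record))
```

For the dm3 record this gives roughly 0.9556 (60 + 35.56 percent), and the sum check passes. Note that the pattern `%*_one_library` deliberately skips `%One_hit_multiple_libraries`, which ends in "libraries".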