SciLifeLab / facs

Fast and Accurate Classification of Sequences using Bloom filters
http://facs.scilifelab.se/

Deconseq does not report filter and returns incorrect contamination rate #95

Open brainstorm opened 10 years ago

brainstorm commented 10 years ago

Contamination rate should never be >100%...

{
   "_id": "e206960e0662df946a20e43b4f000c36",
   "_rev": "1-15137438404e50191513c6ee194c92e4",
   "sample": "tests/data/synthetic_fastq/simngs_phiX_1000000.fastq",
   "contamination_rate": 1301,
   "start_timestamp": "2013-12-02 12:08:17.599624Z",
   "total_reads": 500000,
   "end_timestamp": "2013-12-02 12:15:26.696080Z"
}

brainstorm commented 10 years ago

Seems like an artifact from developing the test for deconseq; newly reported results do not exceed 100%.
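
A guard along these lines could catch this kind of result before it is stored. This is only a minimal sketch: `contamination_rate` and `check_result` are hypothetical helpers, not part of FACS or the deconseq test, and it assumes the rate is meant as a percentage of `total_reads`.

```python
def contamination_rate(contaminated_reads, total_reads):
    """Return the percentage of reads flagged as contaminated.

    Hypothetical helper: assumes the rate is defined as
    contaminated reads over total reads, times 100.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= contaminated_reads <= total_reads:
        raise ValueError(
            "contaminated_reads must be between 0 and total_reads"
        )
    return 100.0 * contaminated_reads / total_reads


def check_result(doc):
    """Raise ValueError if a result document reports an impossible rate.

    Hypothetical helper: `doc` is a dict shaped like the JSON above.
    """
    rate = doc["contamination_rate"]
    if not 0 <= rate <= 100:
        raise ValueError(
            "contamination_rate %r is outside the 0-100 range" % (rate,)
        )
    return doc
```

Calling `check_result` on the document above (rate 1301) would raise a `ValueError` instead of letting the value through.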

An underlying reason deconseq is not reporting meaningful contamination rates right now might be that the tests do not transform the reference data as advised in the manual:

http://deconseq.sourceforge.net/manual.html

@guillermo-carrasco I would give priority to benchmarking FACS vs FastqScreen. I'll try to fix this one, but right now I believe we can plot some meaningful results with the performance.py script.

Output reporting the reference has been fixed in 3a8cec1.

guillermo-carrasco commented 10 years ago

Agree, prio to FastqScreen :+1: