bio-raum / FooDMe2

A Nextflow pipeline for the identification of species from mixed samples based on mitochondrial amplicons
https://bio-raum.github.io/FooDMe2/
GNU General Public License v3.0

[Testing] Check benchmarking result for Dobrovolny Method and optimize #40

Closed by gregdenay 2 months ago

gregdenay commented 2 months ago

Benchmarking was run on the FooDMe dataset published in Denay et al. 2023, after correcting for Macropodidae (expected at family level), Dama dama (not detected by the laboratory method), and two apparently switched samples (119 and 120):

At the genus level with a 0.1% cutoff, we get a precision of 98.19% and a recall of 99.08%, which I think is pretty neat. I don't see much use in spending more time on optimization at this point; maybe some adjustments will come from the user side.
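For readers unfamiliar with the metrics, here is a minimal sketch of how precision and recall can be computed for a single sample once a relative-abundance cutoff is applied. The function name, the input layout, and the example values are all made up for illustration; this is not the benchmarking code used for the numbers above, which presumably aggregates over all samples of the dataset.

```python
def precision_recall(expected, observed, cutoff=0.001):
    """Compute precision and recall for one sample.

    expected: set of genus names known to be in the sample (hypothetical input)
    observed: dict mapping genus name -> relative abundance as a fraction
    cutoff:   minimum relative abundance for a call to count (0.001 = 0.1%)
    """
    # Keep only the taxa that pass the abundance cutoff
    called = {taxon for taxon, freq in observed.items() if freq >= cutoff}

    tp = len(called & expected)   # expected and called
    fp = len(called - expected)   # called but not expected
    fn = len(expected - called)   # expected but not called

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented example values, not taken from the FooDMe dataset
expected = {"Bos", "Sus", "Gallus"}
observed = {"Bos": 0.60, "Sus": 0.35, "Ovis": 0.049, "Gallus": 0.0005, "Equus": 0.0005}

precision, recall = precision_recall(expected, observed)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

In this toy sample, the cutoff removes one false positive (Equus) but also one true positive (Gallus), which is the usual trade-off when tuning the threshold.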

Maybe we can add a small report on this to the doc later on.

marchoeppner commented 2 months ago

Very nice. The only thing I still have on my radar is that issue with Cutadapt letting through reads that are clipped on only one side - which I suppose could account for some of the low-frequency noise?
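To make the one-sided clipping concern concrete, here is a rough sketch of a filter that only keeps a read when both primers are found. It is purely illustrative: the function names and primer sequences are hypothetical, matching is exact (no mismatches or IUPAC codes), and it does not reproduce how Cutadapt or the pipeline module actually work.

```python
# Translation table for complementing DNA bases
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_both_sides(read, fwd_primer, rev_primer):
    """Return the insert if BOTH primers are present, else None.

    The forward primer is expected at the 5' end of the read and the
    reverse complement of the reverse primer at the 3' end.
    """
    rev_rc = revcomp(rev_primer)
    # The read must be long enough to hold both primers without overlap
    if len(read) < len(fwd_primer) + len(rev_rc):
        return None
    if read.startswith(fwd_primer) and read.endswith(rev_rc):
        return read[len(fwd_primer):len(read) - len(rev_rc)]
    # Clipped on one side only (or not at all): discard
    return None


# Hypothetical primers and reads
FWD, REV = "ACGT", "TTGCA"
both = "ACGT" + "GATTACA" + revcomp(REV)
one_sided = "ACGT" + "GATTACA"

print(trim_both_sides(both, FWD, REV))       # insert is kept
print(trim_both_sides(one_sided, FWD, REV))  # read is discarded
```

Reads like `one_sided` are the ones that would currently slip through and could plausibly contribute to low-frequency noise.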

Adding the benchmark metrics to the documentation is definitely a good idea.

gregdenay commented 2 months ago

Good point. Is this something we want fixed for v1.0? It looks like low effort, but I'm not sure the Report -> JSON -> MQC step is that easy.

I'll add a doc issue for validation data so we don't forget.

marchoeppner commented 2 months ago

Well, we already have the module ready to go from before the change - so that is zero effort. But the JSON thing... no clue. I'll have a look next week to see if this is even feasible (not sure about the JSON contents).