Closed gregdenay closed 2 months ago
Very nice. The only thing I still have on my radar is that issue with Cutadapt letting through reads that are clipped on only one side - which I suppose could account for some of the low-frequency noise?
Adding the benchmark metrics to the documentation is definitely a good idea.
Good point. Is this something we want fixed for v1.0? It looks like low effort, but I'm not sure the Report -> JSON -> MQC route is that easy.
I'll add a doc issue for validation data so we don't forget
Well, we already have the module ready to go from before the change - so that is zero effort. But the JSON thing... no clue. I'll have a look next week to see if this is even feasible (not sure about the JSON contents).
Benchmarking using the FooDMe dataset published in Denay et al. 2023, after correction for Macropodidae (expected at family level), Dama dama (not detected by the laboratory method), and two apparently switched samples (119 and 120):
At the genus level with a 0.1% cutoff, we get a precision of 98.19% and a recall of 99.08%, which I think is pretty neat. I don't see much use in spending more time on optimization at this point, though some adjustments may still come on the user side.
Maybe we can add a small report on this to the doc later on.
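For the doc report, here is a minimal sketch of how precision and recall at a given cutoff could be computed. This is not the actual benchmarking code: the function name `pooled_precision_recall`, the input layout, and the toy samples are all made up for illustration, and it assumes that true/false positives are pooled over all samples before the ratios are taken.

```python
def pooled_precision_recall(samples, cutoff=0.001):
    """Pool confusion counts over samples, then compute precision and recall.

    samples: dict mapping sample name -> (predicted, expected), where
      predicted is a dict of taxon -> relative abundance (fraction of reads)
      expected is the set of taxa known to be in the sample.
    cutoff: minimum relative abundance for a taxon to count as detected
      (0.001 corresponds to 0.1%).
    NOTE: hypothetical helper and data layout, not the pipeline's own code.
    """
    tp = fp = fn = 0
    for predicted, expected in samples.values():
        # Taxa below the cutoff are treated as not detected
        called = {taxon for taxon, freq in predicted.items() if freq >= cutoff}
        tp += len(called & expected)   # detected and expected
        fp += len(called - expected)   # detected but not expected
        fn += len(expected - called)   # expected but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data, invented for illustration only (not from the benchmark dataset)
toy_samples = {
    "S1": ({"Bos": 0.60, "Sus": 0.3995, "Ovis": 0.0005}, {"Bos", "Sus"}),
    "S2": ({"Gallus": 0.90, "Equus": 0.10}, {"Gallus", "Meleagris"}),
}

precision, recall = pooled_precision_recall(toy_samples, cutoff=0.001)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

In the toy data, Ovis in S1 sits below the 0.1% cutoff and is ignored, while Equus in S2 counts as a false positive and Meleagris as a false negative. Averaging per-sample metrics instead of pooling counts would be an equally reasonable choice and can give slightly different numbers.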