Open grst opened 1 year ago
This is a really nice idea...
I would be open to community contributions on this! :)
It might be the case that you could make such a report based on the individual metrics.csv files from each sample. That is at least what I intended as the use for those metrics files. If they lack information that would be important for creating such a report, it might be a good idea to include additional information in the metrics.csv.
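As a rough illustration of building a report from the per-sample files, here is a minimal Python sketch that merges several metrics.csv contents into one table. It assumes each file holds headerless `name,value` rows, which is an assumption and not something confirmed in this thread; the sample names and the `found_cells` metric are hypothetical placeholders.

```python
import csv
import io


def parse_metrics_csv(text):
    """Parse one sample's metrics, ASSUMING headerless `name,value` rows."""
    metrics = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        name, value = row
        try:
            metrics[name] = float(value)
        except ValueError:
            metrics[name] = value  # keep non-numeric values as strings
    return metrics


def aggregate(samples):
    """Combine per-sample metrics into {sample_name: {metric: value}}."""
    return {name: parse_metrics_csv(text) for name, text in samples.items()}


# Hypothetical file contents; real files would be read from disk per sample.
example = {
    "sample_a": "total_raw_counts,10121979\nfound_cells,5000\n",
    "sample_b": "total_raw_counts,8000000\nfound_cells,4200\n",
}
table = aggregate(example)
```

The resulting nested dictionary could feed a general-statistics table in an aggregated report, one row per sample.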
Hi @sjfleming,
I'm not promising anything, but if it turns out that we end up using cellbender as a routine step in our single-cell processing pipeline, I might be able to justify spending time on this.
In any case, the metrics.csv is a good start, but what I'd really like to have in the reports as well is the training plot for each sample and potentially a simplified version of the cell probability plot.
I don't think this information is currently made available in any machine-readable format. MultiQC reports usually also contain an overview of QC warnings/failures per sample. It would be nice to not have to parse them from the HTML report.
I'm not sure what would be the natural way to include them in a metrics.csv... Maybe a metrics.json or metrics.yml would be more natural?
{
  "total_raw_counts": 10121979,
  [...],
  "training_progress": {
    "train": {
      "x": [0, 10, 15, 27 ... ],
      "y": [...]
    },
    "test": { ... }
  },
  "cell_probability": {
    "x": [...],
    "y": [...]
  },
  "warnings": [
    {
      "id": "learning_curve_didnt_converge",
      "description": "The learning curve didn't converge... Please check ..."
    }
  ]
}
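To show how a file shaped like the proposal above could be consumed, here is a minimal Python sketch that lists the samples carrying warnings. The metrics.json file does not exist in cellbender at this point, so the schema (a top-level `warnings` list of objects with an `id`) is taken from the proposal only, and the function and sample names are hypothetical.

```python
import json


def collect_warnings(reports):
    """Map sample name -> list of warning ids, for samples that have any."""
    flagged = {}
    for sample, text in reports.items():
        data = json.loads(text)
        # `warnings` follows the PROPOSED schema; treat a missing key as empty.
        ids = [w["id"] for w in data.get("warnings", [])]
        if ids:
            flagged[sample] = ids
    return flagged


# Hypothetical per-sample reports, serialized as they might appear on disk.
reports = {
    "sample_a": json.dumps({"total_raw_counts": 10121979, "warnings": []}),
    "sample_b": json.dumps({
        "total_raw_counts": 10121979,
        "warnings": [{
            "id": "learning_curve_didnt_converge",
            "description": "The learning curve didn't converge...",
        }],
    }),
}
flagged = collect_warnings(reports)
```

Stable warning ids like this would let an aggregated report show a per-sample pass/warn overview without parsing the HTML output.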
I would certainly be open to the idea of a metrics.json.
It would be great to have a multiqc module for cellbender that aggregates plots and metrics from individual cellbender runs into a single HTML report.
This would be particularly useful when processing many samples: Instead of looking at each HTML report individually, one can quickly spot those where cellbender didn't converge or failed in any other regard.