compbiomed / singleCellTK

Interactively analyze single cell genomic data
https://camplab.net/sctk/

Add support for alevin-fry #579

Open rob-p opened 2 years ago

rob-p commented 2 years ago

Hi,

Thank you for developing this useful tool. I just came across the paper in Nature Communications; congratulations! My lab develops the alevin tool, which is currently supported in singleCellTK. We have been working for some time on a successor to alevin called alevin-fry, and the published version of the corresponding paper has (very) recently appeared.

It would be great to have support for alevin-fry in singleCellTK. Since alevin-fry reports base-level quality control metrics in a format that should be easy to parse and consume, this will hopefully be straightforward. Further, I and some of the other alevin-fry developers (namely the lead author, @DongzeHe) would be happy to help implement support for alevin-fry in singleCellTK. Is there any developer documentation or guidance on how to write a singleCellTK module that imports output from a new preprocessing tool? Thanks, and congrats again on the paper and tool!

Best, Rob

joshua-d-campbell commented 2 years ago

Hi @rob-p, thanks for reaching out and congrats on the new version of Alevin and the corresponding manuscript! We would be happy and excited to work with you on this. Our current function for importing Alevin data, which can be used as a starting template, is here. We will try running alevin-fry to examine its output in more detail. If you can point us to example output, that would also be helpful.

Where are the QC metrics written? Are they all cell-level QC metrics, or do you also report sample-level metrics (e.g., percentage of aligned reads)? In our recent release, we started importing sample-level QC metrics from CellRanger and STARsolo output. These are stored in the metadata slot of the SCE so they can also be plotted in the QC reports.

Josh

rob-p commented 2 years ago

Hi @joshua-d-campbell,

Thanks; I'm glad to hear this! Right now, the best place to look to see how to collect quality metrics from alevin-fry is probably @csoneson's AlevinQC package; specifically the .readAlevinFryQC_v0.5.0 function. This reads in pretty much all of the relevant QC info that is output by a full end-to-end run of salmon alevin -> alevin-fry generate-permit-list -> alevin-fry collate -> alevin-fry quant. The mapping directory will also contain some sample-level metrics in addition to the cell-level metrics output by fry during quantification. If you have a chance to take a look at what is there, let us know what you think the best way to proceed is.

Thanks! Rob
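To make the idea of gathering sample-level metrics from a run directory concrete, here is a minimal Python sketch. It is not part of singleCellTK, alevinQC, or alevin-fry, and the helper names `collect_json_metrics` and `_flatten` are hypothetical. It assumes only that the per-step summaries are JSON files somewhere under the run directory; it does not rely on any specific file names or keys. The flat result is analogous to the sample-level entries that singleCellTK stores in the SCE metadata slot.

```python
import json
from pathlib import Path
from typing import Any, Dict


def _flatten(prefix: str, value: Any, out: Dict[str, Any]) -> None:
    """Recursively flatten nested JSON objects into dotted keys.

    Lists and scalars are stored as-is under the current prefix.
    """
    if isinstance(value, dict):
        for key, sub in value.items():
            _flatten(f"{prefix}.{key}" if prefix else str(key), sub, out)
    else:
        out[prefix] = value


def collect_json_metrics(run_dir: str) -> Dict[str, Any]:
    """Gather every *.json file under run_dir into one flat dict.

    Assumption (hypothetical helper): sample-level summaries are JSON files.
    Keys look like "<relative/path.json>:<dotted.key>", so metrics from
    different steps (mapping vs. quantification) cannot collide.
    """
    root = Path(run_dir)
    metrics: Dict[str, Any] = {}
    for path in sorted(root.rglob("*.json")):
        with path.open() as fh:
            data = json.load(fh)
        flat: Dict[str, Any] = {}
        _flatten("", data, flat)
        rel = path.relative_to(root).as_posix()
        for key, value in flat.items():
            metrics[f"{rel}:{key}"] = value
    return metrics
```

Prefixing each key with its relative file path keeps the mapping-step and quantification-step summaries distinguishable without hard-coding which files exist.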

csoneson commented 2 years ago

Hello! This looks great; I'd be happy to make adaptations to alevinQC as well if needed. Just a minor note: the top-level, user-facing (exported) function for reading alevin-fry output is readAlevinFryQC. It should automatically figure out which version of alevin-fry was run and call the appropriate low-level reading function. There is a corresponding function for reading alevin QC output as well (readAlevinQC).

DongzeHE commented 2 years ago

Hey! For importing alevin-fry data, we provide the loadFry() R function in the fishpond R package. This function takes an output folder generated by the alevin-fry quant command as its required input and returns a SingleCellExperiment object like the one returned by importAlevin() in singleCellTK. If alevin-fry is run in USA (short for "Unspliced-Spliced-Ambiguous") mode, in which it infers separate unspliced, spliced, and ambiguous counts for each gene in each cell, this function can also prepare the desired output format for single-cell, single-nucleus, or RNA velocity analysis, according to the optional argument outputFormat. If needed, I would also be happy to help adapt the loadFry function to singleCellTK. Thanks!
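The USA-mode collapsing that loadFry's outputFormat argument controls can be sketched in a few lines of Python. This is an illustration, not fishpond's code. The combination rules are an assumption based on our reading of the fishpond documentation: "scRNA" counts = S + A, "snRNA" counts = U + S + A, and "velocity" gives spliced = S + A and unspliced = U. Check them against the fishpond version you use. The function names `add_matrices` and `collapse_usa` are hypothetical, and matrices are plain gene-by-cell lists.

```python
from typing import Dict, List

Matrix = List[List[float]]  # rows = genes, columns = cells


def add_matrices(*mats: Matrix) -> Matrix:
    """Element-wise sum of equally shaped gene-by-cell matrices."""
    shapes = {tuple(len(row) for row in m) for m in mats}
    if len(shapes) != 1:
        raise ValueError("all matrices must have the same shape")
    # zip(*mats) pairs up corresponding rows; zip(*rows) pairs up cells.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]


def collapse_usa(u: Matrix, s: Matrix, a: Matrix, output_format: str) -> Dict[str, Matrix]:
    """Combine unspliced (U), spliced (S), and ambiguous (A) counts.

    Assumed rules (mirroring fishpond's documented outputFormat values):
      scRNA:    counts    = S + A
      snRNA:    counts    = U + S + A
      velocity: spliced   = S + A, unspliced = U
    """
    if output_format == "scRNA":
        return {"counts": add_matrices(s, a)}
    if output_format == "snRNA":
        return {"counts": add_matrices(u, s, a)}
    if output_format == "velocity":
        return {"spliced": add_matrices(s, a), "unspliced": [list(row) for row in u]}
    raise ValueError(f"unsupported output_format: {output_format!r}")
```

For a single gene in two cells with U = [1, 0], S = [2, 3], and A = [1, 1], "scRNA" yields [3, 4] and "snRNA" yields [4, 4]. Ambiguous counts go to the spliced side for single-cell and velocity output, while single-nucleus output keeps every category.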

rob-p commented 2 years ago


Thanks @DongzeHE — but the use case here is for getting at the QC metrics more than just the final quantifications, so it will need access to some of the extra files that AlevinQC uses.