COMBINE-lab / alevin-fry

🐟 🔬🦀 alevin-fry is an efficient and flexible tool for processing single-cell sequencing data, currently focused on single-cell transcriptomics and feature barcoding.
https://alevin-fry.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Update documentation to include recommended processing for 10x scRNA 5' V2 #118

Open jeremymsimon opened 1 year ago

jeremymsimon commented 1 year ago

Hey @rob-p and @Gaura - I've started working with 10x 5' V2 data, and wanted to use alevin-fry to process the raw data. I initially found some discussions from @k3yavi on the alevin repo, describing how the only change needed to handle this data type was to switch alevin's -l ISR to -l ISF (e.g. here). However, with that change I ended up detecting far fewer cells for each sample, and the results made much less sense than those from the same data processed via cellranger.

Digging deeper, I found a very helpful discussion here, whose conclusion was effectively to run alevin with -l ISR as normal, but to switch the expected orientation in alevin-fry generate-permit-list from -d fw to -d rc. This made a huge difference both in the example data analyzed by @allyhawkins here and in my own data: after making this change I detected almost 60% more cells that passed QC filters, and got a totally different (and much more sensible) set of clusters and markers characterizing them.
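For anyone who wants the recipe above in command form, here is a minimal sketch. Only -l ISR and -d fw / -d rc come from this thread. The helper `expected_ori`, the chemistry labels `10x-5p-v2` and `10x-3p`, and the placeholder paths (INDEX, R1.fastq.gz, R2.fastq.gz, MAP_DIR, QUANT_DIR) are hypothetical names made up for illustration. The remaining flags (-i, -1, -2, -o, --sketch, -k) are assumptions based on typical salmon and alevin-fry usage, so please check them against `--help` for your installed versions. The barcode/UMI geometry flag is deliberately left out, and the script only prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch only: map a chemistry label to the -d/--expected-ori value
# discussed in this thread. The labels are hypothetical, not alevin-fry names.
expected_ori() {
  case "$1" in
    10x-5p-v2) echo "rc" ;;  # this thread: 5' V2 data worked with -d rc
    10x-3p)    echo "fw" ;;  # the -d fw setting the thread switched away from
    *)         echo "unknown chemistry label: $1" >&2; return 1 ;;
  esac
}

ori=$(expected_ori 10x-5p-v2)

# Print the two steps instead of executing them (all paths are placeholders).
# Step 1: map with -l ISR as normal; add the geometry flag for your chemistry.
echo "salmon alevin -i INDEX -l ISR -1 R1.fastq.gz -2 R2.fastq.gz --sketch -o MAP_DIR"
# Step 2: set the expected orientation when generating the permit list.
echo "alevin-fry generate-permit-list -i MAP_DIR -d $ori -k -o QUANT_DIR"
```

Keeping the orientation choice in one small function is just a convenience for pipelines that handle several chemistries; for a one-off run, passing -d rc directly does the same thing.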

For others interested in processing data of this type, and assuming this isn't already documented somewhere I've missed, it might be helpful to elevate the cellranger vs alevin-fry comparison doc linked above to a polished vignette here and/or mention this more clearly on the main alevin-fry docs or within the generate-permit-list page. I'm also curious how this approach would be handled within the simpleaf and nf-core/scrnaseq frameworks.

Thanks as always!

crazyhottommy commented 1 year ago

Hey @jeremymsimon @rob-p, I was processing some 10x 5' V2 data last week and the number of reads per cell was much lower than in the cellranger output. I then found this issue, and indeed changing to -d rc made a difference.

Thanks! Tommy

rob-p commented 1 year ago

Thanks, @crazyhottommy! @DongzeHE: we should figure out the best way to add this information to the documentation, something like a table of protocols with notes.

wmacnair commented 1 month ago

Just supporting the previous comments: (1) changing to -d rc made a huge difference to my 5' data 🥳, and (2) it would be great for users to have this made clearer in the documentation.

From my user point of view, I actually think the highest priority for the alevin ecosystem should be a unified documentation page. Maybe something a bit like the scanpy docs, i.e. tutorials and API in one obvious and natural place (I think scvi also has nice docs).

A follow-up question on the usage of -d/--expected-ori: assuming you know what the chemistry is, are there ever circumstances where you would recommend using both? Thanks!