Closed danmoore1987 closed 1 month ago
Hello @danmoore1987, why not, would you have any specific example to link here so that I can take a look?
Hi @Hoohm, Thanks for the reply!
Essentially some of the Cell Ranger outputs. The first measures, per barcode, how many UMIs were assigned.
The second is sequencing saturation, so we know whether we need to sequence the CITE library deeper.
Cell Ranger uses this formula for it:
Sequencing Saturation = 1 - (n_deduped_reads / n_reads)
where:
n_deduped_reads = Number of unique (valid cell-barcode, valid UMI, gene) combinations among confidently mapped reads.
n_reads = Total number of confidently mapped, valid cell-barcode, valid UMI reads.
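As a minimal sketch of the formula above (not Cell Ranger's actual code), assume the reads have already been filtered to confidently mapped reads with a valid cell barcode and valid UMI, and are available as (barcode, UMI, gene) tuples. The function name and the example data are made up for illustration:

```python
def sequencing_saturation(reads):
    """Return 1 - (n_deduped_reads / n_reads).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples that are
    ASSUMED to be pre-filtered (confidently mapped, valid barcode, valid UMI).
    """
    reads = list(reads)
    n_reads = len(reads)
    if n_reads == 0:
        raise ValueError("saturation is undefined when there are no reads")
    # Each unique (barcode, UMI, gene) combination counts once after dedup.
    n_deduped_reads = len(set(reads))
    return 1 - n_deduped_reads / n_reads


# Invented toy data: 4 reads, 3 unique (barcode, UMI, gene) combinations.
example_reads = [
    ("AAAC", "UMI1", "CD3"),
    ("AAAC", "UMI1", "CD3"),  # duplicate of the read above
    ("AAAC", "UMI2", "CD3"),
    ("TTTG", "UMI1", "CD4"),
]
saturation = sequencing_saturation(example_reads)
```

A value near 0 means most reads are still new molecules, so deeper sequencing would likely add information; a value near 1 means most reads are duplicates.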
Cheers!
Just want to echo: this would be quite useful to know. We're actively trying to sort out whether some of our lesser libraries would benefit from more sequencing depth.
I believe the run_report gives data to calculate saturation, at least globally, right? I think it would be valid to use 'Reads processed' as n_reads (we could adjust by percentage mapped?), and 'UMIs corrected' as n_deduped_reads, correct?
The per-cell plot above is informative. Presumably one could read/merge the 'read_count' folder and 'umi_count' folders to accomplish this, right?
I need to sanity check the data, but this is derived from combining umi_count and read_count folders:
the R code is here: https://github.com/BimberLab/cellhashR/blob/2211878b792d7c0c5ff48e4183cdcd7a44dec8b8/R/Preprocessing.R#L278
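For readers who do not want to follow the link, the general idea of combining the two folders can be sketched like this. This is not the cellhashR implementation; it assumes the per-barcode totals from the read_count and umi_count matrices have already been summed into plain dictionaries, and the function name and numbers are hypothetical:

```python
def per_barcode_saturation(read_totals, umi_totals):
    """Per-barcode saturation: 1 - (UMIs / reads).

    Both arguments map cell barcode -> total count summed over all tags.
    Barcodes with zero reads are skipped because the ratio is undefined.
    """
    saturation = {}
    for barcode, n_reads in read_totals.items():
        if n_reads == 0:
            continue
        n_umis = umi_totals.get(barcode, 0)
        saturation[barcode] = 1 - n_umis / n_reads
    return saturation


# Invented totals, as if summed from the read_count and umi_count matrices.
read_totals = {"AAAC": 100, "TTTG": 40, "GGGA": 0}
umi_totals = {"AAAC": 25, "TTTG": 40}
per_cell = per_barcode_saturation(read_totals, umi_totals)
```

Here a barcode with 100 reads collapsing to 25 UMIs comes out at 0.75, while one where every read is a distinct UMI comes out at 0.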
This is great @bbimber !
I also checked out the rest of your cellhashR package for post-processing QC of libraries. Can't wait to give it a go! :)
@danmoore1987 Yes, I'm still surprised there aren't more tools that do what we're trying to do in cellhashR. We'd welcome any feedback. Part of my goal with cellhashR is specifically to compare across different calling algorithms, since we find some do better or worse with different inputs.
With respect to saturation in particular, it would be great if you could confirm the tool is giving you believable values. I was surprised how non-saturated our libraries often were, but this wasn't something I had been tracking.
Ok, folks, I'm on holiday!!!
Let me take a look since I'm gonna work on this damn 1.5 release!!!
I'll keep you posted :)
Thanks for the code!
@Hoohm No worries - I actually think we implemented this in cellhashR; however, I'd love to figure out features that make this work synergistically with Cite-Seq-Count.
Yes! That would be amazing. Can you send me an email so we can have a quick chat these days maybe?
Ok, so 1.5.0 is nearly finished. Running some tests on datasets to see how it matches the older version.
For your specific needs, here is a non-exhaustive list of changes that affect your code:
- Read10X runs by default on the right column (gene.column=2)

I think these are the only ones affecting your code, but I might be missing something. Let me know :)
@Hoohm Is there a heuristic that code can use to determine what format of input it's getting? For example, if we have a function like processCiteSeqCount(outputFolder), can this code automatically figure out which format it was passed?
Not sure which format you are referring to.
If you are talking about the translated version, then yes, the barcodes.tsv will hold two columns instead of one.
Maybe I misunderstood, but in your prior post didn't you say the MTX format is changing in version 1.5.0? Ideally, I would like cellhashR::ProcessCountMatrix() to just work with either the output from Cite-Seq-Count 1.5.0 or prior versions. I suppose I could read the matrix into memory with gene.column=1, test for the presence of 'unmapped', and if it's not present re-read using gene.column=2?
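A rough sketch of that fallback idea, written in Python rather than R and operating directly on the text of a features file. This only illustrates the "look for 'unmapped' in column 1, otherwise try column 2" heuristic; the function name and the example file contents are invented, and the real layout of the 1.5.0 output should be checked before relying on it:

```python
import csv
import io


def find_feature_column(features_tsv_text, marker="unmapped"):
    """Return the 1-based column that contains `marker`, or None.

    Columns are tried left to right, mirroring "try gene.column=1,
    then fall back to gene.column=2".
    """
    rows = list(csv.reader(io.StringIO(features_tsv_text), delimiter="\t"))
    n_columns = max((len(row) for row in rows), default=0)
    for index in range(n_columns):
        if any(len(row) > index and row[index] == marker for row in rows):
            return index + 1
    return None


# Invented examples of a one-column and a two-column features file.
one_column = "CD3\nCD4\nunmapped\n"
two_column = "tag1\tCD3\ntag2\tCD4\ntag3\tunmapped\n"
column_for_old = find_feature_column(one_column)
column_for_new = find_feature_column(two_column)
```

Checking the small features file first avoids reading the whole matrix twice just to discover which column holds the tag names.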
It's not changing that much.
I would really love to have a chat on zoom with you, would be interesting to have a back and forth about this since I'm not completely fixed on everything.
Sure, would be happy to. I didn't realize you worked at 10x until I googled your name just now. My email is bimber@ohsu.edu
Closing this for now.
Hi @Hoohm , Thank you again for the easy to use package!
Just a suggested enhancement that I think a lot of people might be interested in. It would be cool if the CITE-seq-Count report workflow also generated barcode rank and UMI saturation index plots/CSVs! :)