Closed mdmanurung closed 2 years ago
Hi @mdmanurung ,
Thanks very much for your feedback.
No, we haven't tried combining scar with TotalVI/scVI yet. It is good to know that they require integers as input.
Sure, it would be great to implement stochastic rounding to integers, and I am really thankful for your offer. Let's do it together. I am going to create a branch for this.
Best, Caibin
As a start, we probably could add a parameter 'rounding' in the following lines:
Then we can add an if statement before lines 116 and 117 of https://github.com/Novartis/scar/blob/47-stochastic-rounding/scar/main/_vae.py.
Please feel free to tell me what you think.
Best, Caibin
Hi Caibin, I made PR #48.
Please feel free to adapt it to your code style.
Closing because I saw that the merging is in the works now.
Any further plans for the package? I use it quite often these days and would love to contribute more.
Hi @mdmanurung ,
Yes, I already made a new release.
I am more than happy to welcome contributors. There are definitely many things on the list.
E.g., automating the calculation of the ambient profile. This will make it easier for newcomers to use scAR.
My idea is to leverage existing methods to define the subsets of cell-free droplets and quantify the ambient profile. I found the idea of DropletQC https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02547-0 brilliant, but I haven't tested it yet. It would be great if somebody could re-implement it in Python and integrate it into scAR. Please tell me what you think.
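To make the ambient-profile idea concrete, here is a minimal sketch, assuming the simplest possible definition of cell-free droplets: barcodes whose total count falls at or below a threshold. The function name `ambient_profile_from_low_count_droplets` and the `max_total` cutoff are hypothetical and are not part of scAR or DropletQC; DropletQC itself relies on the nuclear fraction rather than total counts.

```python
import numpy as np


def ambient_profile_from_low_count_droplets(raw_counts, max_total=100):
    """Estimate an ambient profile from presumed cell-free droplets.

    raw_counts: (droplets x features) array from the unfiltered matrix.
    max_total:  hypothetical cutoff; droplets with 0 < total <= max_total
                are treated as cell-free.
    """
    raw_counts = np.asarray(raw_counts)
    totals = raw_counts.sum(axis=1)
    cell_free = (totals > 0) & (totals <= max_total)
    if not cell_free.any():
        raise ValueError("no droplets fall below the chosen threshold")
    # Pool counts over all presumed cell-free droplets, then normalise
    # so that the profile is a probability vector over features.
    pooled = raw_counts[cell_free].sum(axis=0)
    return pooled / pooled.sum()


raw = np.array([
    [5, 0, 5],        # low-count droplet
    [0, 10, 0],       # low-count droplet
    [300, 200, 500],  # cell-containing droplet (excluded)
    [0, 0, 0],        # empty barcode (excluded)
])
profile = ambient_profile_from_low_count_droplets(raw, max_total=100)
```

A fixed threshold is the crudest choice; the point of the discussion here is to replace it with a more principled way of picking the cell-free subset.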
Best, Caibin
That's a great step forward. IIRC, DropletQC requires spliced/unspliced counts. I suspect that this is a rather non-standard workflow: users would have to re-run the counting with, e.g., velocyto, which can be non-trivial! But if such matrices are already available, then the calculation would be quite straightforward.
Perhaps cellbender's idea can be adapted here: users feed in the unfiltered matrix, the algorithm probabilistically identifies the empty droplets, and scAR then performs the decontamination. That being said, this option would be non-trivial for the coders.
Oh wait, I was late to realize that they have the nuclear_fraction_tags function.
EDIT: after a brief tour through their codebase, I would say that making an rpy2 wrapper would be more feasible.
Hi @mdmanurung , sorry for the late response. Last month was a busy one for me because of job interviews.
I agree with you on DropletQC. After some research, I also found that BAM files might be required, which may not be very convenient in some cases. In addition, this may put up a barrier to applying scAR to snRNAseq.
I found that the idea of EmptyDrops fits the ambient signal hypothesis well. It tests each droplet against a multinomial distribution over genes (with the ambient profile as the probabilities) to distinguish cells from empty droplets.
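As a rough illustration of that test, the sketch below computes a Monte Carlo p-value for one droplet under a plain multinomial null with the ambient profile as probabilities. This is a simplification for illustration only, not the EmptyDrops implementation; the function names and the `n_sim` parameter are hypothetical.

```python
import math

import numpy as np

_lgamma = np.vectorize(math.lgamma)


def multinomial_loglik(counts, probs):
    """Multinomial log-likelihood along the last axis of `counts`."""
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Avoid 0 * log(0): features with zero counts contribute nothing.
    safe_p = np.where(counts > 0, probs, 1.0)
    with np.errstate(divide="ignore"):
        log_p = np.log(safe_p)
    return (
        _lgamma(counts.sum(axis=-1) + 1)
        - _lgamma(counts + 1).sum(axis=-1)
        + (counts * log_p).sum(axis=-1)
    )


def empty_droplet_pvalue(counts, ambient, n_sim=2000, rng=None):
    """Monte Carlo p-value for H0: the droplet is drawn from `ambient`."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts)
    observed = multinomial_loglik(counts, ambient)
    # Simulate droplets of the same depth from the ambient profile.
    sims = rng.multinomial(int(counts.sum()), ambient, size=n_sim)
    simulated = multinomial_loglik(sims, ambient)
    # Fraction of simulated droplets at least as unlikely as the observed one.
    return (1 + np.sum(simulated <= observed)) / (n_sim + 1)


ambient = np.array([0.7, 0.2, 0.1])
rng = np.random.default_rng(0)
p_empty_like = empty_droplet_pvalue([70, 20, 10], ambient, rng=rng)
p_cell_like = empty_droplet_pvalue([10, 20, 70], ambient, rng=rng)
```

A small p-value means the droplet deviates from the ambient profile and is therefore more likely to contain a cell; a large one is consistent with an empty droplet.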
I have incorporated a method called setup_anndata to facilitate the calculation of the ambient profile. Please also see a tutorial here. Happy to hear your feedback.
Best, Caibin
Sorry for the late response, @CaibinSh! Everything looks great, including the revamped documentation pages. I hope more and more people are using this awesome module.
Regards, Mikhael
Hi Caibin,
I tried using scar's output as input for TotalVI/SCVI. As expected, they raised an error because the input is no longer integer-valued. I would suggest implementing stochastic rounding to integers, as done in SoupX. Let me know if you're interested and I can find the time to implement it.
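For reference, stochastic rounding can be sketched in a few lines of NumPy: each value is rounded up with probability equal to its fractional part and down otherwise, so the result is integer-valued but unbiased in expectation. The function name `stochastic_round` is hypothetical; this is neither SoupX's nor scar's code.

```python
import numpy as np


def stochastic_round(x, rng=None):
    """Round each entry up with probability equal to its fractional part."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    floor = np.floor(x)
    frac = x - floor
    # A uniform draw in [0, 1) is below `frac` with probability `frac`,
    # so E[result] == x, and exact integers are never changed.
    round_up = rng.random(x.shape) < frac
    return (floor + round_up).astype(np.int64)


denoised = np.array([[0.2, 1.7, 3.0], [2.5, 0.0, 4.9]])
rounded = stochastic_round(denoised, rng=np.random.default_rng(0))
```

Unlike plain `np.round`, this keeps the expected total count per cell and per gene unchanged, which matters when many denoised values are small fractions.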
Regards, Mikhael