Novartis / scar

scAR (single-cell Ambient Remover) is a deep learning model for removal of the ambient signals in droplet-based single cell omics
https://scar-tutorials.readthedocs.io/en/main/

Stochastic rounding to integers for downstream use in TotalVI/SCVI #47

Closed (mdmanurung closed this issue 2 years ago)

mdmanurung commented 2 years ago

Hi Caibin,

I tried using scar's output as input for TotalVI/SCVI. As expected, both raised an error because the input is no longer integer-valued. I would suggest implementing stochastic rounding to integers, as done in SoupX.
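For readers unfamiliar with the term, stochastic rounding can be sketched as below. This is only an illustration of the general idea, not SoupX's or scar's actual code; the function name `stochastic_round` and the toy `denoised` matrix are assumptions made up for this example. Each value is rounded down, then bumped up by one with probability equal to its fractional part, so the expected value of the rounded count equals the original value.

```python
import numpy as np


def stochastic_round(x, rng=None):
    """Round non-negative floats to integers while preserving the expectation.

    Hypothetical helper for illustration; not part of scar or SoupX.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    floor = np.floor(x)
    # Round up with probability equal to the fractional part.
    # Values that are already integers have fraction 0 and never change.
    round_up = rng.random(x.shape) < (x - floor)
    return (floor + round_up).astype(np.int64)


# Toy "denoised" count matrix (cells x genes), made up for this example.
denoised = np.array([[0.2, 1.7, 3.0], [0.0, 2.5, 0.9]])
rounded = stochastic_round(denoised, rng=np.random.default_rng(0))
```

Passing a seeded generator makes the rounding reproducible, which is probably desirable before handing the matrix to a downstream model.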

Let me know if you're interested and I can find the time to implement it.

Regards, Mikhael

CaibinSh commented 2 years ago

Hi @mdmanurung ,

Thanks very much for your feedback.

No, we haven't tried combining scar with TotalVI/scVI yet. It is good to know that they require integer input.

Sure, it is a great idea to implement stochastic rounding to integers, and I am really thankful for your offer. Let's do it together. I am going to create a branch for this.

Best, Caibin

CaibinSh commented 2 years ago

As a start, we probably could add a parameter 'rounding' in the following lines:

  1. https://github.com/Novartis/scar/blob/47-stochastic-rounding/scar/main/_vae.py: line 97
  2. https://github.com/Novartis/scar/blob/47-stochastic-rounding/scar/main/_scar.py: line 532

Then we can add an if statement before lines 116 and 117 of https://github.com/Novartis/scar/blob/47-stochastic-rounding/scar/main/_vae.py.

Please feel free to tell me what you think.

Best, Caibin

mdmanurung commented 2 years ago

Hi Caibin, I made PR #48.

Please feel free to adapt it to your code style.

mdmanurung commented 2 years ago

Closing because I saw that the merging is in the works now.

Any further plans for the package? I use it quite often these days and would love to contribute more.

CaibinSh commented 2 years ago

Hi @mdmanurung ,

Yes, I already made a new release.

I am more than happy to welcome contributors. There are definitely many things on the list.

E.g., automating the calculation of the ambient profile. This will make it easier for newcomers to use scAR.

My idea is to leverage existing methods to define the subsets of cell-free droplets and quantify the ambient profile. I found the idea of DropletQC https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02547-0 brilliant, but I haven't tested it yet. It would be great if somebody could re-implement it in Python and integrate it into scAR. Please tell me what you think.
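To make "quantify the ambient profile" concrete, here is a minimal sketch that assumes the cell-free droplets are simply those with a low, non-zero total count. The function name `ambient_profile`, the `max_total` cutoff, and the toy matrix are all hypothetical; this is neither DropletQC's method nor scAR's implementation.

```python
import numpy as np


def ambient_profile(raw_counts, max_total=100):
    """Estimate an ambient profile from low-count droplets.

    Hypothetical rule: droplets with 0 < total counts <= max_total
    are treated as cell-free.
    """
    raw_counts = np.asarray(raw_counts, dtype=float)
    totals = raw_counts.sum(axis=1)
    cell_free = (totals > 0) & (totals <= max_total)
    if not cell_free.any():
        raise ValueError("no droplets fall within the cell-free range")
    # Pool counts over cell-free droplets and normalize to frequencies.
    summed = raw_counts[cell_free].sum(axis=0)
    return summed / summed.sum()


# Toy unfiltered matrix (droplets x genes): one cell, two cell-free
# droplets, and one completely empty barcode.
raw = np.array([[500, 300, 200], [2, 1, 1], [4, 0, 0], [0, 0, 0]])
profile = ambient_profile(raw, max_total=10)
```

The result is a vector of per-gene frequencies that sums to one. Choosing the cutoff is the hard part in practice, which is where the methods discussed in this thread come in.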

Best, Caibin

mdmanurung commented 2 years ago

That's a great step forward. IIRC, DropletQC requires spliced/unspliced counts. I suspect that this is a rather non-standard workflow. The users would have to re-run the counting with e.g. velocyto. That can be non-trivial! But if such matrices are already available, then the calculation would be quite straightforward.

Perhaps cellbender's idea can be adapted here: the users feed in the unfiltered matrix, the algorithm probabilistically identifies the empty droplets, and scAR then does the decontamination. That being said, this option would be non-trivial for the coders.

mdmanurung commented 2 years ago

Oh wait, I was late to realize that they have the nuclear_fraction_tags function.

EDIT: after a brief tour through their codebase, I would say that making an rpy2 wrapper would be more feasible.

CaibinSh commented 2 years ago

Hi @mdmanurung , sorry for the late response. Last month was busy for me due to job interviews.

I agree with you on DropletQC. After some research, I also found that BAM files might be required, which may not be very convenient in some cases. In addition, this may put up a barrier to applying scAR to snRNAseq.

I found that the idea of EmptyDrops fits the ambient signal hypothesis well. It tests each droplet against a multinomial distribution over genes (with the ambient profile as the probabilities) to distinguish cells from empty droplets.

I have incorporated a method called setup_anndata to facilitate the calculation of the ambient profile. Please also see a tutorial here. Happy to hear your feedback.
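The multinomial test described above could look roughly like the sketch below. It is a simplified plain-multinomial version with a Monte Carlo p-value, not the EmptyDrops implementation and not what setup_anndata does internally. The names `multinomial_logpmf` and `ambient_pvalue` and the toy numbers are assumptions for illustration only.

```python
import numpy as np


def multinomial_logpmf(x, p):
    """Log-probability of count vector(s) x under a multinomial with probabilities p."""
    x = np.asarray(x, dtype=np.int64)
    p = np.asarray(p, dtype=float)
    n_max = int(x.sum(axis=-1).max())
    # log_fact[k] = log(k!) for k = 0..n_max
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, n_max + 1)))))
    with np.errstate(divide="ignore", invalid="ignore"):
        # Treat 0 * log(0) as 0 so genes absent from both sides do not matter.
        term = np.where(x > 0, x * np.log(p), 0.0)
    return log_fact[x.sum(axis=-1)] - log_fact[x].sum(axis=-1) + term.sum(axis=-1)


def ambient_pvalue(counts, ambient, n_sim=1000, rng=None):
    """Monte Carlo p-value for 'this droplet was sampled from the ambient profile'."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=np.int64)
    observed = multinomial_logpmf(counts, ambient)
    # Simulate droplets with the same total count, drawn from the ambient profile.
    simulated = rng.multinomial(int(counts.sum()), ambient, size=n_sim)
    sim_loglik = multinomial_logpmf(simulated, ambient)
    # Fraction of simulated droplets at least as unlikely as the observed one.
    return (np.sum(sim_loglik <= observed) + 1) / (n_sim + 1)


ambient = np.array([0.7, 0.2, 0.1])  # toy ambient profile
p_empty = ambient_pvalue([7, 2, 1], ambient, rng=np.random.default_rng(0))
p_cell = ambient_pvalue([0, 1, 30], ambient, rng=np.random.default_rng(0))
```

A small p-value suggests the droplet's expression deviates from the ambient profile and therefore likely contains a cell; a large one is consistent with an empty droplet.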

Best, Caibin

mdmanurung commented 2 years ago

Sorry for the late response, @CaibinSh! Everything looks great, including the revamped documentation pages. I hope more and more people are using this awesome module.

Regards, Mikhael