atarashansky / SAMap

SAMap: Mapping single-cell RNA sequencing datasets from evolutionarily distant organisms.
MIT License

Dataset with biological replicates #83

Closed joeyhuang401055 closed 2 years ago

joeyhuang401055 commented 2 years ago

Hi Mr. Tarashansky,

Thanks for developing SAMap. This is indeed a useful and surely impactful tool for studying Evo-Devo with scRNA-seq datasets. I am wondering how to apply SAMap to datasets with biological replicates. The data I am using contain 6 samples (2 technical replicates and 4 biological replicates) and therefore 6 matrices. Your tutorial suggests inputting the raw matrix into SAM and SAMap, but I noticed that directly inputting the 6 matrices results in a significant batch effect. So I would like to ask your opinion on the following:

  1. All of my scRNA-seq processing was done in Seurat. Can I use the Seurat-integrated data as input? The downside is that this would retain only 2000 highly variable genes instead of all genes expressed in the dataset. I am aware that SAMap considers all genes, unlike regular scRNA-seq analyses that take only highly variable genes. So I'm not sure if this is reasonable or acceptable.
  2. I found in your old SAM tutorials that you used mnnpy to correct for the batch effect between Schistosoma 2.5 and 3.5 week datasets. Would this be your suggested way for integrating datasets before loading into SAMap?

Thanks in advance. I look forward to hearing your thoughts.

Tzu-Yi Huang

joeyhuang401055 commented 2 years ago

BTW, my question is similar to #61, and I understand that there's no need to run sam.preprocess_data() when loading Seurat-integrated datasets.

atarashansky commented 2 years ago

Hi Joey - this is still experimental and something I haven't tested thoroughly, but the latest version of SAM offers native batch correction using Harmony. There's a new parameter to sam.run called batch_key - set it to the sam.adata.obs column which contains your batch variable.

So your flow should be:

  1. Concatenate all your (normalized and log-transformed) data into one AnnData (let's say it's called adata).
  2. Load it into SAM: sam=SAM(counts=adata)
  3. Run SAM: sam.run(batch_key="batch")
  4. Let's say you did this for both species, giving sam1 and sam2. Input those into SAMAP:

e.g. for human and mouse with species identifiers hu and mo:

sams={'hu': sam1, 'mo': sam2}
sm = SAMAP(sams, ...other_args)
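
The flow above could be sketched roughly as follows. This is a minimal, untested sketch: the helper names (concat_with_batch, run_sam, build_samap) are hypothetical, it assumes each sample is already a normalized, log-transformed AnnData, and it assumes a SAM version whose run() accepts batch_key as described above. The imports are placed inside the functions so the snippet can be loaded even where anndata, samalg, or samap are not installed.

```python
def concat_with_batch(samples, batch_key="batch"):
    """Concatenate per-sample AnnData objects into a single AnnData.

    `samples` is a dict mapping a sample name to its (normalized,
    log-transformed) AnnData. The sample names are written to
    adata.obs[batch_key] so SAM can use them as the batch variable.
    """
    import anndata as ad  # imported lazily; requires the anndata package

    # With a dict input, anndata.concat uses the dict keys as batch labels.
    # index_unique="-" appends the sample name to keep cell barcodes unique.
    return ad.concat(samples, label=batch_key, index_unique="-")


def run_sam(adata, batch_key="batch"):
    """Load the concatenated data into SAM and run it with batch correction."""
    from samalg import SAM  # imported lazily; requires the sam-algorithm package

    sam = SAM(counts=adata)
    # Assumption: the installed SAM version supports the batch_key parameter.
    sam.run(batch_key=batch_key)
    return sam


def build_samap(sam_by_species, **other_args):
    """Pass one SAM object per species to SAMAP, keyed by species identifier."""
    from samap.mapping import SAMAP  # imported lazily; requires the samap package

    # other_args stands in for the remaining SAMAP arguments, not shown here.
    return SAMAP(sam_by_species, **other_args)
```

Usage would then look something like sam1 = run_sam(concat_with_batch(human_samples)), the same for sam2, followed by build_samap({'hu': sam1, 'mo': sam2}, ...other_args), where human_samples is a hypothetical dict of per-sample AnnData objects.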

joeyhuang401055 commented 2 years ago

Hi Mr. Tarashansky,

Thank you so much for your quick reply! The native batch correction was pretty helpful!

Tzu-Yi Huang (Joey)

dsb66 commented 1 year ago

I am running the current version of SAMap (v1.0.15). I am trying to use the batch option with the command sm.run(batch_key="batch") but I get the error: TypeError: run() got an unexpected keyword argument 'batch_key'. Has this option been removed from the current version?