carmonalab / ProjecTILs

Interpretation of cell states using reference single-cell maps
GNU General Public License v3.0

merge two references? #25

Closed SeBaBInf closed 8 months ago

SeBaBInf commented 2 years ago

hi there

I was wondering if it's possible to merge two (or more) references to be used against the query dataset.

thank you

mass-a commented 2 years ago

Hello! Merging multiple references is not trivial: each one is defined in its own PCA/UMAP space, and each space is in turn built from a different set of genes.

Normally we would recommend projecting your query dataset into multiple references separately. For instance, if you have data for T cells in viral infection, I would first isolate the CD8 T cells and project them into the virus-specific CD8 T cell atlas, then isolate the CD4 T cells and project them into the CD4 T cell atlas. Does that make sense?

Best, -m
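
The per-lineage workflow described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the maintainers' code: the helper `project_by_lineage`, the `lineage` metadata column with values "CD8" / "CD4", and the reference file paths are hypothetical. `load.reference.map`, `make.projection` and `cellstate.predict` are ProjecTILs functions, but their arguments should be checked against the installed version.

```r
# Sketch: project each T cell lineage into its own reference atlas.
# ASSUMPTIONS (not from the thread):
#   - `query` is a Seurat object with a hypothetical metadata column
#     `lineage` holding "CD8" or "CD4" for every cell;
#   - `ref_paths` is a named character vector of local .rds files,
#     e.g. c(CD8 = "<path to CD8 atlas>", CD4 = "<path to CD4 atlas>").
project_by_lineage <- function(query, ref_paths) {
  results <- lapply(names(ref_paths), function(lin) {
    # Load the reference atlas for this lineage
    ref <- ProjecTILs::load.reference.map(ref = ref_paths[[lin]])
    # Keep only the query cells annotated with this lineage
    sub <- query[, query$lineage == lin]
    # Project into the reference space, then transfer cell-state labels
    proj <- ProjecTILs::make.projection(sub, ref = ref)
    ProjecTILs::cellstate.predict(ref = ref, query = proj)
  })
  names(results) <- names(ref_paths)
  results  # one projected object per reference
}
```

Each reference keeps its own embedding, so the outputs are returned as a list rather than forced into a single object.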

SeBaBInf commented 2 years ago

hi Massimo

thank you for the quick reply. That's a very good idea; I hadn't thought of running them separately and then merging the results. Thank you, Seb
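
One way to merge the results of separate projections is to stack the per-cell labels into a single table that records which reference each label came from. The sketch below uses only base R; `merge_labels` is a hypothetical helper, and the barcodes and the CD8 state names are made up for illustration.

```r
# Combine per-cell labels from separate projections into one table.
merge_labels <- function(label_list) {
  # label_list: named list, one element per reference; each element is a
  # named character vector (names = cell barcodes, values = predicted states)
  parts <- lapply(names(label_list), function(ref_name) {
    labs <- label_list[[ref_name]]
    data.frame(cell = names(labs), reference = ref_name,
               state = unname(labs), stringsAsFactors = FALSE)
  })
  merged <- do.call(rbind, parts)
  # Each cell should be labelled by exactly one reference; fail loudly if not
  stopifnot(!anyDuplicated(merged$cell))
  merged
}

# Toy input (illustrative values only)
toy <- list(
  CD8 = c(cell1 = "CD8_state_A", cell2 = "CD8_state_B"),
  CD4 = c(cell3 = "Treg", cell4 = "Tfh_Memory")
)
merged <- merge_labels(toy)
print(merged)
```

With real projections, the label vectors would presumably come from the metadata column written by the label-transfer step (assumed here to be `functional.cluster`) of each projected object.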

SBata commented 2 years ago

hi @mass-a, I downloaded all three murine references. Two work well, but when I try to use ref_LCMV_CD4_mouse_release_v1.rds I get the following error:

> query.projected <- make.projection(sc_data, ref=ref)
[1] "Using assay RNA for query"
Pre-filtering cells with scGate...
Error in 1:nrow(scGate.table) : argument of length 0

That's the same approach I used for the other two, so I can't figure out why it fails in this case. Any suggestions?

thank you!

mass-a commented 2 years ago

Hi, are you using the latest ProjecTILs (v2.0) and the latest CD4 atlas (version 2)?

SeBaBInf commented 2 years ago

Sorry for the late reply. Yes, I am using the latest CD4 atlas and the latest ProjecTILs. When I use filter.cells=F, it actually runs. Our dataset is mostly T cells, so could that be what caused the error with scGate?

This also makes me think: if I take a mixed reference (T cells, myeloid cells, etc.) for a mixed sample, could I then set filter.cells=F and use ProjecTILs to map my mixed data onto that reference? Thank you!

mass-a commented 2 years ago

Hi again, it's possible that you don't have the right version of the CD4 atlas, or that it didn't download correctly. Can you try to download it again? Here is a direct link: https://figshare.com/ndownloader/files/31057081
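
A re-download from R could look like the sketch below. `fetch_reference` and the local file name are hypothetical; the URL is the direct link given above. Writing in binary mode matters because an .rds file fetched in text mode can end up corrupted on some platforms, which is one way a download can fail to work correctly.

```r
# Sketch: download a reference atlas and fail early if the file looks wrong.
fetch_reference <- function(url, destfile) {
  # mode = "wb" writes the file in binary mode (important for .rds files)
  status <- utils::download.file(url, destfile = destfile, mode = "wb")
  if (status != 0 || !file.exists(destfile) || file.size(destfile) == 0) {
    stop("Download failed or produced an empty file: ", destfile)
  }
  destfile
}

# Usage (not run here; the local file name is an arbitrary choice):
# path <- fetch_reference("https://figshare.com/ndownloader/files/31057081",
#                         "ref_CD4_atlas.rds")
# ref <- ProjecTILs::load.reference.map(ref = path)
```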

As for the second part of the question: yes, you can disable the automatic filtering if you are sure you have a clean dataset (for example if you have performed a manual analysis to subset on the cell type of interest). Alternatively you can check out the automatic models that come with scGate:

models.DB <- scGate::get_scGateDB()
names(models.DB$human$generic)
[1] "Bcell"           "CD4T"            "CD8T"            "MoMacDC"         "Myeloid"         "NK"              "PanBcell"        "Plasma_cell"    
 [9] "Tcell"           "Tcell.alphabeta"

ProjecTILs-2.0 by default uses the Tcell filter, but if you had an atlas for a different cell type you should be able to specify a different filtering model to ProjecTILs::make.projection. But we are getting a bit off track from the original topic of the issue :)
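
The three filtering options mentioned above (default filter, a different scGate model, or no filter) could be wrapped as in the sketch below. `project_with_filter` is a hypothetical helper. The argument name `scGate_model` is an assumption about the `make.projection` signature and should be verified with `?make.projection` in the installed version; `filter.cells` is the argument used earlier in this thread.

```r
# Sketch: choose how query cells are pre-filtered before projection.
project_with_filter <- function(query, ref, model = NULL, filter = TRUE) {
  if (!filter) {
    # Skip scGate pre-filtering; only sensible for an already clean query
    return(ProjecTILs::make.projection(query, ref = ref, filter.cells = FALSE))
  }
  if (is.null(model)) {
    # No custom model supplied: rely on the package's default filter
    return(ProjecTILs::make.projection(query, ref = ref))
  }
  # ASSUMPTION: `scGate_model` is the argument that accepts a custom model
  ProjecTILs::make.projection(query, ref = ref, scGate_model = model)
}

# Usage (not run here):
# models.DB <- scGate::get_scGateDB()
# myeloid <- models.DB$human$generic$Myeloid
# proj <- project_with_filter(query, ref, model = myeloid)
```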

SeBaBInf commented 2 years ago

hi Massimo, for sure, no worries, let's stay on topic; I was mostly thinking out loud :). I appreciate you addressing that part, though.

I downloaded the new reference and it works, thank you for linking it here. However, I have noticed that if I use fast.mode=T, I get

[1] "Using assay MAGIC_RNA for query"
Pre-filtering cells with scGate...
Error: Cannot add a different number of cells than already present

removing that argument solved the issue.

Also, comparing the results with the previous run with the older reference using fast.mode = T, filter.cells = F leads to quite different results:

                  Eomes_HI INFI_stimulated  Tcm Tcmp Tfh_Effector Tfh_Memory Th1_Effector Th1_Memory Treg
  Eomes_HI               3               0    0    0            1          0            0          0    0
  INFI_stimulated        0              66    1    2            0          3            0          0    0
  Tcm                    1               1  123   15            3         63            0          0    0
  Tcmp                   0               0    0    5            0          0            0          0    0
  Tfh_Effector           1               4    7    6           98         41            0          1    0
  Tfh_Memory             0               1    3    3            2        215            0          1    0
  Th1_Effector           4               9   12   15           76         57          105         63    0
  Th1_Memory             5               3  119   93            9         28            4        200    0
  Treg                   3             157   37   52         1058        282          421         47  218

To summarize: in the end, would you stick with the reference you linked here, not use fast.mode, and keep filter.cells? Thank you!
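
To put a number on "quite different", the cross-tabulation above can be summarized by the share of cells that received the same label in both runs. The sketch below re-enters the counts from the table by hand in base R. The thread does not say which run is in rows and which in columns, so the summary is kept symmetric where possible.

```r
# Counts copied from the cross-tabulation above (rows and columns share labels)
states <- c("Eomes_HI", "INFI_stimulated", "Tcm", "Tcmp", "Tfh_Effector",
            "Tfh_Memory", "Th1_Effector", "Th1_Memory", "Treg")
conf <- matrix(c(
  3,   0,   0,  0,    1,   0,   0,   0,   0,
  0,  66,   1,  2,    0,   3,   0,   0,   0,
  1,   1, 123, 15,    3,  63,   0,   0,   0,
  0,   0,   0,  5,    0,   0,   0,   0,   0,
  1,   4,   7,  6,   98,  41,   0,   1,   0,
  0,   1,   3,  3,    2, 215,   0,   1,   0,
  4,   9,  12, 15,   76,  57, 105,  63,   0,
  5,   3, 119, 93,    9,  28,   4, 200,   0,
  3, 157,  37, 52, 1058, 282, 421,  47, 218
), nrow = 9, byrow = TRUE, dimnames = list(states, states))

# Fraction of cells given the same label by both runs (diagonal / total)
agreement <- sum(diag(conf)) / sum(conf)
# Share of all cells assigned to each state by the run shown in rows
row_share <- rowSums(conf) / sum(conf)

print(round(agreement, 3))
print(round(row_share["Treg"], 3))
```

On these counts, 1033 of 3747 cells (about 28%) keep the same label, and the run shown in rows assigns about 61% of all cells to Treg, which is where most of the disagreement sits.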