gao-lab / GLUE

Graph-linked unified embedding for single-cell multi-omics data integration
MIT License

Query mapping #73

Open ccruizm opened 1 year ago

ccruizm commented 1 year ago

Hello!

I would like to know whether GLUE has an implementation to perform query mapping of unlabeled data. I want to create a reference map with GLUE and be able later on to either map new data onto it or re-integrate the reference with the new query.

Thanks in advance!

Jeff1995 commented 1 year ago

Hi @ccruizm. Thanks for your interest in GLUE!

As GLUE is a multi-omics integration method, I suppose you would be mapping a query dataset in one modality onto a reference dataset in a different modality? It's indeed an interesting use case, but unfortunately there is no such implementation in GLUE right now. You would have to integrate all datasets in one step, with no distinction between reference and query.

The most obvious solution to this use case would be to fix the pretrained autoencoder of the reference modality and train only the autoencoder of the query modality. That shouldn't be too difficult to implement though. We'll see if we can add this feature in the future. I'll let you know if that becomes available. Of course pull requests are also welcome : )

ccruizm commented 1 year ago

Hello @Jeff1995, that would make GLUE even more powerful than it already is. I was thinking not necessarily of mapping other modalities, but of reference mapping within the same modality and performing label transfer (RNA -> RNA or ATAC -> ATAC). Since I will build a multimodal reference (RNA+ATAC), I wanted to know whether new data can be mapped onto it as it is generated, instead of re-training the whole reference from scratch with the new data.

Thanks for developing this great tool!

Jeff1995 commented 1 year ago

Thanks for the clarification! That should be easier to do. We will be testing both kinds of mapping then : )

kanyulongkkk commented 4 months ago

Dear Dr. Cao, when I want to perform cell type label transfer from RNA to ATAC, how should I configure the datasets in your code? For example:

scglue.models.configure_dataset(rna, "NB", use_highly_variable=True, use_layer="counts", use_rep="X_pca")

scglue.models.configure_dataset(atac, "NB", use_highly_variable=True, use_rep="X_lsi")

kanyulongkkk commented 4 months ago

@Jeff1995

Jeff1995 commented 4 months ago

Hi @kanyulongkkk, thanks for your interest in GLUE! The current dataset configuration should work fine. You would need to use the transfer_labels function to perform the cell type transfer after model training.
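
For readers following along, a minimal sketch of that step might look like the following. It assumes the `scglue.data.transfer_labels(ref, query, field, use_rep=...)` signature from the GLUE documentation, and that the trained model's embeddings have already been stored in `.obsm["X_glue"]` of both objects. The helper name `transfer_cell_types` and the `"cell_type"` column name are hypothetical.

```python
def transfer_cell_types(rna, atac, label_key="cell_type", embedding_key="X_glue"):
    """Copy labels from RNA cells to ATAC cells via the shared GLUE embedding.

    Assumes both AnnData objects already hold embeddings from a trained model
    in .obsm[embedding_key]; both key names are illustrative defaults.
    """
    import scglue  # imported lazily so the helper can be defined without GLUE installed

    scglue.data.transfer_labels(
        rna, atac, label_key,      # reference, query, field in rna.obs
        use_rep=embedding_key,     # neighbours are searched in this embedding
    )
    # Predicted labels are expected in atac.obs[label_key] afterwards.
    return atac.obs[label_key]
```

Calling `transfer_cell_types(rna, atac)` after training would then return the predicted labels for the ATAC cells, provided the assumptions above hold for your GLUE version.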

kanyulongkkk commented 4 months ago

Thanks, Dr. Cao, but after running the transfer_labels function I get labels like 0.315746. My reference labels are integers, so I need integer labels for the query as well. What should I do next? Please help.

kanyulongkkk commented 4 months ago

@Jeff1995

Jeff1995 commented 4 months ago

Oh I see. Is your integer label something like a cluster index? In that case you could try converting it to string or category type first, and the predicted labels should remain "integers".
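
A small sketch of that conversion, using a stand-in pandas DataFrame in place of the reference `.obs` table (the `cluster` column name and the toy values are made up for illustration):

```python
import pandas as pd

# Hypothetical reference annotation: integer cluster indices, one per cell.
ref_obs = pd.DataFrame(
    {"cluster": [0, 1, 2, 1, 0]},
    index=[f"cell{i}" for i in range(5)],
)

# Cast to string first, then to categorical, so the labels are treated
# as discrete classes rather than numbers.
ref_obs["cluster"] = ref_obs["cluster"].astype(str).astype("category")

print(ref_obs["cluster"].dtype)                  # category
print(list(ref_obs["cluster"].cat.categories))   # ['0', '1', '2']
```

With an AnnData reference, the same cast would apply to the label column in `.obs` before calling transfer_labels.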

kanyulongkkk commented 4 months ago

Yes, Dr. Cao, my labels are cluster indices such as 0, 1, 2, 3, 4. I will convert them to string or category type and try again.