BioX-NKU / scButterfly

A versatile single-cell cross-modality translation method via dual-aligned variational autoencoders
MIT License

BMMC data for RNA and ATAC translation #4

Open zhangxueting233 opened 2 weeks ago

zhangxueting233 commented 2 weeks ago

Thank you very much for sharing. While reproducing your results, I found that some of the data (for example, the BMMC data) was not sorted. How do you sort these data? Could you provide a tutorial on sorting peaks?

caosip commented 2 weeks ago

Thanks for your suggestion! We have already provided the code for sorting peaks in BioX-NKU/scButterfly_source/experiment/RNA_ATAC/cross_validation_by_cell/run_model.py.
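The actual sorting step is in the run_model.py script mentioned above. As a rough, independent illustration (not the scButterfly code), the sketch below orders peaks by chromosome, then by start and end coordinate. It assumes peak names of the hypothetical form chr1:100-200 or chr1-100-200; the helper names peak_sort_order and chrom_rank are illustrative. Chromosomes are ranked naturally (chr1, chr2, ..., chr10, ..., chrX, chrY), because a plain string sort would put chr10 before chr2.

```python
import re

# Assumed peak-name format: "chr1:100-200" or "chr1-100-200" (hypothetical;
# adjust the regex to match your own var_names).
PEAK_RE = re.compile(r"^(chr[^:\-_]+)[:\-_](\d+)[\-_](\d+)$")


def chrom_rank(chrom: str) -> tuple:
    """Rank chr1..chr22 numerically, then chrX, chrY, chrM, then anything else."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    if name in special:
        return (1, special[name], "")
    return (2, 0, name)


def peak_sort_order(peak_names):
    """Return the indices that sort peaks by (chromosome, start, end)."""
    keys = []
    for i, peak in enumerate(peak_names):
        m = PEAK_RE.match(peak)
        if m is None:
            raise ValueError(f"unrecognised peak name: {peak!r}")
        chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
        keys.append((chrom_rank(chrom), start, end, i))
    # The original index is the last key element, so ties keep input order.
    return [key[-1] for key in sorted(keys)]


if __name__ == "__main__":
    peaks = ["chr2:50-100", "chr10:5-20", "chr1:300-400", "chrX:1-10", "chr1:100-200"]
    order = peak_sort_order(peaks)
    print([peaks[i] for i in order])
    # ['chr1:100-200', 'chr1:300-400', 'chr2:50-100', 'chr10:5-20', 'chrX:1-10']
```

With an AnnData object holding the ATAC data, the resulting order could be applied as adata_atac = adata_atac[:, order].copy() (adata_atac is a placeholder name).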

zhangxueting233 commented 2 weeks ago

Thank you very much for your reply. While reproducing your tutorial with scButterfly-C, I encountered this warning: UserWarning: Make sure the registered protein expression in anndata contains unnormalized count data. I have tried several datasets and the issue persists. Have you ever encountered this problem? I look forward to your reply. Thank you very much.

caosip commented 2 weeks ago

Sorry, I have no idea about this warning. Could you provide more information about it, such as a screenshot? Does scButterfly-B/T produce a similar warning? It may be related to the use of MultiVI in scButterfly-C.

zhangxueting233 commented 2 weeks ago

I found that it is a MultiVI issue; the other two models do not have this problem.

zhangxueting233 commented 2 weeks ago

I am not sure whether the data you are using are normalized. If X holds normalized data, I suggest adding these lines so that the model runs successfully:

adata_gene.X = adata_gene.layers['counts']
adata_protein.X = adata_protein.layers['counts']
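The suggested fix overwrites the normalized X. A variant that keeps it is sketched below; this is an illustration rather than part of scButterfly or scvi-tools. The helper name use_counts_as_x and the "normalized" layer name are hypothetical, and the sketch assumes, as in the datasets discussed here, that raw counts live in layers['counts'].

```python
def use_counts_as_x(adata, counts_layer="counts", keep_layer="normalized"):
    """Put raw counts into adata.X, keeping the previous X in another layer.

    Works on an AnnData object or on any object with an `.X` attribute and a
    dict-like `.layers` mapping (hypothetical helper, not a library API).
    """
    if counts_layer not in adata.layers:
        raise KeyError(f"layer {counts_layer!r} not found; raw counts are required")
    if keep_layer is not None:
        # Preserve the (presumably normalized) matrix before overwriting X.
        adata.layers[keep_layer] = adata.X.copy()
    # Copy so that X and the counts layer do not share the same object.
    adata.X = adata.layers[counts_layer].copy()
    return adata
```

For the objects in this thread, the calls would be use_counts_as_x(adata_gene) and use_counts_as_x(adata_protein) before setting up the model.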

caosip commented 2 weeks ago

Thanks for your suggestion. All the data we used for MultiVI are raw counts, with the same gene/peak filtering as in scButterfly. However, MultiVI seems to check the data distribution rather strictly, so feel free to ignore this warning; MultiVI will still give a promising result.
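To see whether a matrix would pass this kind of check before handing it to MultiVI, a quick heuristic is to test whether every stored value is a non-negative integer. This appears to be what the scvi-tools warning is about, but the function below is a hypothetical sketch and not scvi-tools' own implementation.

```python
import numpy as np


def looks_like_raw_counts(x) -> bool:
    """Heuristic: True if every stored value is a non-negative integer.

    Accepts dense arrays and SciPy sparse matrices; for sparse input only the
    explicitly stored values (`.data`) are inspected, since implicit zeros
    are valid counts anyway.
    """
    values = np.asarray(x.data if hasattr(x, "tocsr") else x)
    if values.size == 0:
        return True
    if np.any(values < 0):
        return False
    # NaN fails this comparison, so NaN-containing data is rejected too.
    return bool(np.all(np.mod(values, 1) == 0))


if __name__ == "__main__":
    counts = np.array([[0, 3], [1, 0]])
    # A typical library-size normalization followed by log1p.
    normalized = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)
    print(looks_like_raw_counts(counts))      # True
    print(looks_like_raw_counts(normalized))  # False
```

Running it on adata.X and on adata.layers['counts'] shows which of the two should be registered with the model.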

zhangxueting233 commented 2 weeks ago

Yes, I found that X in these datasets is indeed normalized, so the model cannot run on it directly. It works with the raw data, which are stored in layers['counts']. I hope this suggestion makes reproduction more user-friendly.

caosip commented 2 weeks ago

Thank you very much! I will add this tip to the README soon.