EperLuo / scDiffusion

A model developed for the generation of scRNA-seq data
MIT License

Muris datasets discrepancy? #4

Open kiristern opened 2 months ago

kiristern commented 2 months ago

Hello, I'm probably missing something, but when I re-run your code as is, using the provided h5ad datasets and pre-trained weights, the muris AnnData object seems to be missing the cell type information (it only contains an organ column, mislabelled as celltype). I suppose the cell type annotations were generated following the SCimilarity tutorial? I was wondering if there is a script with the seed/params used to obtain the exact same (OOD) muris AnnData object, with the additional cell type annotations. Thanks!

EperLuo commented 2 months ago

Hi! Sorry for the discrepancy. I just found out that I didn't upload the dataset used in the OOD experiment. The muris dataset on figshare is the one used for unconditional/conditional generation. I have updated the figshare collection; muris_mam_spl_T_B.h5ad should contain both the organ and cell type information.

kiristern commented 2 months ago

Hi again, sorry for re-opening the issue. I just had the chance to look at the newly uploaded muris_mam_spl_T_B.h5ad file. I was wondering whether you happen to have saved a muris.h5ad with all organs and all cell types (not just the 2 organs and 2 cell types)? If not, would it be possible to share the script you used to obtain the cell type annotations? Thanks again!

EperLuo commented 2 months ago

OK, I see what you mean. I didn't save the data with all organs and all cell types, but I can share the processing script with you.

import os

import anndata as ad
import scanpy as sc

droplet_dir = '/data1/lep/Workspace/guided-diffusion/data/tabula_muris/droplet'
adata_list = []
celltype_muris = []
for file in os.listdir(droplet_dir):
    # Trachea is skipped here and loaded from a separate h5ad file below
    if file.startswith('Trachea'):
        continue
    adata_tmp = sc.read_10x_mtx(
        os.path.join(droplet_dir, file),  # the directory with the `.mtx` file
        var_names='gene_symbols',  # use gene symbols for the variable names (variables-axis index)
        cache=True,
    )
    # for this Lung sample, drop cells with fewer than 200 detected genes
    if file.startswith('Lung-10X_P8'):
        sc.pp.filter_cells(adata_tmp, min_genes=200)
    adata_list.append(adata_tmp)
    # the label is the folder-name prefix before the first '-', i.e. the organ
    celltype_muris += [file.split('-')[0]] * adata_tmp.n_obs
adata_tra = sc.read_h5ad('../data/tabula_muris/trachea.h5ad')
adata_list.append(adata_tra)
celltype_muris += ['Trachea'] * adata_tra.n_obs
adata = ad.concat(adata_list)
adata.obs['celltype'] = celltype_muris

I remember there were some problems with the original Trachea and Lung data. The trachea.h5ad here was obtained earlier with another, similar script (sorry, I've lost that script, but it was basically the same as what I did to the Lung data above).
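
For readers following along, the labelling step in the script above can be illustrated without any data files. This is only a sketch: the folder names and cell counts below are made up for illustration (assuming droplet folders are named like `<Organ>-10X_<sample>`, as the `Lung-10X_P8` prefix suggests), and `organ_labels` is a hypothetical helper, not part of scDiffusion. It shows that the value stored in `adata.obs['celltype']` is the organ prefix of each folder name, repeated once per cell.

```python
def organ_labels(folder_cell_counts):
    """Build one label per cell from (folder_name, n_cells) pairs.

    Mirrors `file.split('-')[0]` in the script: the label is whatever
    precedes the first '-' in the folder name, i.e. the organ.
    """
    labels = []
    for folder, n_cells in folder_cell_counts:
        organ = folder.split('-')[0]
        labels += [organ] * n_cells
    return labels


# Hypothetical folder names and cell counts, for illustration only.
example = [("Lung-10X_P8_12", 3), ("Spleen-10X_P4_7", 2)]
labels = organ_labels(example)
print(labels)
```

Since the label produced this way is the organ, per-cell cell type annotations would presumably still have to come from a separate source and be added as another `obs` column.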

Sbs12 commented 3 days ago

Hello, can your unconditional generation reproduce the results described in the article?