Open HelloWorldLTY opened 7 months ago
Sorry for the inconvenience. Our method uses Ensembl IDs as the gene index. We provide an automatic method, based on mygene, to map gene names to Ensembl IDs here.
Hi, thanks. After converting the data with this method, I hit a new error in this call:
pipeline.fit(train_data, # An AnnData object
             pipeline_config, # The config dictionary we created previously, optional
             split_field = 'split', # Specify a column in .obs that contains split information
             train_split = 'train',
             valid_split = 'valid',
             batch_gene_list = batch_gene_list, # Specify genes that are measured in each batch, see previous section for more details
             device = DEVICE,
             )
43 g2id = dict(zip(self.gene_list, list(range(len(self.gene_list)))))
44 for batch in batch_gene_list:
---> 45 idx = torch.LongTensor([g2id[g] for g in batch_gene_list[batch]])
46 self.batch_gene_mask[batch] = torch.zeros(len(g2id)).bool()
47 self.batch_gene_mask[batch][idx] = True
KeyError: '0'
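A quick way to see which entries trigger this KeyError is to compare each batch's gene list against the gene index before calling fit. The snippet below is only a minimal sketch: find_unknown_genes is a hypothetical helper (not part of CellPLM), and it assumes, as the traceback suggests, that batch_gene_list is a dict mapping a batch name to a list of gene IDs.

```python
def find_unknown_genes(gene_list, batch_gene_list):
    """Return, per batch, the gene IDs that are absent from gene_list.

    Hypothetical helper for debugging; not part of the CellPLM API.
    """
    known = set(gene_list)
    unknown = {}
    for batch, genes in batch_gene_list.items():
        missing = [g for g in genes if g not in known]
        if missing:
            unknown[batch] = missing
    return unknown


# Toy data mirroring the IDs reported in this thread.
gene_list = ["ENSG00000137547", "ENSG00000120992"]
batch_gene_list = {"batch0": ["ENSG00000137547", "0", "0-1"]}

unknown = find_unknown_genes(gene_list, batch_gene_list)
print(unknown)
```

Any ID reported here would make the g2id[g] lookup on line 45 of the traceback fail in the same way.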
I think the reason is that, after converting the gene names, some strange gene IDs remain:
'ENSG00000137547',
'ENSG00000120992',
'ENSG00000187735',
'ENSG00000047249',
'ENSG00000023287',
'0',
'ENSG00000168300',
'0-1',
Generally, it is the same issue as here. Did you follow the tutorial? The tutorial should have automatically removed gene IDs that are not in the pretrained gene list.
Yes, I followed the tutorial but used my own dataset. The dataset I used is from Tangram: https://github.com/broadinstitute/Tangram/blob/master/tutorial_tangram_with_squidpy.ipynb
I will remove all the genes with IDs like '0' or '0-1' and then try again 🤔
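One possible way to do that removal is to keep only IDs that look like human Ensembl gene IDs ("ENSG" followed by 11 digits). This is a rough sketch, not the tutorial's own filtering: the pattern and the split_valid_ensembl helper are assumptions for illustration, and a mouse dataset would need a different prefix (ENSMUSG).

```python
import re

# Assumed pattern for human Ensembl gene IDs, e.g. ENSG00000137547.
ENSEMBL_HUMAN_GENE = re.compile(r"^ENSG\d{11}$")


def split_valid_ensembl(ids):
    """Split IDs into (valid, invalid) according to the pattern above.

    Hypothetical helper; not part of CellPLM or mygene.
    """
    valid = [g for g in ids if ENSEMBL_HUMAN_GENE.match(g)]
    invalid = [g for g in ids if not ENSEMBL_HUMAN_GENE.match(g)]
    return valid, invalid


ids = ["ENSG00000023287", "0", "ENSG00000168300", "0-1"]
valid, invalid = split_valid_ensembl(ids)
print("keep:", valid)
print("drop:", invalid)
```

The same filter would have to be applied both to the gene axis of the AnnData object (for example with a boolean mask over var_names) and to every list in batch_gene_list, so that the two stay consistent.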
Hello, I have updated the code so that it should now work more smoothly. If you previously installed CellPLM with pip, please run pip install -U cellplm to update it. Thanks!
Hi, I tried to impute my own spatial dataset (mouse) following the imputation tutorial. However, the imputation fails with an error:
I checked that my dataset uses gene names (the gene names are all upper-case because I tried to use orthologous genes).