I am using the following function to convert gene names into Ensembl IDs, but out of 28024 genes in total, 1659 gene names have multiple Ensembl IDs and 6731 have no matching Ensembl ID.
```python
from biothings_client import get_client


def convert_to_ensembl(gene_names):
    mg = get_client('gene')
    # returnall=True returns a dict with 'out' (hits), 'dup' and 'missing'
    response = mg.querymany(gene_names, scopes='symbol', fields='ensembl.gene',
                            species='human', returnall=True)
    missing = response.get('missing', [])
    duplicates = response.get('dup', [])
    success = {}
    for item in response['out']:
        query = item.get('query')
        ensembl_data = item.get('ensembl')
        if not query or not ensembl_data:
            continue  # 'notfound' hits carry no ensembl field
        if isinstance(ensembl_data, list):  # one gene can map to several Ensembl IDs
            ensembl_genes = [d['gene'] for d in ensembl_data if 'gene' in d]
        else:
            ensembl_genes = [ensembl_data['gene']] if 'gene' in ensembl_data else []
        # A duplicated query yields several hits; extend instead of overwriting
        success.setdefault(query, []).extend(ensembl_genes)
    if missing:
        print(f"Missing: {missing}")
    if duplicates:
        print(f"Duplicates: {duplicates}")
    return success


gene_names = ['TP53', 'BRCA1', 'C1orf112', 'FAM214B', 'RTEL1-TNFRSF6B']  # Add your gene names
result = convert_to_ensembl(gene_names)
print(f"Successful conversions: {result}")
```
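To see where counts like the ones above come from, the `{gene_name: [ensembl_ids]}` dictionary returned by `convert_to_ensembl` can be split into names with exactly one Ensembl ID, names with several, and names with none. This is a minimal sketch under that assumption; the helper name `summarize_mapping` and the toy IDs are hypothetical, not real query results.

```python
from typing import Dict, List, Tuple


def summarize_mapping(
    gene_names: List[str], mapping: Dict[str, List[str]]
) -> Tuple[Dict[str, str], Dict[str, List[str]], List[str]]:
    """Split gene names into unique, ambiguous and unmapped groups (hypothetical helper)."""
    unique: Dict[str, str] = {}
    ambiguous: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for name in gene_names:
        # Drop None entries and repeated IDs while keeping the original order
        ids = list(dict.fromkeys(i for i in mapping.get(name, []) if i))
        if len(ids) == 1:
            unique[name] = ids[0]
        elif len(ids) > 1:
            ambiguous[name] = ids
        else:
            unmapped.append(name)
    return unique, ambiguous, unmapped


if __name__ == "__main__":
    # Toy input with placeholder IDs, not real query results
    toy = {"GENE_A": ["ENSG_TOY_1"], "GENE_B": ["ENSG_TOY_2", "ENSG_TOY_3"]}
    u, a, m = summarize_mapping(["GENE_A", "GENE_B", "GENE_C"], toy)
    print(len(u), len(a), len(m))  # → 1 1 1
```

The ambiguous group is the one that needs a manual decision (for example, keeping only the ID that is present in the dataset being analysed); the unmapped group is often made of outdated symbols or readthrough names such as `RTEL1-TNFRSF6B`.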
Hi @hansen7, both the gene names and the Ensembl IDs should already be in the atlas object. Did you check adata.var?
Hi @LisaSikkema, thanks, you are right! The ensembl_id is in the index column.
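Since the atlas keeps the Ensembl IDs as the index of `adata.var` next to the gene names, the lookup can be done locally instead of through a web query. This is a minimal sketch using pandas, with a small DataFrame standing in for `adata.var`; the gene-name column is called `gene_symbols` here purely as an assumption (check the real name with `adata.var.columns`), and `names_to_ensembl` is a hypothetical helper.

```python
import pandas as pd


def names_to_ensembl(var: pd.DataFrame, name_col: str, names):
    """Map gene names to Ensembl IDs using an AnnData-style .var table.

    Assumes the Ensembl IDs are the DataFrame index and the gene names sit in
    `name_col`. Returns {name: [ids]}; names absent from the table map to [].
    """
    lookup = {}
    for ens_id, name in zip(var.index, var[name_col]):
        # Keep every ID for a name, so duplicated names stay visible
        lookup.setdefault(name, []).append(ens_id)
    return {n: list(lookup.get(n, [])) for n in names}


if __name__ == "__main__":
    # Toy stand-in for adata.var with placeholder IDs
    var = pd.DataFrame(
        {"gene_symbols": ["GENE_A", "GENE_B", "GENE_B"]},
        index=["ENSG_TOY_1", "ENSG_TOY_2", "ENSG_TOY_3"],
    )
    print(names_to_ensembl(var, "gene_symbols", ["GENE_A", "GENE_B", "GENE_C"]))
```

With the real object this would be called as `names_to_ensembl(adata.var, "<name column>", gene_names)`. Because the mapping comes from the same table as the expression data, every returned ID is guaranteed to exist in the dataset.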
Great, glad you found it!
Hi, thanks for the contribution!
Do you know what the ideal way would be to convert the gene names into Ensembl IDs?