Closed by gouinK 4 years ago
@gouinK your code as provided is incorrect: the variable ds is not defined. To find the valid datasets, use
bm.get_datasets(MART)
where MART is a valid mart name, e.g. ENSEMBL_MART_ENSEMBL. I hope this makes sense.
I've been using this service today from bioservices 1.7.8 and it worked perfectly well. The logic behind it is not always straightforward, but you know it from your previous experience. In case it helps, this page from the documentation may also be useful: https://bioservices.readthedocs.io/en/master/biomart.html
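As a minimal sketch of the dataset check suggested above: the helper below compares a dataset name against a list of known names and proposes close matches for a misspelled one. The function name check_dataset and the short list of names are hypothetical and for illustration only. In practice the names would come from bm.get_datasets(MART), whose exact return type is not shown in this thread, so the helper takes a plain list of strings.

```python
import difflib


def check_dataset(name, available):
    """Return (is_valid, suggestions) for a dataset name.

    `available` is a plain list of dataset names (an assumption: how the
    list is obtained from the service is not shown here).
    """
    if name in available:
        return True, []
    # Suggest up to three similarly spelled names for a likely typo.
    return False, difflib.get_close_matches(name, available, n=3)


# Illustrative names only; a real list would come from the service.
available = ["hsapiens_gene_ensembl", "mmusculus_gene_ensembl"]

ok, suggestions = check_dataset("hsapiens_gene_ensembl", available)
print(ok, suggestions)

ok_typo, suggestions_typo = check_dataset("hsapiens_gene_ensemble", available)
print(ok_typo, suggestions_typo)
```

Running a check like this before building the query separates a misspelled dataset name from other causes of the "provided dataset is not found" error.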
Hi there, the first line of my code snippet above is this: ds = 'hsapiens_gene_ensembl'
Is this incorrect?
@gouinK
Your code is correct, in particular the dataset. However, there is an issue with the filter itself.
Unfortunately, as far as I know, there is no way for bioservices to check the expected type of the filter.
Here, you set
queries = ['PDCD1']
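Instead of using add_filter_to_xml, which is a feature of BioMart, I would recommend doing the filtering a posteriori, e.g. with Pandas. That said, the query below does use the filter, passing its value as a plain string rather than a list: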
from bioservices import BioMart

b = BioMart()
ds = 'hsapiens_gene_ensembl'

# Build the XML query: one dataset, two attributes, one filter.
b.new_query()
b.add_dataset_to_xml(ds)
b.add_attribute_to_xml("hgnc_symbol", dataset=ds)
b.add_attribute_to_xml("ensembl_gene_id", dataset=ds)
# The filter value is a plain string, not a list.
b.add_filter_to_xml('hgnc_symbol', 'PDCD1', dataset=ds)

# Send the query; the result comes back as tab-separated text.
xml = b.get_xml()
res = b.query(xml)
Then, to check the content in a nice Pandas DataFrame:
import pandas as pd
import io
df = pd.read_csv(io.StringIO(res), sep='\t', header=None)
df.columns = ['hgnc', 'ensembl_id']
and you get:
PDCD1 ENSG00000188389
PDCD1 ENSG00000276977
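If Pandas is not available, the same tab-separated text can be parsed with the standard library. This is a minimal sketch, not part of bioservices: the sample string res is copied from the two rows above, and parse_rows and the column names are hypothetical choices for illustration. It also shows filtering rows after the download, i.e. a posteriori.

```python
import csv
import io

# Sample response text, copied from the two rows shown above (tab-separated).
res = "PDCD1\tENSG00000188389\nPDCD1\tENSG00000276977\n"


def parse_rows(text, columns=("hgnc", "ensembl_id")):
    """Parse tab-separated text into a list of dicts keyed by `columns`."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # Skip empty lines, which csv.reader yields as empty lists.
    return [dict(zip(columns, row)) for row in reader if row]


rows = parse_rows(res)

# Filter a posteriori: keep only the rows whose symbol is in `wanted`.
wanted = {"PDCD1"}
hits = [row for row in rows if row["hgnc"] in wanted]

for row in hits:
    print(row["hgnc"], row["ensembl_id"])
```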
If you want to add another gene, use commas:
b.add_filter_to_xml('hgnc_symbol', 'PDCD1,MT-TF', dataset=ds)
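Since the original snippet passed a Python list, one way to avoid the mismatch is to join the values into a comma-separated string before adding the filter. The helper name to_filter_value is hypothetical and not part of bioservices. This is only a sketch of the string format described above.

```python
def to_filter_value(values):
    """Return a comma-separated string usable as a single filter value.

    Accepts either one string or an iterable of strings.
    """
    if isinstance(values, str):
        return values
    return ",".join(str(v) for v in values)


queries = ["PDCD1", "MT-TF"]
filter_value = to_filter_value(queries)
print(filter_value)  # PDCD1,MT-TF

# Hypothetical usage with the query object from this thread:
# b.add_filter_to_xml("hgnc_symbol", filter_value, dataset=ds)
```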
Hi all, does anyone have an idea as to why the following simple query results in a "provided dataset is not found" error? I have used biomart in R for a long time, but am now moving all of my scripts to python and would really like this aspect to work. Thanks!
ds = 'hsapiens_gene_ensembl'
bm = Biomart(host="www.ensembl.org",verbose=True)
bm.new_query()
bm.add_dataset_to_xml(ds)
bm.add_attribute_to_xml('hgnc_symbol',dataset=ds)
queries = ['PDCD1']
bm.add_filter_to_xml('hgnc_symbol',queries,dataset=ds)
xml = bm.get_xml()
res = bm.query(xml)
version: bioservices bioconda/noarch::bioservices-1.7.8-pyh864c0ab_0