aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Issue with accessing Ensembl database via Pybiomart in tutorial #345

Closed HarryShen668 closed 2 months ago

HarryShen668 commented 2 months ago

I am currently working through the SCENIC+ tutorial and ran into a problem accessing the Ensembl database with pybiomart. During the quality control step of scATAC-seq preprocessing, I tried to retrieve human gene annotation data from the Ensembl database with the following code:

import pybiomart as pbm
dataset = pbm.Dataset(name='hsapiens_gene_ensembl', host='http://www.ensembl.org')
annot = dataset.query(attributes=['chromosome_name', 'transcription_start_site', 'strand', 'external_gene_name', 'transcript_biotype'])

The query fails with the following error:

Traceback (most recent call last):
  File ..., line 3
    annot = dataset.query(attributes=['chromosome_name', 'transcription_start_site', 'strand', 'external_gene_name', 'transcript_biotype'])
  ...
ParseError: mismatched tag: line 62, column 2

I suspect this is caused by the current status of the Ensembl BioMart service. Are there any known workarounds or alternative methods I could use temporarily until the service is restored?

I understand that this issue might be outside of your direct control, but any guidance or assistance you could provide to help resolve this issue would be greatly appreciated.
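
On the question of temporary workarounds: one option that may help, assuming the failure is specific to the main www.ensembl.org host, is to run the same query against an Ensembl mirror or archive host and use the first one that responds. The sketch below is not part of the tutorial. The host list and the helper names (`CANDIDATE_HOSTS`, `query_biomart`, `first_working_host`) are assumptions made for illustration, and each host should be checked for reachability before relying on it.

```python
# Attributes requested in the tutorial query above.
ATTRIBUTES = ['chromosome_name', 'transcription_start_site', 'strand',
              'external_gene_name', 'transcript_biotype']

# Assumed candidate hosts (main site, mirrors, one archive release).
# These are illustrative; verify that each one is currently available.
CANDIDATE_HOSTS = [
    'http://www.ensembl.org',
    'http://useast.ensembl.org',
    'http://asia.ensembl.org',
    'http://nov2020.archive.ensembl.org',
]


def query_biomart(host):
    """Run the tutorial's annotation query against a single BioMart host."""
    import pybiomart as pbm  # imported lazily so the helper below has no hard dependency

    dataset = pbm.Dataset(name='hsapiens_gene_ensembl', host=host)
    return dataset.query(attributes=ATTRIBUTES)


def first_working_host(hosts, query=query_biomart):
    """Try each host in order and return (host, result) for the first success.

    Raises RuntimeError listing every failure if no host answers.
    """
    errors = {}
    for host in hosts:
        try:
            return host, query(host)
        except Exception as exc:  # e.g. ParseError when the server returns an error page
            errors[host] = exc
    details = '; '.join(f'{h}: {e!r}' for h, e in errors.items())
    raise RuntimeError(f'No BioMart host responded successfully ({details})')


# Hypothetical usage (requires network access and pybiomart):
# host, annot = first_working_host(CANDIDATE_HOSTS)
```

Archive hosts serve older Ensembl releases, so the returned annotation may differ from the current release. It is worth picking an archive that matches the genome build used for the rest of the analysis.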