LieberInstitute / SPEAQeasy

SPEAQeasy: a portable LIBD RNA-seq pipeline using Nextflow. See http://research.libd.org/SPEAQeasy-example/ for an example of how to use this pipeline and analyze the resulting output files.
http://lieberinstitute.github.io/SPEAQeasy
MIT License

biomaRt failure #62

Closed: gpertea closed this issue 4 years ago

gpertea commented 4 years ago

Getting this error lately:

Error: the biomaRt query to obtain gene symbols failed. BiomaRt servers are likely down.

The servers are not down; they are most likely no longer accessible over plain HTTP (or the protocol has changed again). See https://support.bioconductor.org/p/134524/. The problem affects older R versions as well, including our conda_R/3.6.x.

Of course, passing --no_biomart circumvents the issue, but querying biomaRt is the default, and the error message is misleading.

I would suggest removing this dependency on an external service, since it seems to be needed only to retrieve gene IDs/symbols, and instead storing these gene data locally as part of the (cached) annotation data. Users who want additional gene IDs or symbols added as extra columns in the RSE rowData could optionally provide them as a CSV/TSV file mapping the gene IDs found in the annotation GTF to those additional identifiers.
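A minimal Python sketch of the optional user-supplied table idea, assuming a tab-separated file whose first column holds the gene IDs from the annotation GTF and whose remaining columns are the extra identifiers to attach. The function names, column names, and values are hypothetical and not taken from SPEAQeasy; genes missing from the table simply get empty values instead of triggering a failure.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

# Hypothetical helpers illustrating a locally stored gene table; not SPEAQeasy code.


def load_gene_table(handle: TextIO) -> Tuple[List[str], Dict[str, Dict[str, str]]]:
    """Read a TSV whose first column is a gene ID from the annotation GTF.

    Returns the names of the extra columns and a mapping gene ID -> {column: value}.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    id_col, *extra_cols = reader.fieldnames
    table = {}
    for row in reader:
        # Short rows yield None for missing fields; store those as "".
        table[row[id_col]] = {col: row[col] or "" for col in extra_cols}
    return extra_cols, table


def add_extra_columns(gene_ids: List[str], extra_cols: List[str],
                      table: Dict[str, Dict[str, str]]) -> List[Dict[str, str]]:
    """Build one rowData-like record per gene; genes absent from the table get ""."""
    records = []
    for gene_id in gene_ids:
        found = table.get(gene_id, {})
        record = {"gene_id": gene_id}
        record.update({col: found.get(col, "") for col in extra_cols})
        records.append(record)
    return records


if __name__ == "__main__":
    # Toy input: the IDs and values below are placeholders, not real annotation data.
    tsv = io.StringIO("gene_id\tEntrezID\tAltSymbol\nGENE_A.1\t12345\tSYMA\n")
    cols, table = load_gene_table(tsv)
    for rec in add_extra_columns(["GENE_A.1", "GENE_B.1"], cols, table):
        print(rec)
```

Because the lookup is a plain local join, it behaves the same offline and never depends on the state of a remote server.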

For GENCODE annotation, the gene_name attribute (present in the GENCODE v32 annotation GTF) already holds the gene symbol; I suspect that in most cases the gene symbol/name can and should be taken directly from the annotation file. The Entrez ID is indeed missing, though I am not sure whether users of the pipeline ever need it.
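Since the gene_name attribute already carries the symbol, a gene ID to symbol table can be built straight from the GTF with no network query. Below is a minimal Python sketch, assuming a standard nine-column, tab-separated GTF whose attributes use the `key "value";` layout seen in GENCODE files; the function name and the toy records are hypothetical, and this is not taken from SPEAQeasy.

```python
import re
from typing import Dict, Iterable

# Matches GTF attribute pairs such as: gene_name "SYMX";
# Unquoted values (e.g. GENCODE's `level 2;`) are simply not captured.
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def gene_symbols_from_gtf(lines: Iterable[str]) -> Dict[str, str]:
    """Map gene_id -> gene_name using only the 'gene' records of a GTF file."""
    symbols: Dict[str, str] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # column 3 is the feature type; keep one record per gene
        attrs = dict(ATTR_RE.findall(fields[8]))  # column 9 holds the attributes
        if "gene_id" in attrs:
            # Annotations without gene_name get an empty symbol.
            symbols[attrs["gene_id"]] = attrs.get("gene_name", "")
    return symbols


if __name__ == "__main__":
    # Toy GTF lines with placeholder IDs/symbols, in GENCODE's attribute layout.
    gtf = [
        "##description: toy example\n",
        'chr1\tTOY\tgene\t100\t900\t.\t+\t.\tgene_id "GENE_X.1"; gene_type "lncRNA"; gene_name "SYMX"; level 2;\n',
        'chr1\tTOY\ttranscript\t100\t900\t.\t+\t.\tgene_id "GENE_X.1"; transcript_id "TX_X.1"; gene_name "SYMX";\n',
    ]
    print(gene_symbols_from_gtf(gtf))
```

For a real (typically gzipped) annotation file, the function can be given an open text handle such as `gzip.open(path, "rt")`, since it only iterates over lines.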