Bio::DB::EntrezGene no longer returning gene names via get_Stream_by_id

cjfields commented 8 years ago

Author Name: Matthew LaFave (Matthew LaFave)
Original Redmine Issue: 3431, https://redmine.open-bio.org/issues/3431
Original Date: 2013-04-29
Original Assignee: Bioperl Guts

I’ve been using the BioPerl module Bio::DB::EntrezGene to retrieve gene names based on Entrez gene IDs. It has worked fine for several months (and at least as recently as April 5th), but it hasn’t worked for the last few days. From what I can tell, nothing has changed about the module, so my impression is that NCBI may have changed the formatting of their records, and the module may need to be updated.

For example, here’s the sample code from the documentation for the module; it should work every time:

#!/usr/bin/perl

use strict;
use warnings;
use Bio::DB::EntrezGene;

my $db = Bio::DB::EntrezGene->new;

my $seqio = $db->get_Stream_by_id([2, 4693, 3064]); # Gene ids
while ( my $seq = $seqio->next_seq ) {
    print "id is ", $seq->display_id, "\n";
}

exit;

…but recently, instead of returning a brief list of genes, it returns this:

Replacement list is longer than search list at /Library/Perl/5.12/Bio/Range.pm line 251.
UNIVERSAL->import is deprecated and will be removed in a future perl at /Library/Perl/5.12/Bio/Tree/TreeFunctionsI.pm line 94
Data Error: none conforming data found on line 1 in /var/folders/2f/55z0d46n3l10bq650j6svgw89rmqw1/T/mkguvw1MOO/VR86iPUDSJ!
first 20 (or till end of input) characters including the non-conforming data:
::= {
 {
 track-
 at /Library/Perl/5.12/Bio/SeqIO/entrezgene.pm line 171

An individual in the UK was able to reproduce the issue, so it’s unlikely that something specific to my setup is causing this. His assessment on Stack Overflow (http://stackoverflow.com/questions/16199037/bioperl-module-biodbentrezgene-no-longer-working) was the following:

“Looking further it looks like the problem is that the data starts with Entrezgene-Set ::= and includes three items. BioPerl is expecting only Entrezgene ::=, and will not cope with sets. I guess BioPerl won’t handle this aspect of Entrez Gene data. If you look at the Bio::ASN1::EntrezGene module, the next_seq() subroutine insists on Entrezgene ::= at the start of the data. BioPerl won’t handle Entrez Gene sets.”
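To make the distinction in that assessment concrete, here is a minimal sketch in plain Perl that only inspects the leading header of a text ASN.1 record and reports whether it declares a single Entrezgene or an Entrezgene-Set. The helper name asn1_top_level_type and the sample fragments are made up for illustration; this is not part of BioPerl or Bio::ASN1::EntrezGene, and it does not parse the record itself.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): classify a text ASN.1
# record by its leading "TypeName ::=" header only.
sub asn1_top_level_type {
    my ($text) = @_;
    return 'set'     if $text =~ /\A\s*Entrezgene-Set\s*::=/;
    return 'single'  if $text =~ /\A\s*Entrezgene\s*::=/;
    return 'unknown';
}

# Made-up fragments that mimic the two headers discussed above.
my @samples = (
    "Entrezgene ::= {\n  track-info {\n",
    "Entrezgene-Set ::= {\n  {\n    track-info {\n",
    "Seq-entry ::= set {\n",
);

for my $sample (@samples) {
    my ($first_line) = split /\n/, $sample;
    print asn1_top_level_type($sample), ": $first_line\n";
}
```

A check like this could be run on a saved copy of the downloaded data to confirm which header NCBI is currently sending before digging into the parser.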

I’ve contacted NCBI to see if anything had changed, but I haven’t heard back yet. If you need any additional information, please let me know. Thanks!

cjfields commented 7 years ago

Pinging @mlafave to see if this was his original bug report

cjfields commented 7 years ago

This may be a bug in Bio::ASN1::EntrezGene, but we can make and release an update as the repo is now on github as part of the bioperl github org.

bioperl/bioperl-live-redmine, issue #150