bioperl / bioperl-live-redmine

Legacy tickets migrated from the OBF Redmine issue tracker: http://redmine.open-bio.org

incomplete parse of entrezgene file #113

Open cjfields opened 8 years ago

cjfields commented 8 years ago

Author Name: Carnë Draug (Carnë Draug) Original Redmine Issue: 3261, https://redmine.open-bio.org/issues/3261 Original Date: 2011-07-05 Original Assignee: Bioperl Guts


Hi

Some data in entrezgene files is not parsed completely, and some of it is inaccessible. I have attached a file that triggers the problem.

I was looking for the IDs “NM_002105” and “NP_002096”, which show up several times in the file. However, when I used Data::Dumper to inspect the contents of the sequence object, I couldn’t see them.

use Data::Dumper; use Bio::SeqIO; my $file = $ARGV[0]; my $seqio_object = Bio::SeqIO->new(-file => $file, -format => ‘entrezgene’); my $seq_object = $seqio_object->next_seq; print Dumper($seq_object);

I can’t find 002105 or 002096 anywhere in the output (actually, the first one shows up, but only as part of a URL).

I first asked for help with this on the BioPerl mailing list and was told to report it as a bug.
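Scanning a large Data::Dumper dump by eye is easy to get wrong, so a programmatic search can help confirm whether an ID is really missing. The following is a minimal sketch, not part of BioPerl: `find_paths` is a hypothetical helper, and the `Toy::Seq` object is a made-up stand-in for the real sequence object. It walks nested hashes and arrays (including blessed ones) and reports where a string occurs.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype refaddr);

# Hypothetical helper (not a BioPerl function): return the path of every
# scalar leaf whose value contains $needle.
sub find_paths {
    my ( $data, $needle, $path, $seen ) = @_;
    $path = '$obj' unless defined $path;
    $seen ||= {};
    return () unless defined $data;

    my $type = reftype($data);    # also works for blessed objects
    if ( !defined $type ) {       # plain scalar leaf
        return index( $data, $needle ) >= 0 ? ($path) : ();
    }

    # Visit each reference only once; this also guards against cycles.
    return () if $seen->{ refaddr($data) }++;

    if ( $type eq 'HASH' ) {
        return map { find_paths( $data->{$_}, $needle, $path . "->{$_}", $seen ) }
            sort keys %$data;
    }
    if ( $type eq 'ARRAY' ) {
        return map { find_paths( $data->[$_], $needle, $path . "->[$_]", $seen ) }
            0 .. $#$data;
    }
    return ();    # code refs, scalar refs, globs etc. are skipped
}

# Made-up structure for illustration only; not the real parser output.
my $toy = bless {
    display_id => 'example',
    annotation => {
        dblink => [ { primary_id => 'NM_002105' }, { primary_id => 'other' } ],
    },
}, 'Toy::Seq';

print "$_\n" for find_paths( $toy, 'NM_002105' );
# prints: $obj->{annotation}->{dblink}->[0]->{primary_id}
```

With the real data, the call would presumably be `find_paths($seq_object, 'NP_002096')`; an empty list would mean the string does not appear in any scalar the helper can reach.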

cjfields commented 8 years ago

Original Redmine Comment Author Name: Carnë Draug Original Date: 2011-07-05T16:08:08Z


I didn’t use the code tags, so some parts of the code were interpreted as text formatting. Here’s the correct version:

use Data::Dumper;
use Bio::SeqIO;

my $file = $ARGV[0];
my $seqio_object = Bio::SeqIO->new(-file => $file, -format => 'entrezgene');
my $seq_object = $seqio_object->next_seq;
print Dumper($seq_object);

cjfields commented 8 years ago

Original Redmine Comment Author Name: Carnë Draug Original Date: 2011-07-05T22:27:36Z


This was discussed on the mailing list: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/24705. It was shown there that all of the data is accessible if the file is parsed with the Bio::ASN1::EntrezGene module directly, just not when going through Bio::SeqIO.

The following code shows that the entire file is being parsed correctly:

use warnings;
use strict;
use Bio::ASN1::EntrezGene;
use Data::Dumper;

my $parser = Bio::ASN1::EntrezGene->new('file' => "entrezgene.asn");
while(my $result = $parser->next_seq){
  print Dumper ($result);
}

I looked at the Bio::SeqIO module file and followed it to Bio::SeqIO::entrezgene, and it looks like it should be using the Bio::ASN1::EntrezGene module to parse the file anyway. Here’s the _initialize method that I found in the code:

sub _initialize {
    my ( $self, @args ) = @_;
    $self->SUPER::_initialize(@args);
    my %param = @args;
    @param{ map { lc $_ } keys %param } = values %param;    # lowercase keys
    $self->{_debug}          = $param{-debug}          || 'off';
    $self->{_locuslink}      = $param{-locuslink}      || 'no';
    $self->{_service_record} = $param{-service_record} || 'no';
    $self->{_parser} = Bio::ASN1::EntrezGene->new( file => $param{-file} );

    #Instantiate the low level parser here (it is -file in Bioperl
    #-should tell M.)
    #$self->{_parser}->next_seq; #First empty record- bug in Bio::ASN::Parser
}
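For anyone reading the `_initialize` method above, the hash-slice line that lowercases the keys may be the least obvious part. The sketch below isolates just that idiom with a toy argument list (the values are made up for illustration); it suggests that a lowercased copy of each key is added next to the original, so `-File` also becomes reachable as `-file`.

```perl
use strict;
use warnings;

# Toy argument list standing in for what a caller might pass to new().
my @args  = ( -File => 'entrezgene.asn', -Debug => 'on' );
my %param = @args;

# Same idiom as in _initialize: add a lowercased copy of every key.
# The original mixed-case keys are kept alongside the new ones.
@param{ map { lc $_ } keys %param } = values %param;

print join( ', ', sort keys %param ), "\n";    # -Debug, -File, -debug, -file
print "$param{-file}\n";                       # entrezgene.asn
```

One caveat with this idiom: if a caller passes both `-File` and `-file`, which value ends up under `-file` depends on hash ordering.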