Open cjfields opened 8 years ago
Original Redmine Comment Author Name: Carnë Draug Original Date: 2011-07-05T16:08:08Z
I didn’t use the code tags on the code, and some parts of it were interpreted as text formatting. Here’s the correct version:
use Data::Dumper;
use Bio::SeqIO;
my $file = $ARGV[0];
my $seqio_object = Bio::SeqIO->new(-file => $file, -format => 'entrezgene');
my $seq_object = $seqio_object->next_seq;
print Dumper($seq_object);
Original Redmine Comment Author Name: Carnë Draug Original Date: 2011-07-05T22:27:36Z
In the discussion on the mailing list http://thread.gmane.org/gmane.comp.lang.perl.bio.general/24705 it was shown that all of the data is accessible when the file is parsed with the Bio::ASN1::EntrezGene module directly, but not when it is parsed through the Bio::SeqIO module.
The following code shows that the entire file is being parsed correctly:
use warnings;
use strict;
use Bio::ASN1::EntrezGene;
use Data::Dumper;
my $parser = Bio::ASN1::EntrezGene->new('file' => "entrezgene.asn");
while (my $result = $parser->next_seq) {
    print Dumper($result);
}
I looked at the Bio::SeqIO module file and followed it to Bio::SeqIO::entrezgene, and it looks like it should be using the Bio::ASN1::EntrezGene module to parse the file anyway. Here’s the _initialize method that I found in the code:
sub _initialize {
    my ( $self, @args ) = @_;
    $self->SUPER::_initialize(@args);
    my %param = @args;
    @param{ map { lc $_ } keys %param } = values %param; # lowercase keys
    $self->{_debug} = $param{-debug} || 'off';
    $self->{_locuslink} = $param{-locuslink} || 'no';
    $self->{_service_record} = $param{-service_record} || 'no';
    $self->{_parser} = Bio::ASN1::EntrezGene->new( file => $param{-file} );
    #Instantiate the low level parser here (it is -file in Bioperl
    #-should tell M.)
    #$self->{_parser}->next_seq; #First empty record- bug in Bio::ASN::Parser
}
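As a side note on the "lowercase keys" line above, here is a minimal sketch of what that hash-slice idiom does, using only core Perl. The helper name normalize_args and the argument values are hypothetical and are not part of BioPerl; the point is only that a lowercased copy of every key is added, so -File and -file both end up pointing at the same value before the parser is constructed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): mimics the key handling
# seen in _initialize. It adds a lowercased copy of every key and keeps
# the original keys in place.
sub normalize_args {
    my %param = @_;
    # keys and values return entries in the same order as long as the
    # hash is not modified in between, so the slice lines up pairwise.
    @param{ map { lc $_ } keys %param } = values %param;
    return %param;
}

# Illustrative arguments only; the file name is the one used earlier
# in this thread.
my %p = normalize_args( -File => 'entrezgene.asn', -DEBUG => 'on' );

# Same defaulting pattern as in _initialize.
my $debug     = $p{-debug}     || 'off';
my $locuslink = $p{-locuslink} || 'no';

print "file=$p{-file} debug=$debug locuslink=$locuslink\n";
```

If this reading is right, the argument handling itself does not drop anything, which suggests the missing IDs are lost later, when the low-level parser output is converted into the sequence object.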
Author Name: Carnë Draug (Carnë Draug) Original Redmine Issue: 3261, https://redmine.open-bio.org/issues/3261 Original Date: 2011-07-05 Original Assignee: Bioperl Guts
Hi
Some data in entrezgene files is not parsed completely, and some of it is inaccessible. I attached a file that triggers the problem.
I was looking for the IDs “NM_002105” and “NP_002096”, which show up several times in the file. However, when I used Data::Dumper to inspect the contents of the sequence object, I couldn’t see them.
use Data::Dumper;
use Bio::SeqIO;
my $file = $ARGV[0];
my $seqio_object = Bio::SeqIO->new(-file => $file, -format => 'entrezgene');
my $seq_object = $seqio_object->next_seq;
print Dumper($seq_object);
I can’t find 002105 or 002096 anywhere in the output (the first appears only as part of a URL).
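To make that check repeatable rather than reading the dump by eye, something like the following sketch could be used. It assumes nothing about BioPerl internals: count_ids_in_dump is a hypothetical helper that serializes whatever it is given with Data::Dumper and counts literal occurrences of each ID. The $fake structure and the example.org URL are stand-ins for illustration only; in practice the object returned by next_seq would be passed instead.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not part of BioPerl): dump any data structure
# to text and count how many times each ID string occurs in it.
sub count_ids_in_dump {
    my ( $data, @ids ) = @_;
    my $dump = Dumper($data);
    my %count;
    for my $id (@ids) {
        # \Q...\E matches the ID literally; the "= () =" idiom forces
        # list context and yields the number of matches.
        $count{$id} = () = $dump =~ /\Q$id\E/g;
    }
    return \%count;
}

# Stand-in data for illustration only. Replace with the object from
# $seqio_object->next_seq (Bio::SeqIO) or $parser->next_seq
# (Bio::ASN1::EntrezGene) to compare the two parsers.
my $fake = {
    url  => 'http://example.org/view?val=NM_002105',
    tags => [ 'NP_002096', 'NP_002096' ],
};

my $counts = count_ids_in_dump( $fake, 'NM_002105', 'NP_002096', 'XX_000000' );
printf "%s: %d\n", $_, $counts->{$_} for sort keys %$counts;
```

Running the same helper on the results of both parsers would show directly which IDs survive in each representation.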
I first asked for help with this on the bioperl mailing list and was told to report it as a bug.