bioperl / bioperl-live-redmine

Legacy tickets migrated from the OBF Redmine issue tracker: http://redmine.open-bio.org

bp_load_ontology ISBN title parsing error in OBO format #52

Open cjfields opened 9 years ago

cjfields commented 9 years ago

Author Name: Leighton Pritchard (Leighton Pritchard)
Original Redmine Issue: 2730, https://redmine.open-bio.org/issues/2730
Original Date: 2009-01-12
Original Assignee: Bioperl Guts


When attempting to load the OBO-formatted Gene Ontology into a BioSQL database with:

load_ontology.pl --host localhost --dbname --dbpass --dbuser --namespace "Gene Ontology" --format obo --lookup --noobsolete gene_ontology_edit.obo.txt --computetc

using the SVN scripts for bioperl-db, an error is thrown:

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were ("Biosynthesis and Function"","","0","") FKs ()
Column 'accession' cannot be null
---------------------------------------------------
Could not store term GO:0002129, name 'wobble position guanine ribose methylation':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::BioSQL::TermAdaptor::store_children /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/TermAdaptor.pm:293
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term ./load_ontology.pl:812
STACK: ./load_ontology.pl:617
-----------------------------------------------------------

at ./load_ontology.pl line 824
    main::persist_term('-term', 'Bio::Ontology::OBOterm=HASH(0x1d4b100)', '-db', 'Bio::DB::BioSQL::DBAdaptor=HASH(0xcd8270)', '-termfactory', 'undef', '-throw', 'CODE (0x5419e0)', '-mergeobs', ...) called at ./load_ontology.pl line 617

At first glance, this seemed to be the result of the script incorrectly parsing the GO term, which looks like this:

[Term]
id: GO:0002129
name: wobble position guanine ribose methylation
namespace: biological_process
def: "The process whereby the ribose of guanosine at position 34 in the anticodon of a tRNA is post-transcriptionally methylated at the 2'-O position." [GOC:hjd, ISBN:155581073X "tRNA: Structure, Biosynthesis and Function"]
is_a: GO:0002130 ! wobble position ribose methylation

The DBXREFs, which are comma-separated, include an ISBN reference with accompanying book title; the book title contains a comma. It looked likely that the ontology loader was misparsing this comma as a separator indicating the start of a new DBXREF.
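To illustrate the suspected failure mode, here is a minimal Python sketch (not the BioPerl parser's actual code) that contrasts a naive comma split with a split that ignores commas inside double-quoted descriptions. The function names `split_naive` and `split_quote_aware` are hypothetical; the input is the DBXREF list from the term above.

```python
def split_naive(xrefs):
    """Split a dbxref list on every comma (the suspected buggy behaviour)."""
    return [part.strip() for part in xrefs.split(",")]


def split_quote_aware(xrefs):
    """Split on commas, but only those outside double-quoted descriptions."""
    parts, buf = [], []
    in_quotes = False
    escaped = False
    for ch in xrefs:
        if escaped:
            # Character following a backslash is taken literally.
            buf.append(ch)
            escaped = False
        elif ch == "\\":
            buf.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            buf.append(ch)
        elif ch == "," and not in_quotes:
            # A real separator: close the current dbxref.
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    tail = "".join(buf).strip()
    if tail:
        parts.append(tail)
    return parts


# The dbxref list from GO:0002129, without the enclosing brackets.
XREFS = 'GOC:hjd, ISBN:155581073X "tRNA: Structure, Biosynthesis and Function"'

naive = split_naive(XREFS)
aware = split_quote_aware(XREFS)

print(naive)  # three fragments; the last has no database or accession
print(aware)  # two dbxrefs, the ISBN entry kept whole
```

With the naive split, the last fragment is `Biosynthesis and Function"`, which has no `DB:accession` part. That is consistent with the value shown in the warning and with the null `accession` column, although the sketch does not prove where in the loader the split happens.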

Removing the comma from this term's book title and re-running the script throws the same error for a different term:

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were ("Biosynthesis and Function"","","0","") FKs ()
Column 'accession' cannot be null
---------------------------------------------------
Could not store term GO:0002132, name 'wobble position uridine ribose methylation':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::BioSQL::TermAdaptor::store_children /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/TermAdaptor.pm:293
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/local/bin/bp_load_ontology.pl:805
STACK: /usr/local/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

at /usr/local/bin/bp_load_ontology.pl line 817
    main::persist_term('-term', 'Bio::Ontology::OBOterm=HASH(0x1d4c070)', '-db', 'Bio::DB::BioSQL::DBAdaptor=HASH(0xcd58e0)', '-termfactory', 'undef', '-throw', 'CODE (0x541be0)', '-mergeobs', ...) called at /usr/local/bin/bp_load_ontology.pl line 610

The term named in this second error, GO:0002132, also has a similar comma in its ISBN title. Removing all these title commas allows the script to progress beyond this point more or less as normal.
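The manual workaround of removing title commas could be automated with something like the following Python sketch. It is an assumption-laden illustration, not part of bioperl-db: the helper name `strip_title_commas` is hypothetical, it handles only `def:` lines, and it leaves the definition text itself untouched. It removes commas only from quoted descriptions in the trailing DBXREF list.

```python
def strip_title_commas(line):
    """Remove commas from quoted dbxref descriptions on an OBO 'def:' line.

    Hypothetical helper: other tags that carry dbxref lists (for example
    'synonym:') are returned unchanged.
    """
    start = line.find('"')
    if not line.startswith("def:") or start == -1:
        return line

    # Find the unescaped closing quote of the definition text.
    i = start + 1
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the escaped character
            continue
        if line[i] == '"':
            break
        i += 1

    head, tail = line[: i + 1], line[i + 1 :]

    # In the remainder (the dbxref list), drop commas that sit inside quotes.
    out = []
    in_quotes = False
    escaped = False
    for ch in tail:
        if escaped:
            out.append(ch)
            escaped = False
            continue
        if ch == "\\":
            out.append(ch)
            escaped = True
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and in_quotes:
            continue  # the comma that the loader appears to mis-split on
        out.append(ch)
    return head + "".join(out)


example = (
    'def: "A process, described briefly." '
    '[GOC:hjd, ISBN:155581073X "tRNA: Structure, Biosynthesis and Function"]'
)
cleaned = strip_title_commas(example)
print(cleaned)
```

Applied line by line to a copy of the OBO file, this would keep the separator comma between `GOC:hjd` and the ISBN entry while removing the one inside the book title. It alters the ontology text, so it is a stopgap for loading rather than a fix for the parser.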

cjfields commented 9 years ago

Original Redmine Comment
Author Name: Chris Fields
Original Date: 2009-01-15T16:58:51Z


Pushing to 1.6.x.