Open cjfields opened 9 years ago
Original Redmine Comment Author Name: Erikjan empty Original Date: 2008-03-23T20:22:08Z
Created an attachment (id=882) MSGs from load_seqdatabase.pl / swissprot
Original Redmine Comment Author Name: Bank Beszteri Original Date: 2008-04-08T03:26:19Z
Created an attachment (id=898) Another output from load_seqdatabase.pl illustrating taxonomic conflicts between Swissprot flat file (v.13.1) & NCBI taxonomy
Original Redmine Comment Author Name: Bank Beszteri Original Date: 2008-04-08T03:32:32Z
(From update of attachment 898) Forgot to add: MySQL this time (client v.4.0.18, server v.5.0.45)
Original Redmine Comment Author Name: Chris Fields Original Date: 2008-11-29T15:43:34Z
Pushing to 1.6 bioperl-db point release.
Author Name: Erikjan empty (Erikjan empty) Original Redmine Issue: 2474, https://redmine.open-bio.org/issues/2474 Original Date: 2008-03-23 Original Assignee: Bioperl Guts
Latest bioperl-live, bioperl-db, biosql schema.
Using: PostgreSQL 8.3.1, DBD::Pg 2.3.0, perl 5.8.8
Loading uniprot_sprot.dat with load_seqdatabase.pl
Some entries are rejected. The errors and warnings are:
10 instances of the error "value too long for type character varying(40)". All involve BioCyc IDs that are slightly longer than 40 characters. If we change the varchar(40) to varchar(128) (or something similar), these entries should load without problems.
--------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were ("BioCyc","EcoCyc:ASP-SEMIALDEHYDE-DEHYDROGENASE-MON","0","") FKs ()
ERROR: value too long for type character varying(40)
---------------------------------------------------
Could not store P0A9Q9:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:217
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: Bio::DB::BioSQL::SeqAdaptor::store_children /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/SeqAdaptor.pm:224
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: scripts/biosql/load_seqdatabase.pl:630
-----------------------------------------------------------
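To see which values actually exceed the 40-character limit, the failed-insert warnings can be scanned for their value tuples. This is only a rough sketch: it assumes the log has one message per line with straight double quotes around each value, as in the warning above, and the function name `overlong_values` is made up for illustration.

```python
import re

# Assumption: each failed insert is reported on a single "MSG:" line in the
# format shown in the warning above, with values in straight double quotes.
VALUES_RE = re.compile(
    r'DBLinkAdaptor \(driver\) failed, values were \((.*?)\) FKs'
)

def overlong_values(log_lines, limit=40):
    """Return (value, length) pairs for inserted values longer than `limit`."""
    hits = []
    for line in log_lines:
        match = VALUES_RE.search(line)
        if not match:
            continue
        # Pull out every double-quoted value in the tuple.
        for value in re.findall(r'"([^"]*)"', match.group(1)):
            if len(value) > limit:
                hits.append((value, len(value)))
    return hits

sample = [
    'MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were '
    '("BioCyc","EcoCyc:ASP-SEMIALDEHYDE-DEHYDROGENASE-MON","0","") FKs ()',
]

for value, length in overlong_values(sample):
    print(length, value)
```

Run over the full log, the maximum reported length would indicate whether varchar(128) is wide enough for the column in question.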
150 instances of the error "Could not store". (The offending Swiss-Prot IDs are listed below the error stack.)
Could not store P0C6J8:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Species) failed to insert or to be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::Persistent::PersistentObject::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: scripts/biosql/load_seqdatabase.pl:630
-----------------------------------------------------------
(This list also includes IDs that failed with other errors, for instance the BioCyc errors above; it is a bit hard to separate those out.)
P0C6J8 P0C695 P0C696 P0C697 P0C698 Q9QAB9 Q67924 Q9QBF2 P0C677 Q9E6S6 Q81102 P0C6H7 Q81164 P0C6H8 Q9QMI2 P0C680 P0C6I2 P0C6I3 Q69608 P0C683 P0C682 P0C6I8 P0C6I9 P0C6I6 O71303 Q37472 P54971 P85028 P0A9Q9 P0C691 Q91C36 O91533 Q4R1S7 Q4R1R9 Q9QAB8 Q9PX62 Q67925 Q9QBF1 P0C676 Q9E6S5 P0C688 Q913A7 Q81165 P0C690 Q9QMI1 P0C679 Q67878 O56655 Q69602 Q80IU7 Q9QAW8 Q80IU4 Q8JMY4 Q99HS4 Q99HR5 Q69605 Q8QZQ2 Q9IBI4 P87744 Q9YPV8 Q8JMY7 Q8JN08 Q8JMZ7 Q9J5S2 O71304 P03398 P03324 P0C6J9 Q91C37 O91532 Q4R1S8 Q4R1S0 P0C6G8 P0C6H0 P0C6H1 P0C6K6 P0C6H2 P0C6H3 Q913A8 P0C6K5 P0C6I0 P0C6I1 Q67876 O92920 P0C6I5 P89951 Q9WJE9 Q8JMZ4 P0C6J0 P0C684 Q91C35 O91534 Q4R1S6 Q4R1R8 Q9QAB7 Q9PWW3 Q67926 Q9QBF0 Q8JXB9 Q9E6S4 P31868 Q913A6 Q81162 Q998L9 Q9QMI0 Q998M2 Q67875 O92921 Q69603 Q80IU6 Q80IU3 Q99HS3 Q99HR4 Q69606 Q9IBI3 P87745 Q9WKC4 Q8JMY6 Q8JN07 Q8JMZ6 Q77NU1 O71305 P21645 P01546 P09348 P0AF06 P0A749 Q000A9 P02147 P31057 P26647 Q65399 P00529 Q9I9M4 P0AGK1 P0A887 P75728 Q91C38 O91531 Q4R1S9 Q4R1S1 P0C685 Q9PXA2 Q67923 Q9PX75 P0C678 Q9E6S8 P0C686 Q913A9 Q81163 P0C687 Q9QMI3 P0C681 Q67877 O93195 Q69604 Q80IU8 Q9QAX0 Q80IU5 Q8JMY3 Q99HR6 Q69607 Q9IBI5 P87743 Q9YJT2 Q8JMY5 Q8JN06 Q8JMZ5 Q9J5S3 O71302
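One way to separate the rejected IDs by cause would be to pair each "Could not store" line with the object class named in the exception message that follows it. The sketch below assumes the log has one message per line, as in the traces above; `ids_by_failed_object` is a hypothetical helper name, not part of bioperl-db.

```python
import re
from collections import defaultdict

STORE_RE = re.compile(r'^Could not store (\S+):')
OBJECT_RE = re.compile(r'^MSG: create: object \(([\w:]+)\) failed to insert')

def ids_by_failed_object(log_lines):
    """Map each failing object class to the accessions that could not be stored."""
    groups = defaultdict(list)
    pending = None  # accession from the most recent "Could not store" line
    for line in log_lines:
        line = line.strip()
        store = STORE_RE.match(line)
        if store:
            pending = store.group(1)
            continue
        obj = OBJECT_RE.match(line)
        if obj and pending is not None:
            groups[obj.group(1)].append(pending)
            pending = None
    return dict(groups)

sample = """\
Could not store P0A9Q9:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be found by unique key
Could not store P0C6J8:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Species) failed to insert or to be found by unique key
""".splitlines()

for cls, ids in ids_by_failed_object(sample).items():
    print(cls, ids)
```

With this grouping, the Bio::Annotation::DBLink failures (the over-long BioCyc IDs) could be set aside, leaving the Bio::Species failures to investigate separately.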
Then there are around 1000 WARNINGs about taxonomy (I think), of the form:
--------------------- WARNING ---------------------
MSG: The supplied lineage does not start near 'Epstein-Barr virus' (I was supplied 'Human herpesvirus 4 | Lymphocryptovirus | Gammaherpesvirinae | Herpesviridae')
---------------------------------------------------
I will attach a file with the output of:
grep "^MSG:" ~/load_seqdatabase.pl.swissprot.output.txt | sort | uniq -c
(Unfortunately the Swiss-Prot ID is not mentioned in these warnings.)
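The same tally can be produced in Python, which also makes it easy to pull the two conflicting names out of each lineage warning. This is a sketch only, assuming the log has one message per line and straight single quotes around the names; the function names are invented for illustration.

```python
import re
from collections import Counter

# Assumption: lineage warnings appear on one line in the format shown above.
LINEAGE_RE = re.compile(
    r"^MSG: The supplied lineage does not start near '(.*)' "
    r"\(I was supplied '(.*)'\)\s*$"
)

def count_messages(log_lines):
    """Count identical MSG lines, like grep '^MSG:' | sort | uniq -c."""
    return Counter(
        line.rstrip("\n") for line in log_lines if line.startswith("MSG:")
    )

def lineage_conflicts(log_lines):
    """Return (expected_name, supplied_lineage) pairs from lineage warnings."""
    pairs = []
    for line in log_lines:
        match = LINEAGE_RE.match(line)
        if match:
            # Split the supplied lineage on its " | " separators.
            pairs.append((match.group(1), match.group(2).split(" | ")))
    return pairs

sample = [
    "MSG: The supplied lineage does not start near 'Epstein-Barr virus' "
    "(I was supplied 'Human herpesvirus 4 | Lymphocryptovirus | "
    "Gammaherpesvirinae | Herpesviridae')",
] * 2

for message, count in count_messages(sample).most_common():
    print(count, message)
for name, lineage in lineage_conflicts(sample):
    print(name, "vs", lineage[0])
```

Since the warnings carry no Swiss-Prot ID, this only summarizes which species names conflict, not which entries triggered them.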
see also:
http://article.gmane.org/gmane.comp.lang.perl.bio.general/16844
http://bugzilla.open-bio.org/show_bug.cgi?id=2389