bioperl / bioperl-db

BioPerl BioSQL ORM
http://bioperl.org

Crash when attempting to store the same sequence in a different namespace #5

Open cjfields opened 9 years ago

cjfields commented 9 years ago

Author Name: Dmitry Samborskiy (Dmitry Samborskiy)
Original Redmine Issue: 2280, https://redmine.open-bio.org/issues/2280
Original Date: 2007-04-27
Original Assignee: Bioperl Guts


Hi All,

I’ve found that a ‘Duplicate entry’ crash occurs if I store the same sequence a second time (but in a different namespace).

The attached archive contains a complete and reproducible (I believe) example for this issue.

I use the stable bioperl-1.5.2/bioperl-db-1.5.2 releases against a mysql-4.1.16 database server.

Thanks in advance, Dmitry Samborskiy

P.S. I got the following output:

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were (" ","Direct Submission","Submitted (10-JUL-2004) National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA","CRC-7AF85E0508A630AE","1","3429","" ) FKs ()
Duplicate entry 'CRC-7AF85E0508A630AE' for key 3
---------------------------------------------------
Could not store NC_005982:
------------- EXCEPTION -------------
MSG: create: object (Bio::Annotation::Reference) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:271
STACK Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:217
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:271
STACK Bio::DB::BioSQL::SeqAdaptor::store_children /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/SeqAdaptor.pm:224
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:271
STACK (eval) ./load_seqdatabase.pl:620
STACK toplevel ./load_seqdatabase.pl:602

at ./load_seqdatabase.pl line 633

cjfields commented 9 years ago

Original Redmine Comment
Author Name: Dmitry Samborskiy
Original Date: 2007-04-27T18:10:09Z


Created an attachment (id=639) Test example

cjfields commented 9 years ago

Original Redmine Comment
Author Name: Chris Fields
Original Date: 2008-03-05T17:13:33Z


I’m not sure how you are using load_seqdatabase.pl here. I think the script assumes by default that you are loading new sequences into the database unless you specify options like ‘remove’, ‘update’, or ‘safe’; without them it dies if duplicates might be inserted into the database (‘safe’ just bypasses the errors, and I believe ‘remove’ and ‘update’ do what their names suggest).

The test script you attached also tries to switch the namespace directly: it gets the persistent object from the database, assigns it a new namespace, and then stores it. The problem with this approach is that you are storing the object under the same assigned primary_key, so it would indeed move the sequence, because you are updating the current object rather than calling create(). Notably, calling create() on a persistent object with an assigned primary_key() gets you an error (and a hint):

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: must not change primary_key() once it is set
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/cjfields/bioperl/bioperl-live/Bio/Root/Root.pm:357
STACK: Bio::DB::Persistent::PersistentObject::primary_key /Users/cjfields/bioperl/db/Bio/DB/Persistent/PersistentObject.pm:321
STACK: Bio::DB::Persistent::Seq::primary_key /Users/cjfields/bioperl/db/Bio/DB/Persistent/Seq.pm:124
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cjfields/bioperl/db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:211
STACK: Bio::DB::Persistent::PersistentObject::create /Users/cjfields/bioperl/db/Bio/DB/Persistent/PersistentObject.pm:244
STACK: test.pl:33
-----------------------------------------------------------

The approach I have worked out is to reset the seq object’s primary_key() by assigning it undef before calling store() or create() (which then assigns a new primary key to the object, even if it is in the same namespace):

    # store the found sequence in the second biodatabase:
    my $pseq = $seqadp->create_persistent($seq);
    $pseq->namespace($ns2);
    $pseq->primary_key(undef);
    $pseq->store(); # assign new primary key
    $seqadp->commit;

This works as long as the sequence namespace doesn’t match an already present one.

It might be worth adding some tests to make sure that remove()-ing one persistent sequence doesn’t cause problems for copies of the sequence in other namespaces. I would also like Hilmar to comment on whether this is an adequate solution or whether there are potential problems.

cjfields commented 9 years ago

Original Redmine Comment
Author Name: Hilmar Lapp
Original Date: 2008-03-09T19:26:20Z


(In reply to comment #2)

> I’m not sure how you are using load_seqdatabase.pl here; I think the script by default assumes you are loading new sequences in the database unless you specify options like ‘remove’, ‘update’, ‘safe’, etc.,

Actually, one must specify --lookup to have incoming sequences looked up against the database first. All the other switches (except --remove, which works by itself) specify what to do if the sequence is indeed found already.

Since namespace (if set) is part of the unique key of a sequence, loading the same file (or sequence) under a different namespace should indeed create a duplicate of it. The error that Dmitry reports also isn’t a violation of the unique key on bioentry or biosequence, so it is a rather odd one and surely indicative of a bug. The expected behavior is to find the reference from the previous insert, since it will have the same unique key: bioentry isn’t part of the unique key of a reference, only of the association between a reference and a bioentry.
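
To make the expected behavior concrete, here is a minimal sketch of find-or-create by unique key. It is a toy in-memory model, not bioperl-db code: the hashes standing in for tables, the sub names, the namespace names ns1 and ns2, and the version number 1 are all assumptions made for illustration. Only the accession and the CRC value are taken from the report above.

```perl
#!/usr/bin/env perl
# Toy in-memory model of find-or-create by unique key.
# NOT bioperl-db code: table and sub names are hypothetical.
use strict;
use warnings;

my (%bioentry, %reference, @bioentry_reference);
my $next_id = 1;

# A bioentry is unique per (namespace, accession, version),
# so the same accession in a new namespace gets a new row.
sub store_bioentry {
    my ($ns, $acc, $version) = @_;
    my $key = join "\t", $ns, $acc, $version;
    $bioentry{$key} //= $next_id++;
    return $bioentry{$key};
}

# A reference is unique by its CRC alone; the namespace plays no part.
# Look it up first and insert only when the lookup misses.
sub store_reference {
    my ($crc) = @_;
    $reference{$crc} //= $next_id++;
    return $reference{$crc};
}

# Only the association row ties a reference to a particular bioentry.
sub link_reference {
    my ($bioentry_id, $reference_id) = @_;
    push @bioentry_reference, [$bioentry_id, $reference_id];
    return;
}

# Store the same record under two (hypothetical) namespaces.
for my $ns (qw(ns1 ns2)) {
    my $entry_id = store_bioentry($ns, 'NC_005982', 1);   # version 1 is assumed
    my $ref_id   = store_reference('CRC-7AF85E0508A630AE');
    link_reference($entry_id, $ref_id);
}

printf "bioentries: %d, references: %d, links: %d\n",
    scalar(keys %bioentry), scalar(keys %reference), scalar(@bioentry_reference);
```

In this model the second pass finds the existing reference instead of inserting it again, so the run ends with two bioentries, one shared reference, and two association rows. The reported crash corresponds to the lookup in store_reference missing and the insert being attempted a second time.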

However, if I recall correctly, there was a bugfix in the ReferenceAdaptor’s implementation of its unique key search, so this might actually have been fixed in the meantime. To check, Dmitry’s test case would have to be run against the svn HEAD. I probably won’t get to this right away, but it would be helpful to get confirmation from anyone who is set up to rerun the test against HEAD.

-hilmar

cjfields commented 9 years ago

Original Redmine Comment
Author Name: Chris Fields
Original Date: 2008-11-29T15:37:57Z


Pushing to 1.6 bioperl-db point release.