azgs / azlibrary_database

1 stars 1 forks source link

Failed collections not being rolled back/? #16

Closed aazaff closed 5 years ago

aazaff commented 5 years ago

I see in uploads that collection_id 1617 (upload_id=1618) failed, but still has an entry in the collections table. This is definitely not the desired behaviour - is this a bug in your rollback code or has there been a design miscommunication?

Sorry if the latter, but this is not how I want it to work at all. Possibly related to #15.

NoisyFlowers commented 5 years ago

This is how its worked from the start. Data content is removed on rollback, but not the collections record. The idea was that, if an import failed, that's still a collection that someone wants in the db. So we keep its record and id on hand and treat further upload attempts as updates to that collection. I carried this idea on into the latest iteration by using the assigned perm_id in the directory name of the collection in the failures directory.

Perhaps the thing to do is move the "removed" field from uploads to collections and always set it on rollback and unset it on upload. I can imagine other scenarios, as well as this one, in which we might want to pull a collection offline, temporarily or permanently, but not delete it altogether.

aazaff commented 5 years ago

"Perhaps the thing to do is move the "removed" field from uploads to collections and always set it on rollback and unset it on upload. I can imagine other scenarios, as well as this one, in which we might want to pull a collection offline, temporarily or permanently, but not delete it altogether."

Okay! This makes a lot of sense! Do it.

NoisyFlowers commented 5 years ago

Working with this approach, I am realizing there is another question we should address. Currently, when an update to an existing collection fails, rollback cleans that collection from the db; then the source dir for the update is moved to failures. There is still a tarball for the previous version of this collection in archive. Should we: 1) delete that tarball, or 2) repopulate the db from that tarball, thus putting the db back the way it was before we tried to update the collection? It's not clear to me, without further code inspection, how difficult (2) will be to implement.

aazaff commented 5 years ago

Ack! I can’t believe I never considered this problem – serious design omission ☹ on my part.

  1. We absolutely need some version of option 2 – restoring the collection back to its previous state in the db.

Perhaps when you run an add script you could create a “temp” table that holds all of the previous collection information in it, and restore from that if necessary if the add failed?

NoisyFlowers commented 5 years ago

Here's a rundown of the lastest changes to address failures:

All attempted imports will have a record in the uploads table. Thus, they will be assigned an upload ID. This record and its ID will exist whether the import succeeds or not.

Except for the creation of the upload record, all of the db work for an import now takes place in a transaction. This insures that any problems encountered during import result in the db being rolled back to its previous state. (Note: We use ogr2ogr for gdb processing. It's interaction with the database takes place outside of the import transaction. Special handling during import and error recovery insures that artifacts of this processing are rolled back as well.)

The only artifact left in the database after an import failure is the upload record for this attempt.