Database IDs and Object Storage IDs may get out of sync

dtitov commented 6 years ago

Test case:

1. Deploy LocalEGA.
2. Ingest a file. A record for the file appears in the DB with ID 1. The file also lands in S3 as an object with ID 1, because the object ID is taken from the DB.
3. Recreate the DB instance. Not just restart it, but recreate it.
4. Ingest one more file.
   Expected result: the file is put into S3 as a separate object.
   Actual result: the existing object with ID 1 is overwritten in S3, because the files table was wiped together with the DB and the ID sequence starts over from 1.
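A minimal Python sketch of the test case above, assuming in-memory stand-ins: FakeDatabase and ingest are hypothetical names (not LocalEGA code), an itertools counter plays the role of the SERIAL sequence, and a dict plays the role of the persistent S3 bucket.

```python
import itertools


class FakeDatabase:
    """Hypothetical stand-in for the files table; IDs come from a SERIAL-like counter."""

    def __init__(self):
        # A fresh sequence starting at 1, as after recreating the DB service.
        self._seq = itertools.count(1)
        self.files = {}

    def insert_file(self, name):
        file_id = next(self._seq)
        self.files[file_id] = name
        return file_id


def ingest(db, store, name, data):
    """Record the file in the DB, then store it in 'S3' keyed by the DB ID."""
    file_id = db.insert_file(name)
    store[str(file_id)] = data  # object key == database ID (the problem)
    return file_id


object_store = {}  # persistent: survives DB recreation, like the S3 volume

db = FakeDatabase()
ingest(db, object_store, "file1", b"first")

db = FakeDatabase()  # recreate (not restart) the DB: the sequence starts over
ingest(db, object_store, "file2", b"second")

print(object_store)  # {'1': b'second'}: the first object was overwritten
```

Only one object survives, which matches the actual result in step 4.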

This happens because of several factors.

files table:

CREATE TABLE files (
    id             SERIAL, PRIMARY KEY(id), UNIQUE (id),
...

The ID here is a simple auto-increment (SERIAL), backed by a sequence that starts at 1.

The database in our deployments doesn't have a persistent volume (unlike the inbox and S3, which do). So each time the DB service is recreated, the tables are erased and numbering starts from the beginning.
Files are stored in S3 under their database IDs.

Possible fixes:

1. Use UUID generation for IDs (e.g. the uuid-ossp Postgres extension).
2. Introduce a persistent volume for the database so the tables survive service recreation.
3. Don't store files in Object Storage under their database IDs. Instead, use a randomly generated UUID as the object key, and insert that UUID into the corresponding file record in the database.
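A minimal Python sketch of the third option, using the same kind of in-memory stand-ins: FakeDatabase, ingest and the object_id column are hypothetical (not LocalEGA's actual schema or code), and a dict stands in for S3. The object key comes from Python's uuid.uuid4() and is recorded in the file's DB row.

```python
import itertools
import uuid


class FakeDatabase:
    """Hypothetical files table: SERIAL-like id plus a column holding the object key."""

    def __init__(self):
        self._seq = itertools.count(1)
        self.files = {}

    def insert_file(self, name, object_id):
        file_id = next(self._seq)
        self.files[file_id] = {"name": name, "object_id": object_id}
        return file_id


def ingest(db, store, name, data):
    # Key the object by a random UUID instead of the database ID...
    object_id = str(uuid.uuid4())
    store[object_id] = data
    # ...and record that UUID in the file's DB row so the object can be found later.
    db.insert_file(name, object_id)
    return object_id


object_store = {}  # persistent, like the S3 volume

db = FakeDatabase()
first = ingest(db, object_store, "file1", b"first")

db = FakeDatabase()  # recreating the DB still resets the integer IDs...
second = ingest(db, object_store, "file2", b"second")

# ...but the objects no longer collide.
print(len(object_store))  # 2
```

Note that this only prevents overwrites: after a DB recreation the row pointing at the first object is still lost, so on its own it does not replace the persistent volume from the second option.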

silverdaz commented 6 years ago

A possible solution is in #339.

juhtornr commented 5 years ago

Is this still valid?

NBISweden / LocalEGA