azgs / azlibrary_database

archive in db rather than file system? #26

Closed NoisyFlowers closed 5 years ago

NoisyFlowers commented 5 years ago

The loose coupling between the db and the archive directory could lead to problems down the road. For instance, I recently renamed an archive directory after azlibAdd had imported into it, thus breaking the azgs_path links. Look into the pros/cons of moving the tar.gz files into the db.

aazaff commented 5 years ago

I've been thinking about this, and I have a few thoughts.

Questions:

  1. Will doing so increase the likelihood of data loss or corruption?
  2. Will we be adding a significant level of I/O overhead that would make data service too slow?
  3. Should we blob the whole tar or possibly blob individual elements - e.g., each image? I like the whole tar idea better, but doing them separately might give us some more capability.

Pros:

  1. The more I think about it, I kind of want to not keep an azgs.json or any other type of metadata file within the archive folders and have the database be authoritative. I wouldn't feel comfortable doing that under the current system, but I would feel okay doing it if the data was literally in the database within the collections table.
  2. Moving the data around or doing a data restore would LITERALLY just be a matter of doing a database dump, which I think is really robust and much safer.

Cons:

  1. I'm anxious to keep moving ahead on API work, but this really falls squarely under more database tinkering... is this the best use of our time?

aazaff commented 5 years ago

Additional Cons: Bytea has a 1GB per entry limit, which is too small, so we would have to use BLOB, but BLOB requires a non-standard SQL interface - i.e., it doesn't use INSERT, UPDATE, etc.

Additional Question: Are there good node libraries for working with BLOB commands?
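
To make the bytea vs. BLOB trade-off above concrete, here is a minimal sketch of the size check it implies, plus a helper that splits an archive into ranges so it could be written piece by piece. Everything here is an assumption for illustration: `BYTEA_LIMIT_BYTES`, `chooseStorage`, and `chunkRanges` are hypothetical names, the 1GB figure is taken from the comment above, and no database library is called.

```typescript
// Hypothetical constant: the 1GB per-entry bytea limit cited in the thread.
const BYTEA_LIMIT_BYTES = 1024 * 1024 * 1024;

type StorageKind = "bytea" | "large_object";

// Decide which storage strategy an archive of the given size would need.
// Anything at or above the limit is routed to the BLOB-style interface.
function chooseStorage(sizeBytes: number): StorageKind {
  return sizeBytes < BYTEA_LIMIT_BYTES ? "bytea" : "large_object";
}

// Split an archive of `sizeBytes` into [start, end) byte ranges so that it
// could be written in pieces rather than in a single call.
function chunkRanges(sizeBytes: number, chunkSize: number): Array<[number, number]> {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < sizeBytes; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, sizeBytes)]);
  }
  return ranges;
}
```

A caller might run `chooseStorage` on the tar.gz size first, then use `chunkRanges` only on the large-object path; whether a given Node library exposes chunked writes this way is left open here.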

aazaff commented 5 years ago

Additional Con: Presumably, we would no longer archive the original tar. So if we incorrectly think that an upload went through and only later realize that there was a problem, there would be no file to restore/reupload from?
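
One way to reduce the risk described above would be to compare a checksum of the original tar with a checksum of the bytes read back from the database before discarding the file. The sketch below is only an illustration of that idea, not something the project is known to do: `crc32` is a plain table-driven CRC-32 chosen to keep the example dependency-free (a cryptographic hash such as SHA-256 would be the more usual choice), and `storedCopyMatches` is a hypothetical helper name.

```typescript
// Build the standard CRC-32 lookup table (reflected polynomial 0xEDB88320) once.
const CRC_TABLE: number[] = [];
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0xedb88320 ^ (c >>> 1)) : (c >>> 1);
  }
  CRC_TABLE.push(c >>> 0);
}

// Compute the CRC-32 of a byte array as an unsigned 32-bit number.
function crc32(bytes: Uint8Array): number {
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Hypothetical check: does the copy read back from storage match the
// original upload? Compares length first, then checksums.
function storedCopyMatches(original: Uint8Array, readBack: Uint8Array): boolean {
  return original.length === readBack.length && crc32(original) === crc32(readBack);
}
```

The original tar would only be deleted once `storedCopyMatches` returns true for the bytes fetched back from the database.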

aazaff commented 5 years ago

We are now more or less committed to doing this because the metadata management benefits are pretty profound. We are researching the merits of BLOB format and any particular hangups that may be associated with it.

aazaff commented 5 years ago

Lol... we implemented this a long time ago now... I think as part of 42072e5a503acf2cc4facd64c046178e94b08aae