ISISComputingGroup / IBEX

Top level repository for IBEX stories

Archiving: Create/use a central MySQL instance #5818

Open DominicOram opened 4 years ago

DominicOram commented 4 years ago

As a developer I would like to avoid instrument's disks filling up. From https://github.com/ISISComputingGroup/IBEX/issues/5789 we decided that we would do this by pushing data into a central MySQL instance. To do this we will need a central machine with enough disk space.

Acceptance Criteria

KathrynBaker commented 4 years ago

Initially use a single server, using one of the Archive Support Servers

FreddieAkeroyd commented 3 years ago

Bear in mind support for changing to the archive appliance, and whether a Linux server is better than Windows, etc.

rerpha commented 3 years ago

We could do #6645 while setting this up. If so, a Linux server is probably best.

ChrisM-S commented 3 years ago

Are the “central MySQL server”, a Graylog server and the Archive Appliance broadly duplicating the same functionality? Do we need all of them?!

FreddieAkeroyd commented 3 years ago

And do they all need to be equally resilient?

FreddieAkeroyd commented 1 year ago

Have we confirmed this will work as expected? The ideal is that a background task would periodically push records to the central instance and then delete them from the local tables. However, we need to be careful not to delete block records that have not yet been written to a NeXus file (e.g. a long run over a shutdown); but if we filter those out, we may leave "holes" in the table that do not get refilled. What we want is for the space from deleted records to be re-used rather than lost, but we need to confirm this will happen in our use cases. It would be nice if every deleted row got re-used, but I assume the database becomes fragmented and this doesn't happen completely.
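A minimal sketch of the "push then delete" filter described above, guarding against deleting block records from runs not yet written to a NeXus file. The table name (`sample`), columns, and the `saved_runs` set are illustrative assumptions, not the real IBEX archive schema; `sqlite3` stands in for MySQL so the sketch is runnable:

```python
import sqlite3

# Hypothetical block-archive table; names are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, run_number INTEGER, value REAL)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                 [(1, 100, 1.0), (2, 100, 1.1), (3, 101, 2.0), (4, 102, 3.0)])

# Runs already confirmed written to a NeXus file (however the archiver reports that).
saved_runs = {100, 101}

def trim_saved(conn, saved_runs):
    """Delete only rows whose run is confirmed saved, so a long run spanning
    a shutdown (run 102 here) is never deleted before it reaches a file."""
    placeholders = ",".join("?" * len(saved_runs))
    conn.execute(f"DELETE FROM sample WHERE run_number IN ({placeholders})",
                 tuple(saved_runs))
    conn.commit()

trim_saved(conn, saved_runs)
remaining = conn.execute("SELECT run_number FROM sample").fetchall()
print(remaining)  # → [(102,)] — rows from the unsaved run survive
```

On the space-reuse question: with InnoDB, pages freed by deleted rows are marked free and re-used for later inserts into the same table, but the tablespace file does not shrink on disk without an `OPTIMIZE TABLE` rebuild, so some fragmentation is expected between rebuilds.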

ChrisM-S commented 1 year ago

I don't have a mental picture of the schema, but I'm guessing time and/or run number would be keyed in, so that on the instrument these could restrict block trimming, i.e. only trim data for which a later run number (or run time) exists. If the main export (but not deletion) of records is done at the end of a run to an SQL "log" file, this could be moved via the normal data archive process - but only to somewhere like "stage-deleted" - and the central database could regularly pick up the data from there and load itself. Trimming on the instrument would then be done automatically on records already marked, directly or by time/run, as having been "saved" - and this could happen much later on (to, say, keep the local database under a 10GB maximum).
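The size-capped local trim suggested above could be sketched as follows. The schema (a `saved` flag, a `sample_time` column) and a row-count cap standing in for the 10GB disk cap are assumptions for the example, again with `sqlite3` standing in for MySQL:

```python
import sqlite3

# Hypothetical local block table: rows 1-6 already exported/marked "saved",
# rows 7-10 not yet saved. Names are illustrative, not the real IBEX schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, sample_time INTEGER, saved INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                 [(i, i, 1 if i <= 6 else 0) for i in range(1, 11)])

MAX_ROWS = 5  # stand-in for the "10GB maximum local database" cap

def trim_to_cap(conn, max_rows):
    """Delete the oldest rows already marked saved, stopping at the cap;
    rows not yet exported are never candidates for deletion."""
    (count,) = conn.execute("SELECT COUNT(*) FROM sample").fetchone()
    excess = count - max_rows
    if excess > 0:
        conn.execute(
            "DELETE FROM sample WHERE id IN ("
            "  SELECT id FROM sample WHERE saved = 1 ORDER BY sample_time LIMIT ?)",
            (excess,))
        conn.commit()

trim_to_cap(conn, MAX_ROWS)
rows_left = conn.execute("SELECT id, saved FROM sample ORDER BY id").fetchall()
print(rows_left)  # → [(6, 1), (7, 0), (8, 0), (9, 0), (10, 0)]
```

The unsaved rows (7-10) always survive even when the table is over the cap, which matches the "trim only records marked as saved" constraint; if the unsaved backlog alone exceeds the cap, the trim simply waits until more runs are exported.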