FreeUKGen / FreeUKRegProductIssues

Repo for user-reported FreeUKReg product issues

For discussion: long term image storage policy / practice #1110

Closed: PatReynolds closed this issue 3 years ago

PatReynolds commented 6 years ago

Suggest P0 until the requirement is established. Following the policy group's decision on image storage policy (i.e. keep images in perpetuity), we should think about how to implement this in a cost-effective way.

PatReynolds commented 6 years ago

We are buying space on the AWS storage service called Glacier. It is dead cheap storage, but there is a significant cost if you have to retrieve anything. We could place copies of the images there at the time of creation and recover them if needed (i.e. if they are no longer on the 'live' image system and cannot be obtained more cheaply from another source, e.g. an online or offline copy).

Captainkirkdawson commented 6 years ago

Not sure what is expected of the FR developers. Would it not be better handled under coordination?

edickens commented 6 years ago

To start the discussion: there are millions of images that have been transcribed and may never be accessed again, but they are, or may be, needed for checking transcriptions. Some, due to access restrictions, may never be shown to researchers. They are held in expensive instant-access storage. They could be held elsewhere with slower access times, so the server needs to "archive" these images and have a means of calling them up when needed.
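A minimal sketch of the archive-and-recall idea, assuming a simple policy of archiving any image that has not been accessed for a fixed period. The class and method names (`TieredImageStore`, `archive_idle`, `fetch`) are hypothetical, and in-memory dicts stand in for the live image servers and a cheap, slow archive tier (e.g. Glacier); it models the policy only, not any real storage API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class TieredImageStore:
    """Hypothetical two-tier store: 'live' (instant access) and 'archive' (cheap, slow)."""
    archive_after: timedelta                          # assumed idle period before archiving
    live: dict = field(default_factory=dict)          # image_id -> image bytes
    archive: dict = field(default_factory=dict)       # image_id -> image bytes
    last_access: dict = field(default_factory=dict)   # image_id -> datetime of last access

    def add(self, image_id, data, now):
        """Store a newly created image on the live tier."""
        self.live[image_id] = data
        self.last_access[image_id] = now

    def archive_idle(self, now):
        """Move images not accessed within archive_after from live to archive."""
        moved = []
        for image_id in list(self.live):
            if now - self.last_access[image_id] >= self.archive_after:
                self.archive[image_id] = self.live.pop(image_id)
                moved.append(image_id)
        return moved

    def fetch(self, image_id, now):
        """Return image bytes, recalling from the archive tier if necessary.

        Raises KeyError if the image is on neither tier.
        """
        if image_id not in self.live:
            # In a real system this is the slow, possibly costly, retrieval step.
            self.live[image_id] = self.archive.pop(image_id)
        self.last_access[image_id] = now
        return self.live[image_id]


# Usage: one image is checked after 300 days, the other is left untouched.
t0 = datetime(2021, 1, 1)
store = TieredImageStore(archive_after=timedelta(days=365))
store.add("img-001", b"...", t0)
store.add("img-002", b"...", t0)
store.fetch("img-002", t0 + timedelta(days=300))
moved = store.archive_idle(t0 + timedelta(days=400))  # only "img-001" is archived
```

Archiving on idle time is only one possible policy; the alternative raised earlier in the thread, copying every image to archive storage at creation time, would keep both copies and make recall a fallback rather than a move.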

PatReynolds commented 3 years ago

Transfer to Backblaze.

@PatReynolds to check with Lemon what the timescale for recovery would be, should it be needed (not a showstopper).

PatReynolds commented 3 years ago

Wrote to @lemon-ukgen on 29 Apr 2021.

lemon-ukgen commented 3 years ago

Is this "expensive instant storage" the FreeREG data on the image servers (langur and mandrill, aka images{4,5}.freereg.org.uk)? Do we know which paths are involved, so I can calculate the size of the data we're talking about?

Is this a static set of images, or will we be adding to it over time?

The total of all FreeREG image data on these image servers is around 1.4TB, and it resides on 14TB servers with ~4TB free.

The total image data for all projects has barely grown over the last 5 years. From a sysadmin perspective there's no immediate pressure to free up space here unless storage patterns change.

Captainkirkdawson commented 3 years ago

Personally, I see no point at this time.

PatReynolds commented 3 years ago

Resolved to make no change.