cca / cca_invenio

CCA's InvenioRDM repository

Cloud storage setup #2

Open phette23 opened 1 year ago

phette23 commented 1 year ago

See Invenio Docs > Customize > S3 as Storage Backend. For files, we would have to use a GSB (Google Cloud Storage bucket) in its AWS S3-compatible mode.
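A minimal `invenio.cfg` sketch of what that could look like. The setting names follow our reading of the Invenio-Files-REST / invenio-s3 docs and should be checked against the installed versions. Pointing `S3_ENDPOINT_URL` at GCS's S3-compatible endpoint with HMAC keys is our assumption about how the bucket would be wired up, and the environment variable names are hypothetical.

```python
# invenio.cfg (sketch): store files in a GCS bucket through its
# S3-compatible API. Verify setting names against the installed
# invenio-s3 / Invenio-Files-REST versions.
import os

# Have Invenio-Files-REST use invenio-s3 as the storage backend.
FILES_REST_STORAGE_FACTORY = "invenio_s3.s3fs_storage_factory"

# Google Cloud Storage's S3-compatible (XML API) endpoint.
S3_ENDPOINT_URL = "https://storage.googleapis.com"

# HMAC key pair for a service account. The environment variable names
# are hypothetical; "CHANGE_ME" is only a placeholder default.
S3_ACCESS_KEY_ID = os.environ.get("GCS_HMAC_ACCESS_KEY", "CHANGE_ME")
S3_SECRET_ACCESS_KEY = os.environ.get("GCS_HMAC_SECRET", "CHANGE_ME")
```

Invenio's default files location would presumably also need to point at an `s3://` URI for the bucket.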

Some bug reports in Discord suggest that cloud storage support is not quite mature yet, or is not a priority for the project's main drivers.

Storage Class

Storage class costs in us-west1: $0.020 per GB-month for Standard, and $0.0012 to $0.020 per GB-month for Autoclass plus an Autoclass management fee of $0.025 per 10,000 objects per month. Because the fee is charged per object, many small objects make Autoclass less cost-effective.
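A back-of-envelope comparison using only the us-west1 prices quoted above. It assumes the best case for Autoclass, where every object ends up at the cheapest $0.0012 rate. The function names are ours, and real Autoclass bills depend on how long each object spends in each class.

```python
# Rough monthly cost comparison: Standard vs. Autoclass (us-west1 prices
# quoted in this issue). Best case for Autoclass by default: all data at
# the cheapest rate.
STANDARD_PER_GB = 0.020            # $ per GB-month
AUTOCLASS_MIN_PER_GB = 0.0012      # $ per GB-month, coldest class
FEE_PER_OBJECT = 0.025 / 10_000    # $ per object-month management fee


def monthly_costs(total_gb, n_objects, autoclass_rate=AUTOCLASS_MIN_PER_GB):
    """Return (standard_cost, autoclass_cost) in dollars per month."""
    standard = total_gb * STANDARD_PER_GB
    autoclass = total_gb * autoclass_rate + n_objects * FEE_PER_OBJECT
    return standard, autoclass


def break_even_object_size_gb(autoclass_rate=AUTOCLASS_MIN_PER_GB):
    """Average object size (GB) above which Autoclass saves money."""
    return FEE_PER_OBJECT / (STANDARD_PER_GB - autoclass_rate)


if __name__ == "__main__":
    print(f"Break-even average object size: {break_even_object_size_gb():.6f} GB")
```

In the best case the break-even average object size is about 0.000133 GB (on the order of 130 KB). If objects only reach intermediate classes, the effective `autoclass_rate` is higher and so is the threshold, so a collection dominated by small files may not benefit.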

It would be good to know whether Autoclass pays off. Could we estimate this with VAULT data? We could look at the quantity, size, and last access date of files in storage, excluding derivatives in _THUMBS directories (does Invenio make derivatives?).
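A sketch of how the VAULT numbers could be gathered. It assumes VAULT's files are reachable as a local directory tree and that the filesystem records access times. Mounts using `noatime` or `relatime` make `st_atime` unreliable. The age buckets reflect our understanding of Autoclass's 30/90/365-day transition thresholds, and the `scan` helper is hypothetical.

```python
# Summarize file count, total size, and bytes by days since last access,
# skipping _THUMBS derivative directories.
import os
import sys
import time
from collections import Counter

THUMBS_DIR = "_THUMBS"            # derivative directories to exclude
AGE_BUCKETS_DAYS = (30, 90, 365)  # assumed Autoclass transition points


def age_bucket(idle_days):
    """Label a file by how long it has gone without being accessed."""
    for limit in AGE_BUCKETS_DAYS:
        if idle_days < limit:
            return f"<{limit}d"
    return f">={AGE_BUCKETS_DAYS[-1]}d"


def scan(root, now=None):
    now = time.time() if now is None else now
    count, total_bytes = 0, 0
    bytes_by_age = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune derivative directories so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d != THUMBS_DIR]
        for name in filenames:
            st = os.stat(os.path.join(dirpath, name))
            count += 1
            total_bytes += st.st_size
            idle_days = (now - st.st_atime) / 86400
            bytes_by_age[age_bucket(idle_days)] += st.st_size
    return count, total_bytes, bytes_by_age


if __name__ == "__main__" and len(sys.argv) == 2:
    n, size, by_age = scan(sys.argv[1])
    print(f"{n} files, {size} bytes")
    for bucket, nbytes in sorted(by_age.items()):
        print(bucket, nbytes)
```

The file count and total size feed directly into the cost comparison for Standard vs. Autoclass, and the age buckets hint at how much data would actually move to colder classes.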

Autoclass might not be compatible with Invenio's file integrity check, which regularly reads files and could stop them from being moved to a colder, cheaper class. To confirm, we would need to turn on Google Cloud audit logs for the parent project, because a file's access timestamp is not stored anywhere except in those logs. During local tests, runs of the file check do not appear to change the file's accessed time, so perhaps this is not a concern.
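A sketch of the local experiment: read a file the way a checksum-based integrity check would and compare its access time before and after. We assume the check streams the whole file into an MD5 digest reported as `md5:<hex>`. That matches how we understand Invenio records checksums, but it is not taken from Invenio's code. Access-time updates on a local disk also depend on mount options and may not reflect how Cloud Storage counts reads.

```python
# Compute an MD5 checksum like an integrity check would, and report
# whether the file's access time changed as a result.
import hashlib
import os
import sys


def md5_checksum(path, chunk_size=1024 * 1024):
    """Stream the file in chunks and return "md5:<hex digest>"."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return "md5:" + digest.hexdigest()


def check_and_report_atime(path):
    """Return (checksum, atime_before, atime_after) for one read pass."""
    before = os.stat(path).st_atime
    checksum = md5_checksum(path)
    after = os.stat(path).st_atime
    return checksum, before, after


if __name__ == "__main__" and len(sys.argv) == 2 and os.path.isfile(sys.argv[1]):
    checksum, before, after = check_and_report_atime(sys.argv[1])
    print(checksum, "atime changed" if after != before else "atime unchanged")
```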

phette23 commented 1 year ago

It was quite easy to run the app locally with a GSB for file storage. I didn't test extensively and a few questions remain, but cloud storage seems viable.