emory-libraries / aspace


SPIKE: Size of Data in ASpace #89

Closed: AGCooper closed this issue 1 year ago

AGCooper commented 1 year ago

Get a size/cost estimate, along with the broad outlines of a plan for how these backups will be captured and deleted and how often each of those steps will happen. I can then take that plan to the project sponsors for approval.

kbowaterskelly commented 1 year ago

This is not currently possible without implementing the actual backup process. We could safely base a cost estimate on a size estimate of less than 100 GB with daily backups. My guess is that this will fall under the automatic approval threshold of 1 TB, at least for 3-6 months.

I have asked Elizabeth for the size of the data in the spreadsheet. I do not currently know how to check the size of the data in SVN. Lyrasis is not able to provide me with the size of the ArchivesSpace database at this time.

lovinscari commented 1 year ago

@kbowaterskelly - Can you please attach to this ticket any documentation you have from Lyrasis where they state they are unable to provide you with the size of the database so I can follow up with them?

kbowaterskelly commented 1 year ago

https://3.basecamp.com/3410311/buckets/29543127/messages/5823911757

Blake Carver - ArchivesSpace doesn't really use much space outside of the DB and Solr Indexes, so that doesn't show up on there anyways. I can just copy and paste what's in there for you.

Kaeln - Looks like they misunderstood my question when I asked them about disk utilization. I also asked Elizabeth for the size of the spreadsheet data so we could evaluate it directly, but did not get an answer. I'm still estimating this at below 1 GB. We will get a true measure of the size that needs to be stored by implementing the backup process and running it once; then we can create a scheme and an estimate on a monthly, quarterly, or annual basis (see #95). This is the most accurate approach, because we cannot know the effect of any compression or similar processing without seeing the actual output.
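Once a single backup has been run and measured, the storage for each candidate schedule is simply the backup size multiplied by the number of copies kept. The minimal Python sketch below assumes every backup is a full copy of roughly constant size and uses the "below 1 GB" estimate above as a placeholder upper bound; `retained_storage_gb` and the schedule table are hypothetical names for illustration, not taken from #95.

```python
# Hypothetical helper: storage held once the retention window is full,
# assuming every backup is a full copy of roughly the same size.
def retained_storage_gb(backup_size_gb: float, backups_per_year: int,
                        retention_years: float) -> float:
    return backup_size_gb * backups_per_year * retention_years


# Candidate schedules discussed in this thread, as backups taken per year.
SCHEDULES = {"daily": 365, "monthly": 12, "quarterly": 4, "annual": 1}

# Placeholder size: the "below 1 GB" estimate, used as an upper bound.
PLACEHOLDER_BACKUP_GB = 1.0

if __name__ == "__main__":
    for name, per_year in SCHEDULES.items():
        total = retained_storage_gb(PLACEHOLDER_BACKUP_GB, per_year, retention_years=1)
        print(f"{name:>9}: {total:,.0f} GB retained after 1 year")
```

Swapping in the size measured from a real test run is the only change needed; incremental or deduplicated backups would need a different model.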

erussey commented 1 year ago

@kbowaterskelly: I responded to the thread above in Basecamp, clarifying the information you need from Lyrasis and asking for an estimate. We do not need the second spreadsheet you mentioned above; it is now loaded into ASpace test, so a size estimate of a total backup will include that data.

AGCooper commented 1 year ago

Blake: "The backups for ArchivesSpace tend to be pretty small.

A backup of your current prod DB is about 9.1M, that's a gzipped mysqldump of the DB, which is how the backups are done.

Our very largest hosted site has a backup size of about 430M. Most of the other "large" sites are less than 1/2 that, between 100-200M."

kbowaterskelly commented 1 year ago

I've received enough information to estimate this at less than a terabyte for a year. I'd suggest implementing the daily backup Lyrasis has planned as-is, with a retention period of 2 years. This should cost less than a hundred dollars a month.
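The terabyte estimate can be checked against the figures Blake quoted. The Python sketch below assumes his "M" means megabytes, uses decimal units (1 TB = 1,000,000 MB), and treats each daily backup as a full gzipped dump of constant size, so database growth is not modeled; the 430M figure from Lyrasis's largest hosted site serves as a conservative ceiling. The names `retained_mb` and `MB_PER_TB` are illustrative.

```python
# Back-of-the-envelope storage check for the proposed plan:
# one gzipped mysqldump per day, each kept for 2 years.
# Assumptions: "M" = megabytes, decimal units, constant backup size.

MB_PER_TB = 1_000_000
BACKUPS_PER_YEAR = 365   # daily backups
RETENTION_YEARS = 2      # proposed retention period


def retained_mb(backup_mb: float) -> float:
    """Total megabytes held once the 2-year retention window is full."""
    return backup_mb * BACKUPS_PER_YEAR * RETENTION_YEARS


# Sizes quoted by Lyrasis: our current prod backup and their largest hosted site.
SCENARIOS = {"current prod (9.1M)": 9.1, "largest hosted site (430M)": 430.0}

if __name__ == "__main__":
    for label, size_mb in SCENARIOS.items():
        total = retained_mb(size_mb)
        print(f"{label}: {total / 1000:,.1f} GB retained, under 1 TB: {total < MB_PER_TB}")
```

Even at the 430M ceiling, two years of daily backups comes to about 314 GB, consistent with the under-1 TB estimate. Turning this into a monthly cost requires the storage provider's per-GB rate, which is not given in this thread.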