OpenTreeOfLife / germinator

miscellaneous scripts and data for concerns that span more than one of the Open Tree code repositories: integration tests, system statistics, etc.
BSD 2-Clause "Simplified" License

Someone should back up files.opentreeoflife.org #122

Closed: jar398 closed this 7 years ago

jar398 commented 7 years ago

Maybe in a few months there will be a more permanent solution for storing the files that are currently on files.opentreeoflife.org, such as on S3, but until then, backups of this content would be very wise. I am backing it up to my laptop from time to time, but not systematically.

I'm talking about the entire contents of the ~opentree/files.opentreeoflife.org tree on files.opentreeoflife.org (which is a CNAME for varela.csail.mit.edu).

Perhaps @mtholder is already doing this; I don't know. In any case, we need to find a volunteer, who should set up a script to rsync the content to a server of their choice about once a day.
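The daily rsync suggested here could be as small as the following Python sketch. The login name `opentree`, the remote path, and the local destination `/backups/files.opentreeoflife.org` are hypothetical placeholders, not a documented setup. Because the remote path has no leading slash, rsync-over-ssh resolves it relative to the remote user's home directory, which matches `~opentree/files.opentreeoflife.org` only under that assumption.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Iterable, List

# Hypothetical placeholders: the login name, remote path, and local
# destination below are assumptions, not a documented configuration.
SOURCE = "opentree@files.opentreeoflife.org:files.opentreeoflife.org/"
DEST_ROOT = Path("/backups/files.opentreeoflife.org")


def build_rsync_command(source: str, dest: Path, excludes: Iterable[str] = ()) -> List[str]:
    """Return an argv list that mirrors `source` into `dest` (one-way copy)."""
    # --archive recurses and preserves permissions, times, and symlinks;
    # --partial keeps partly transferred files so large tarballs can resume.
    cmd = ["rsync", "--archive", "--compress", "--partial"]
    for pattern in excludes:
        cmd.append(f"--exclude={pattern}")
    # The trailing slash on the source copies its *contents* into dest
    # instead of creating an extra files.opentreeoflife.org/ level.
    cmd += [source, str(dest) + "/"]
    return cmd


def run_backup(cmd: List[str]) -> None:
    """Run rsync, raising CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    command = build_rsync_command(SOURCE, DEST_ROOT)
    # Print the command only; call run_backup(command) to perform the copy.
    print(shlex.join(command))
```

Scheduling is left to cron or a similar scheduler. The sketch deliberately omits `--delete`, so files removed on the server are kept in the backup; whether that is desirable is a policy choice.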

It seems to be about 11G of stuff at present. It includes all archived versions of OTT and the synthetic tree, as well as archived versions of the inputs to OTT.

jar398 commented 7 years ago

(The files are currently on a 2TB spinning disk, which, like all such disks, could fail at any time. MTBF is about 5 years and it's two or three years old.)

jar398 commented 7 years ago

Assigning to @mtholder hoping he can take care of this; if not, then de-assign yourself and we'll figure something else out.

mtholder commented 7 years ago

I'm backing it up now. I can work on setting up the rsync. I can just skip the synthesis output directories if those are just unpacked versions of the tar'ed form. Is that correct?

It looks like I should have started this by cloning https://github.com/OpenTreeOfLife/files.opentreeoflife.org

Oops. I'll do that after the copy is complete.

jar398 commented 7 years ago

Correct. I believe that all the unpacked output and taxonomy directories are redundant with the tarballs; if they're not, then I have mismanaged the unpacked directories and any differences in the unpacked directories should be ignored.
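Skipping the redundant unpacked directories could be automated by turning them into rsync exclude patterns. This is a minimal sketch under a hypothetical naming convention: each unpacked directory `foo` is assumed to sit next to a tarball named `foo.tgz` or `foo.tar.gz`. The real layout on the server may differ, and the scan has to run where the files live (on the server, or against an existing local copy).

```python
from pathlib import Path
from typing import List

# Assumed (hypothetical) convention: directory "foo" is redundant when a
# sibling archive "foo.tgz" or "foo.tar.gz" exists.
TARBALL_SUFFIXES = (".tgz", ".tar.gz")


def redundant_dirs(root: Path) -> List[Path]:
    """Return directories under `root` that have a same-named sibling tarball."""
    found = []
    for entry in sorted(root.rglob("*")):
        if not entry.is_dir():
            continue
        if any(entry.with_name(entry.name + s).is_file() for s in TARBALL_SUFFIXES):
            found.append(entry)
    return found


def exclude_patterns(root: Path) -> List[str]:
    """Turn redundant directories into rsync --exclude patterns.

    A leading "/" anchors the pattern at the transfer root, and a trailing
    "/" makes it match only directories.
    """
    return ["/" + d.relative_to(root).as_posix() + "/" for d in redundant_dirs(root)]
```

The resulting patterns can be passed to rsync as `--exclude=...` options. Excluding by pattern keeps the backup limited to the tarballs, which are treated in this thread as the authoritative copies.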

I'm not sure all changes to the small files in files.opentreeoflife.org have been written back to the git repo. Nothing really depends on these being kept in sync. In fact, most of the small files should probably be deleted in favor of Apache-generated directory listings.

The total is about 10G (as you've probably found out).

Thanks for taking care of this.