Oh, I also saw some hints about s5cmd being faster, written in Go, etc. Might be worth checking out in a pinch next time.
I heard from colleagues today that the files were not publicly accessible. It looks like this is because the `cp` command resets their permissions. Annoying, but I was able to fix this easily via the console. I haven't double checked this at all, but I believe in the future I should use `--acl public-read` when doing the `cp` to prevent this.
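The console fix worked, but for the record, the same thing can be done from the CLI with put-object-acl; the bucket name and key below are placeholders:

```bash
# Re-grant public read on an object whose ACL was reset by the copy.
# Bucket name and key are placeholders, not the real paths.
aws s3api put-object-acl \
  --bucket my-bucket \
  --key us/federal/judicial/financial-disclosures/some-file.pdf \
  --acl public-read
```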
I also found that a useful command for testing things is something like:
aws s3api list-objects \
--bucket 'xxx' \
--prefix us/federal/judicial/financial-disclosures/ \
--query 'Contents[?StorageClass!=`STANDARD`]'
That'll show you files in a bucket/prefix that aren't STANDARD storage class. There are some additional tips here too.
The AWS console makes it very easy, cheap, and fast to move things to deep glacier storage, but makes it very hard, expensive, and slow to restore them. I made the mistake on Wednesday of moving a directory that I thought only had old data into deep glacier.
The process for restoring it has been annoying. First, you have to make restore requests for every object. This is weirdly slow going, and you have to say how long you want it to be restored for. The command I used for that was:
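Roughly something like the following, which issues a restore request for everything under the prefix. This is a sketch using s3cmd's restore subcommand (which is where the --restore-days flag mentioned below comes in), with a placeholder bucket and day count rather than the exact invocation:

```bash
# Sketch: request a temporary restore of everything under the prefix.
# Bucket name and day count are placeholders. --restore-priority=standard
# is the tier that takes up to ~12 hours for Deep Archive objects.
s3cmd restore \
  --recursive \
  --restore-days=14 \
  --restore-priority=standard \
  s3://my-bucket/us/federal/judicial/financial-disclosures/
```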
This took several hours to run. There are guides online about parallelizing it, but I didn't bother. I also forgot to set the `--restore-days` parameter, which sets how long you want the restored data to stick around for. The default number of days isn't documented.

Then you wait up to 12 hours as AWS does the restore. Fine.
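While waiting, a handy way to check whether a given object has come back yet is head-object, which includes a Restore field in its output once a restore has been requested (bucket and key below are placeholders):

```bash
# The output's "Restore" field reads ongoing-request="true" while AWS is
# still working, and ongoing-request="false", expiry-date=... once the
# temporary copy is available. Bucket name and key are placeholders.
aws s3api head-object \
  --bucket my-bucket \
  --key us/federal/judicial/financial-disclosures/some-file.pdf
```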
Finally, the items are accessible again, BUT it's only a copy that's around temporarily. To make that permanent, you have to copy it in place with something like:
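Something along these lines, with a placeholder bucket; the --acl flag is there because of the permissions issue noted above:

```bash
# Copy every object under the prefix onto itself with STANDARD storage class,
# which turns the temporary restore into a permanent copy.
# --force-glacier-transfer tells the CLI to copy objects that still show a
# Glacier storage class; --acl public-read keeps them publicly readable
# (see the note above about cp resetting permissions).
# Bucket name is a placeholder.
aws s3 cp \
  s3://my-bucket/us/federal/judicial/financial-disclosures/ \
  s3://my-bucket/us/federal/judicial/financial-disclosures/ \
  --recursive \
  --storage-class STANDARD \
  --force-glacier-transfer \
  --acl public-read
```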