COG-UK / dipi-group

Data integrity and pipeline integration working group
S3 files not updated in several days #211

Closed by AngieHinrichs 1 year ago

AngieHinrichs commented 2 years ago

It looks like cog_all.fasta.gz hasn't been updated since the 6th. Can someone take a look?

curl -SsI https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/cog_all.fasta.gz | grep Last
Last-Modified: Tue, 06 Sep 2022 02:44:50 GMT
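
For anyone who wants to automate this check, here is a minimal sketch that turns the Last-Modified header into an age in days. The function names `last_modified` and `age_in_days` are made up for illustration, and the date parsing assumes GNU `date` (for `date -d`), so it would need adjusting on BSD/macOS.

```shell
# Hypothetical helpers, not part of any COG-UK tooling.
# Assumes GNU date (for `date -d`) and curl.

# last_modified URL
# Prints the value of the Last-Modified response header for URL.
last_modified() {
    curl -SsI "$1" | tr -d '\r' | sed -n 's/^[Ll]ast-[Mm]odified: //p'
}

# age_in_days TIMESTAMP [NOW]
# Prints the whole number of days between TIMESTAMP and NOW
# (NOW defaults to the current time).
age_in_days() {
    aid_then=$(date -u -d "$1" +%s) || return 1
    aid_now=$(date -u -d "${2:-now}" +%s) || return 1
    echo $(( (aid_now - aid_then) / 86400 ))
}

# Example (needs network access); the 2-day threshold is an arbitrary choice:
# url=https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/cog_all.fasta.gz
# age=$(age_in_days "$(last_modified "$url")")
# [ "$age" -gt 2 ] && echo "stale: last modified $age days ago"
```

Passing the reference time as an optional second argument keeps the age calculation testable without touching the network.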

Thanks! Angie

WhalleyT commented 2 years ago

Hi @AngieHinrichs, I think the S3 sync on my end has been timing out. I think we've got this sorted, but I'll report back shortly!

AngieHinrichs commented 2 years ago

Working great now, thanks @WhalleyT!

BTW cog_all.fasta.gz is ~12.5GB now, but when xz-compressed it's only 188MB so that's what I use locally. I imagine the trans-Atlantic downloads would go more quickly with that too. :) xz is slower than gz, but you can throw a lot of cores at it. FWIW on my server this takes <12 minutes, and it really does keep 20 cores busy so I could probably add more to make it faster still:

time zcat cog_all.fasta.gz | xz -T 20 > cog_all.fasta.xz

real    11m40.390s
user    237m9.873s
sys     1m0.477s
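
For reference, user time divided by real time here is roughly 14230 s / 700 s ≈ 20, which matches the 20 busy cores. Below is a rough sketch of the same recompression with a sanity check added. The function name `recompress_gz_to_xz` is made up for illustration, and the thread count defaults to `-T 0`, which asks xz to use as many threads as there are cores. The check decompresses both files again, so on a file this size it adds noticeable time.

```shell
# Hypothetical helper, not part of any COG-UK tooling.
# recompress_gz_to_xz IN.gz OUT.xz [THREADS]
# Recompresses IN.gz as OUT.xz and confirms both decompress to the same bytes.
recompress_gz_to_xz() {
    rx_threads=${3:-0}   # 0 = let xz pick one thread per available core

    # Fail early if the gzip input is truncated or corrupt.
    gzip -t "$1" || return 1

    # Same pipeline as in the comment above, with gzip -dc in place of zcat.
    gzip -dc "$1" | xz -T "$rx_threads" > "$2" || return 1

    # Compare checksums of the decompressed streams.
    rx_a=$(gzip -dc "$1" | cksum)
    rx_b=$(xz -dc "$2" | cksum)
    [ "$rx_a" = "$rx_b" ]
}

# Example:
# recompress_gz_to_xz cog_all.fasta.gz cog_all.fasta.xz 20 && echo "ok"
```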