globaldothealth / monkeypox

Mpox 2022 repository

S3->Github data replication broken #141

Closed abhidg closed 2 years ago

abhidg commented 2 years ago

Is your feature request related to a problem? Please describe. Replicating archives from S3 to GitHub is currently broken, which will break monkeypox-report generation as well. Users need a way to access archived snapshots of the monkeypox line list for reproducibility purposes. Currently, archives are stored in a private S3 bucket whose contents are fetched by a script. Everything other than the cached source URLs is fetched.

Proposed solution: Make the monkeypox bucket public.

Additional context: This will also make the sources public, and they can contain PII. We already provide the source URLs in the bucket, so a case can be made that we are archiving for transparency reasons, but we run the risk of hosting PII ourselves. Thoughts @Mougk?

Alternative solution: Move source fetching to a separate script that puts the sources in a private monkeypox-sources bucket, remove the sources from the monkeypox bucket, and make the monkeypox bucket public.
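
A minimal sketch of how the split in the alternative solution could be planned, assuming cached sources live under a `sources/` key prefix. That prefix, the `plan_split` helper, and the example keys are hypothetical and not taken from the repository; only the two bucket names come from this issue. The function only decides which keys stay public and which move; it makes no S3 calls.

```python
from typing import Iterable, List, NamedTuple, Tuple

# Bucket names as mentioned in this issue.
PUBLIC_BUCKET = "monkeypox"
PRIVATE_BUCKET = "monkeypox-sources"
# Assumption: cached sources share a common key prefix (hypothetical).
SOURCES_PREFIX = "sources/"


class Move(NamedTuple):
    """One object that should be relocated between buckets."""
    key: str
    src_bucket: str
    dst_bucket: str


def plan_split(
    keys: Iterable[str], sources_prefix: str = SOURCES_PREFIX
) -> Tuple[List[str], List[Move]]:
    """Partition bucket keys into those that can stay public and
    those (cached sources, possibly containing PII) that should be
    moved to the private bucket before the main bucket is opened."""
    stay_public: List[str] = []
    moves: List[Move] = []
    for key in keys:
        if key.startswith(sources_prefix):
            moves.append(Move(key, PUBLIC_BUCKET, PRIVATE_BUCKET))
        else:
            stay_public.append(key)
    return stay_public, moves
```

The resulting list of moves would need to be applied (copy, then delete) before changing the bucket policy, so that no source file is ever publicly readable.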

jim-sheldon commented 2 years ago

I made a website that exposes the archives for download (here), so we can leave the bucket permissions alone.

abhidg commented 2 years ago

Thanks @jim-sheldon :tada:

jim-sheldon commented 2 years ago

Reusing this issue to track the bug and its fix.