Closed — nmdefries closed this 3 weeks ago
I guess the test is failing (on linting, with `delphi_nchs_mortality/pull.py:11:0: E0611: No name 'create_backup_csv' in module 'delphi_utils' (no-name-in-module)`) because the new fn is being added to delphi_utils at the same time.
Also, tests for the new create_backup_csv fn still need to be added, but this is the idea for how this should work. Adding backups for other indicators should be faster after this.
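For reference, a minimal sketch of what such a backup helper could look like (illustrative only; the actual `create_backup_csv` signature and behavior in `delphi_utils/export.py` may differ):

```python
from datetime import datetime
from pathlib import Path

import pandas as pd


def create_backup_csv(df: pd.DataFrame, backup_dir: str, custom_run: bool = False,
                      issue_date: str = None, logger=None):
    """Stash a raw-data snapshot as a dated, gzipped CSV.

    Sketch only; the real delphi_utils implementation may differ.
    """
    if custom_run:
        # Skip backups on custom/patch runs so they don't clobber daily stashes.
        return
    issue = issue_date or datetime.today().strftime("%Y%m%d")
    path = Path(backup_dir) / f"{issue}.csv.gz"
    # gzip compression keeps the daily snapshots small on disk
    df.to_csv(path, index=False, compression="gzip")
    if logger is not None:
        logger.info("Backed up source data to %s", path)
```

An indicator's pull step would then call this once per run, right after fetching the raw source data.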
Thanks for your quick feedback @minhkhul!
Add some logging to note which indicator stashing is done for.
Agreed. Related to this, @korlaxxalrok suggested including metadata in each day's backup data or unique IDs we can use to track provenance of downstream data. Designing that will likely be too complex and thus take too long for getting V1 of data backups out, but could be very useful in the future.
Adjust the params.json.template in nchs_mortality as well.
I don't have strong feelings about this, but given that the custom_run param has a default in the code, we don't necessarily need to add it to params.json.
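To illustrate the pattern being described (hedged; the exact location of the flag in the params dict is an assumption here):

```python
# Reading the flag with a default means params.json doesn't need to define it;
# a missing "custom_run" key simply behaves like a normal (non-custom) run.
params = {"indicator": {"export_dir": "./receiving"}}  # no custom_run key present

custom_run = params["indicator"].get("custom_run", False)
print(custom_run)  # False when the key is absent
```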
suggestion: When I wrote and ran a similar script to stash the nssp source, the small VM ran out of disk space at one point. To save disk space, apart from adding zipping, I also added a check for whether the dataset had changed at all compared to the latest past csv.gz on disk, and only saved a new version of the dataset after confirming there's a difference. It's helpful on a weekly signal like nssp. I think it'd be nice to add that, but it's not needed.
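The change-detection being suggested could be sketched roughly like this (hypothetical helper, not part of delphi_utils, and nothing like this was merged here):

```python
from pathlib import Path

import pandas as pd


def save_if_changed(df: pd.DataFrame, backup_dir: str, issue: str) -> bool:
    """Write a new gzipped backup only if the data differs from the latest one.

    Returns True if a new file was written. Hypothetical sketch only.
    """
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    previous = sorted(backup_dir.glob("*.csv.gz"))
    if previous:
        last = pd.read_csv(previous[-1])
        # Compare contents rather than file bytes, so gzip metadata
        # (e.g. mtime) can't cause false positives.
        if last.equals(df):
            return False
    df.to_csv(backup_dir / f"{issue}.csv.gz", index=False, compression="gzip")
    return True
```

This trades a read of the most recent backup on every run for the saved disk space; on a weekly-updating source most daily runs would write nothing.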
Hm, so we've found that saving data like this causes storage issues. Since you refer to a "vm", I wonder if the limit you hit was that of the VM (O(1 GB)) rather than of the host machine (O(100 GB)). How big is that entire collection of backups?
RE "only sav[ing] the latest new version of the dataset after confirming there's a difference" with the last backup, do we think this is safe/robust enough to do? One initial concern is that this is starting to sound like "archive differ V2". Of course, it's simpler than the current one, but any extra code increases the risk of introducing bugs. To know how to balance the risk, we'd want an estimate of how big the data backups would be.
Yep I very much agree with the potential for an archive differ v2 problem. Let's scratch that for now.
Also, I've been running this locally every day since yesterday, at the same time as the normal nchs run, and keeping the backup files, so we can take our time with this PR.
Description

Add nchs-mortality raw data backups and backup export utility.

Changelog

- create_backup_csv fn in delphi_utils/export.py
- nchs_mortality's pull_nchs_mortality_data fn

Associated Issue(s)

Context and writeup