CDCgov / cfa-config-generator

Apache License 2.0

Implement default epinow2 config generation script #3

Closed amondal2 closed 4 days ago

amondal2 commented 6 days ago

Implement a script that generates a set of configuration objects based on the default parameters in https://github.com/cdcent/cfa-nnh-pipelines/blob/33b4a55daba3479cddc85fe10dac4732b2f2c91b/NHSN/Rt/run_azure_batch/write_config.R

It should also accept arguments so that the workflow can be triggered manually with a different set of arguments than the default.

My thought was to adapt the YAML in that script to the sample config here.

This PR will just handle the JSON generation, but future work will validate and push timestamped versions to Blob Storage.
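As a rough illustration of the shape this could take, here is a minimal sketch of a generator that falls back to defaults but accepts command-line overrides. Everything in it is an assumption for discussion: the field names (`pathogen`, `geo_type`, `geo_value`, `report_date`), the flag names, and the values in `DEFAULTS` are placeholders, not the actual defaults from `write_config.R` or the sample config's schema.

```python
import argparse
import json
from datetime import date

# Hypothetical defaults for illustration only; the real values would be
# ported from write_config.R.
DEFAULTS = {
    "geo_type": "state",
    "geo_values": ["CA", "NY"],
    "pathogens": ["COVID-19"],
}


def build_configs(report_date, geo_type, geo_values, pathogens):
    """Return one config dict per (pathogen, geo_value) pairing."""
    return [
        {
            "pathogen": pathogen,
            "geo_type": geo_type,
            "geo_value": geo_value,
            "report_date": report_date.isoformat(),
        }
        for pathogen in pathogens
        for geo_value in geo_values
    ]


def parse_args(argv=None):
    """Parse overrides; any flag left unset falls back to DEFAULTS."""
    parser = argparse.ArgumentParser(description="Generate model run configs")
    parser.add_argument("--report-date", type=date.fromisoformat, default=None)
    parser.add_argument("--geo-type", default=DEFAULTS["geo_type"])
    parser.add_argument("--geo-values", nargs="+", default=DEFAULTS["geo_values"])
    parser.add_argument("--pathogens", nargs="+", default=DEFAULTS["pathogens"])
    args = parser.parse_args(argv)
    if args.report_date is None:
        # Default to today; a manual trigger can pass an earlier date.
        args.report_date = date.today()
    return args


def main(argv=None):
    """Build the configs and return them as a JSON string."""
    args = parse_args(argv)
    configs = build_configs(
        args.report_date, args.geo_type, args.geo_values, args.pathogens
    )
    return json.dumps(configs, indent=2)
```

A manual trigger could then call something like `main(["--report-date", "2024-10-01", "--geo-values", "CA"])`, while a scheduled run would call `main([])` and get the defaults.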

Some questions about the schema for @zsusswein @natemcintosh --

natemcintosh commented 6 days ago

  1. geo_type is almost always "state". However, at some point we'd also like to be able to run the model at, e.g., the county level, in which case geo_type would be "county". (Also, when we do a model run for the whole US, it is not "state". I'm not sure of the exact name we use there, but it's probably something like "nation".) As for geo_value being an array, we'll have to ask @zsusswein.
  2. The report_date is usually just today, though sometimes we'd like to do retrospective testing, and then we'll change it. I believe as_of_date would be a timestamp, whereas report_date would just be a date. E.g., in Python, as_of_date would be a datetime.datetime and report_date a datetime.date. @zsusswein, is that correct? reference_date is the day of the "event", e.g. hospitalization, ED visit date, etc. So a given report_date would probably have a bunch of reference_dates, where all(reference_date <= report_date) == True.
  3. I think UUID7 would be good for jobs, but after looking around, I've really only found one Python library that can generate them, and it looks slightly half-baked and has not been updated in a while. If we can find a good way to generate them, great; otherwise we should consider some other UUID version. As for task IDs, it would be useful for the person running the model to be able to see which particular pairing made up a task, e.g. Covid and the state of California => COVID-CA. That said, there may be other considerations I've forgotten about.
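
To make points 2 and 3 concrete, here is a small sketch of how the date semantics and the ID scheme might look in Python. The class and function names (`RunDates`, `make_task_id`, `make_job_id`) are hypothetical, and the job ID falls back to `uuid4` as a stand-in: it uses the standard library's `uuid.uuid7` only where it exists (Python 3.14 and later, as far as I know).

```python
import uuid
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class RunDates:
    """Hypothetical container for the three date fields discussed above."""

    as_of_date: datetime   # timestamp of the data snapshot
    report_date: date      # plain date, usually today
    reference_dates: tuple  # event dates (hospitalization, ED visit, ...)

    def __post_init__(self):
        if not isinstance(self.as_of_date, datetime):
            raise TypeError("as_of_date must be a datetime")
        # datetime is a subclass of date, so reject it explicitly.
        if isinstance(self.report_date, datetime):
            raise TypeError("report_date must be a date, not a datetime")
        # Enforce all(reference_date <= report_date).
        late = [d for d in self.reference_dates if d > self.report_date]
        if late:
            raise ValueError(f"reference dates after report_date: {late}")


def make_task_id(pathogen: str, geo_value: str) -> str:
    """Human-readable task ID, e.g. Covid + CA -> 'COVID-CA'."""
    return f"{pathogen.upper()}-{geo_value.upper()}"


# Use uuid7 where the standard library has it; otherwise fall back to uuid4
# (an assumption for this sketch, not a decision about the real job IDs).
_uuid_factory = getattr(uuid, "uuid7", uuid.uuid4)


def make_job_id() -> str:
    """Opaque job ID as a UUID string."""
    return str(_uuid_factory())
```

For example, `make_task_id("Covid", "ca")` gives `"COVID-CA"`, and constructing `RunDates` with a reference date later than the report date raises a `ValueError`.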

amondal2 commented 6 days ago

Thanks @natemcintosh ! This is helpful; I'll take a stab at a PR and we can iterate from there.