CDCgov / cfa-rt-postprocessing

Apache License 2.0

Build merged_draws.parquet and merged_summaries.parquet #7

Open kgostic opened 2 weeks ago

kgostic commented 2 weeks ago

@zsusswein will review

natemcintosh commented 2 weeks ago

So, going off the output structure docs, the raw output will have the form:

az://rt-output/
├── job_<job_id>/
│   ├── raw_samples/
│   │   ├── samples_<task_id>.parquet
│   ├── summarized_quantiles/
│   │   ├── summarized_<task_id>.parquet
│   ├── diagnostics/
│   │   ├── diagnostics_<task_id>.parquet
│   ├── tasks/
│   │   ├── task_<task_id>/
│   │   │   ├── model.rds
│   │   │   ├── metadata.json
│   │   │   ├── stdout.log
│   │   │   └── stderr.log
│   ├── job_metadata.json
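
To make the layout concrete, here is a minimal sketch of how the per-task files could be grouped into inputs for the two merged files. It rests on assumptions that are not confirmed in this thread: that `merged_draws.parquet` would be built from `raw_samples/` and `merged_summaries.parquet` from `summarized_quantiles/`, and that blob names are given relative to the container root. The names `MERGE_SOURCES` and `plan_merges` are hypothetical, and the sketch only plans which files to combine; it does not read or write any parquet data.

```python
from pathlib import PurePosixPath

# Assumed mapping (not confirmed by the docs): merged file name ->
# (subdirectory under job_<job_id>/, per-task file name prefix).
MERGE_SOURCES = {
    "merged_draws.parquet": ("raw_samples", "samples_"),
    "merged_summaries.parquet": ("summarized_quantiles", "summarized_"),
}


def plan_merges(blob_names, job_id):
    """Group per-task parquet blobs by the merged file they would feed.

    blob_names: iterable of paths relative to the container root,
                e.g. "job_abc/raw_samples/samples_1.parquet".
    job_id:     the job identifier that follows "job_" in the directory name.
    """
    plan = {merged_name: [] for merged_name in MERGE_SOURCES}
    job_dir = f"job_{job_id}"

    for blob in blob_names:
        path = PurePosixPath(blob)
        parts = path.parts
        # Only consider job_<job_id>/<subdir>/<file>.parquet; this skips
        # tasks/, job_metadata.json, and blobs belonging to other jobs.
        if len(parts) != 3 or parts[0] != job_dir or path.suffix != ".parquet":
            continue
        for merged_name, (subdir, prefix) in MERGE_SOURCES.items():
            if parts[1] == subdir and path.name.startswith(prefix):
                plan[merged_name].append(blob)

    # Sort so the plan does not depend on the listing order.
    for inputs in plan.values():
        inputs.sort()
    return plan


if __name__ == "__main__":
    example_blobs = [
        "job_abc/raw_samples/samples_2.parquet",
        "job_abc/raw_samples/samples_1.parquet",
        "job_abc/summarized_quantiles/summarized_1.parquet",
        "job_abc/diagnostics/diagnostics_1.parquet",
        "job_abc/tasks/task_1/model.rds",
        "job_abc/job_metadata.json",
    ]
    for merged_name, inputs in plan_merges(example_blobs, "abc").items():
        print(merged_name, inputs)
```

Under these assumptions, `diagnostics/` and `tasks/` are left out of both merged files. Whether that matches the intended outputs is a question for the README section on outputs.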

I guess I'm slightly unclear on the difference between these merged_... files and the "production database".

natemcintosh commented 2 weeks ago

Assuming the section in this PR updating the README on "outputs" is up to date, I think I understand how this all fits together.