TrentonBush closed this issue 3 weeks ago
@TrentonBush we had mentioned wanting to split this up into the two major output tables, do you still think that's a good idea?
Sounds like this is mostly sorted, though there are still some questions about the output schema. Once those are resolved, should we close this issue, @TrentonBush?
Because the 860m data comes from PUDL, much of the ETL code will live in PUDL but some application-specific code will be in this repo. Where should that boundary be?
Output Requirements from PUDL:
Output Requirements from Application Code:
- A table with a key (`project_id`, `timestamp`) and one column, `project_status`.
- A table with key `project_id` and columns corresponding to various status changes, e.g. `status_1_to_status_2`, with values containing the date of that status transition.

Other Considerations: Because PUDL is a bigger and more complex project, it is slower moving: the time to create satisfactory code and get PRs through the queue can be long. For this reason, I think we should start with as much prototyping as possible in this repo until we get the right outputs. Then we can refactor/move useful transformations back upstream to PUDL without deadline pressure.
This repo will also need to be updated to use the latest PUDL database. We can also remove the dependency on pudl's codebase.
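
To make the two application-side output tables concrete, here is a rough pandas sketch of how the transition table might be derived from the status table. It is only an illustration under assumptions: the function name `status_transitions_wide` and the status values (`planned`, `under_construction`, `operating`) are made up for the example, and the real 860m statuses and edge-case handling may differ.

```python
import pandas as pd


def status_transitions_wide(long_df: pd.DataFrame) -> pd.DataFrame:
    """Pivot a (project_id, timestamp, project_status) table into one row per
    project, with one column per observed status change holding its date.

    Hypothetical helper for illustration; not part of PUDL or this repo.
    """
    df = long_df.sort_values(["project_id", "timestamp"]).copy()
    # Previous status within each project (missing for the first report).
    df["prev_status"] = df.groupby("project_id")["project_status"].shift()
    # Keep only rows where the status differs from the previous report.
    changed = df[
        df["prev_status"].notna() & (df["prev_status"] != df["project_status"])
    ].copy()
    # Name each transition like "status_1_to_status_2".
    changed["transition"] = (
        changed["prev_status"] + "_to_" + changed["project_status"]
    )
    # Earliest date per project and transition, spread into columns.
    wide = (
        changed.groupby(["project_id", "transition"])["timestamp"]
        .min()
        .unstack("transition")
    )
    wide.columns.name = None
    return wide.reset_index()


# Example input with made-up projects and statuses.
long_df = pd.DataFrame(
    {
        "project_id": [1, 1, 1, 2, 2],
        "timestamp": pd.to_datetime(
            ["2021-01-01", "2021-02-01", "2021-03-01", "2021-01-01", "2021-02-01"]
        ),
        "project_status": [
            "planned", "planned", "under_construction", "planned", "operating",
        ],
    }
)
wide_df = status_transitions_wide(long_df)
```

Note that in this sketch a project that never changes status does not appear in the wide table at all, and a transition that happens more than once keeps only its earliest date. Whether either behavior is wanted is one of the open schema questions.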