Open zschira opened 1 year ago
Thanks @zschira for taking a look at our changes to the crosswalk and laying out these next steps!
I think these all make sense in terms of getting to a PR. The other part of this is whether a PR makes sense for some or all of the changes. I think a response to your first question would help us have the required conversation with the maintainers of the upstream repo.
I think (1) makes a lot of sense for a PR. There are some finer points around the use of Early Release data and the switch to monthly generator data, but I think we can work out something mutually agreeable on both.
I'm more ambivalent about (2). Some of the rationale for the multi-year crosswalk has diminished as we no longer attempt to allocate CAMD units to EIA generators, and we have better downstream methods for determining the prime mover and fuel codes to associate with CAMD units. This points to the fact that this functionality is separable from the matching logic of the crosswalk. Unless we believe that others would be interested in it, I'm not sure it's worth the added effort of putting it into a PR. Also, personally, I would rather have it in Python, so I would argue that any version of this we do want should live in PUDL or elsewhere; it may be better handled as part of the broader CEMS crosswalk work we've been discussing with Catalyst and OGE.
(3) is minor and fine to include. I would probably not include (4) in the PR. One other category of changes that we should revert is the changes to column names.
Putting this on hold in favor of updating the crosswalk to use 2021 EIA data instead of 2018.
Background
RMI has been working on updates to the EPA crosswalk notebook here. I'm collecting some initial thoughts on what they should do to prep this for a PR back into the original EPA repo. This should probably be moved somewhere else, but putting it here for now.
Tasks before PR
Overall, this looks quite manageable to get ready for a PR. Many changes are fairly minor and don't need much work.
Cleanup
- `eia_860_year_num`, `eia_860_year_file_name`, and `eia_860_year` are all very similar in name/value, which can be confusing.
- Replace `&` with `&&`. `&` is vectorized and can create confusion; `&&` is more appropriate in most cases.
Possible problems
- `eia_data_file_hist`. This variable is assigned inside an `if` statement, but used outside of it.
Documentation
Needs from RMI
Many of the above tasks can be handled by Catalyst, but it would be helpful to get some inputs from RMI.