Closed bendnorman closed 1 year ago
I'm a little confused by the assert statement on line 251. It checks that a project doesn't have multiple locations, but projects is used to create two location columns for a project.
Also, when I remove the assert statement, the ETL finishes without any db constraint failures.
That assert is checking that there aren't more than 2 locations for any project, because the wide format data creates `county_1` and `county_2` columns. I can check which project is turning up with more than 2 locations and try to work backwards to figure out why that happened.
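A rough sketch of how the offending projects could be listed, instead of only asserting that none exist. This is plain Python over made-up rows, not the actual code in `dbcp.data_mart.projects`; the `(project_id, county)` row layout and the `projects_over_limit` helper are assumptions for illustration.

```python
from collections import defaultdict

# The wide format only has room for county_1 and county_2.
MAX_LOCATIONS = 2

# Hypothetical long-format rows: (project_id, county).
rows = [
    (1, "Travis"),
    (2, "Kern"),
    (2, "Tulare"),
    (3, "Polk"),
    (3, "Story"),
    (3, None),
    (3, None),
]


def projects_over_limit(rows, limit=MAX_LOCATIONS):
    """Return {project_id: location_count} for projects exceeding the limit."""
    counts = defaultdict(int)
    for project_id, _county in rows:
        counts[project_id] += 1
    return {pid: n for pid, n in counts.items() if n > limit}


offenders = projects_over_limit(rows)
print(offenders)  # {3: 4}
```

Returning the offending IDs and their counts makes it easier to work backwards from a failure than a bare assert would.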
I don't know why this only happened with the PUDL update, but there was one project that got split into 4 entries because of bad location information -> geocoding failure -> null locations that didn't get dropped. I never got to the root cause, but it's only one project so I just fixed it manually.
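To illustrate the failure mode described here, a minimal sketch with invented data: null locations left in the long-format rows inflate the per-project count, and dropping them first brings the project back under the two-location limit. The row layout and the `drop_null_locations` helper are hypothetical and not taken from the repo.

```python
from collections import Counter

# Hypothetical rows for one project: two real counties plus two
# null locations left over from failed geocoding.
rows = [
    (3, "Polk"),
    (3, "Story"),
    (3, None),
    (3, None),
]


def drop_null_locations(rows):
    """Keep only rows whose county is present."""
    return [(pid, county) for pid, county in rows if county is not None]


before = Counter(pid for pid, _ in rows)
after = Counter(pid for pid, _ in drop_null_locations(rows))

print(before[3])  # 4
print(after[3])   # 2
```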
Thanks for fixing it! It could have been a pandas 1.4 change that broke something. Should we merge it in?
This PR updates the MCOE table to use data from 2021.
`pudl.sqlite` is now pulled from AWS instead of Zenodo because it 1) is faster, 2) makes it easier to specify the version, and 3) doesn't download all of the PUDL code and data.

@TrentonBush I'm getting a data validation error on line 251 in `dbcp.data_mart.projects`. Looks like there are more than 2 locations found for 2 projects. I'm not sure why this changed because this PR doesn't touch the project data. Maybe the pandas update changed the behavior of a function?