Closed zaneselvans closed 4 years ago
@swinter2011 were there any other data-related issues that you encountered when you were testing the 2009-2010 EIA860 ETL? I feel like you mentioned them somewhere but... I don't know where that is.
Discovered that in the process of enforcing uniform types on the columns in the dataframes, we also ended up inadvertently converting some NaN values into the string "nan", since that's what you get when you do `str(np.nan)`. I patched a hack into `pudl.transform.eia._occurances` in which those "nan" strings are turned back into true NaN values before `dropna()` is called. Pandas 1.0.0 will address a lot of these issues, with dedicated String, Boolean, and Integer column datatypes, all of which use the `pandas.NA` value to indicate missing data.
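For anyone trying to reproduce the problem, here is a minimal sketch of the failure mode and the workaround described above. It is not the actual PUDL code: the column values and variable names are made up for illustration, and the string cast is done with plain `str()` to mirror the `str(np.nan)` behavior.

```python
import numpy as np
import pandas as pd

# Hypothetical column values; the real ETL columns are not shown here.
raw = ["MISO", np.nan, "PJM"]

# Casting each value with str() turns the missing value into the text "nan".
as_strings = pd.Series([str(v) for v in raw], dtype=object)

# The "nan" string is an ordinary string, so pandas sees no nulls at all.
n_missing_before = int(as_strings.isna().sum())

# Workaround: turn the "nan" strings back into real missing values,
# and only then drop the nulls.
restored = as_strings.replace("nan", np.nan)
cleaned = restored.dropna()
```

Without the `replace` step, `dropna()` would keep all three rows, because the bogus "nan" string counts as real data.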
This appears to be done now -- ETL completes successfully, but now we need to update the entity mappings to include the 2009-2010 entities from EIA 860, especially plants. See #529
ETL almost works for EIA 860, but the transform step is failing. Several things probably need to be fixed. The ones we know about:
- `primary_purpose_naics_id` contains null values, and so needs to be `Int64` not `int`.
- Harvesting of `iso_rto_name` is too inconsistent at 0.935.
- `iso_rto_lmp_node_id` consistency is only 0.852.
- `balancing_authority_code` has a consistency of 0.0
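On the `Int64` vs. `int` point, here is a small sketch of why the plain integer type fails when a column contains nulls. The values are made up for illustration and are not taken from the EIA 860 data.

```python
import numpy as np
import pandas as pd

# Hypothetical ID values with one missing entry.
naics = pd.Series([22.0, np.nan, 221.0])

# The NumPy-backed int dtype has no way to represent a missing value,
# so the cast raises instead of producing a column.
try:
    naics.astype(int)
    plain_int_ok = True
except (ValueError, TypeError):
    plain_int_ok = False

# The nullable extension dtype keeps the integers and marks the gap with pd.NA.
nullable = naics.astype("Int64")
```

Note the capital "I": `"Int64"` is the pandas nullable extension type, while `"int64"` is the NumPy dtype that cannot hold nulls.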