dsgrid / dsgrid-project-StandardScenarios

Project instructions and configuration data for the dsgrid Standard Scenarios project
BSD 3-Clause "New" or "Revised" License

TEMPO BUG - Missing Counties #23

Open mooneyme opened 2 years ago

mooneyme commented 2 years ago

TEMPO is currently missing ~100 counties. Together they make up <1% of the population/households (<1.3 million households) and are scattered across the US.

This is most likely related to a bug in TEMPO. @ahcyip said "Last I remember, probably some bad join in Julia somewhere causing TEMPO to error out because some bin was blank". Apparently this was on Brian's radar, but they never got around to fixing it.

Short term fix: We apply a post-processing fix to load_data_lookup.parquet that adds these missing counties (and their combinations of other dimensions) with NULL data_id values.
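As a rough illustration of that post-processing step, here is a minimal sketch on plain Python rows, kept dependency-free; the same logic maps onto a cross join in pandas or Spark before writing the parquet file back out. The function name `fill_missing_counties` and the column names `geography`, `subsector`, and `id` are assumptions for the example, not the actual schema of the lookup table, and the sketch assumes the lookup is non-empty.

```python
from itertools import product


def fill_missing_counties(rows, all_counties, county_col="geography", id_col="id"):
    """Return rows plus one null-id row per (missing county, other-dimension combo).

    rows: non-empty list of dicts, one per lookup record (hypothetical schema).
    all_counties: every county ID that the lookup is expected to cover.
    """
    # Every column other than the county and the id is treated as a dimension.
    other_cols = [c for c in rows[0] if c not in (county_col, id_col)]

    present = {r[county_col] for r in rows}
    missing = [c for c in all_counties if c not in present]

    # Distinct combinations of the other dimensions, in first-seen order.
    combos = list(dict.fromkeys(tuple(r[c] for c in other_cols) for r in rows))

    # Cross product: each missing county gets every combination, with a null id.
    filler = [
        {county_col: county, **dict(zip(other_cols, combo)), id_col: None}
        for county, combo in product(missing, combos)
    ]
    return rows + filler


# Hypothetical example data: two counties present, one ("01005") missing.
lookup = [
    {"geography": "01001", "subsector": "bev", "id": 1},
    {"geography": "01001", "subsector": "phev", "id": 2},
    {"geography": "01003", "subsector": "bev", "id": 3},
    {"geography": "01003", "subsector": "phev", "id": 4},
]
fixed = fill_missing_counties(lookup, ["01001", "01003", "01005"])
```

Rows that already exist are left untouched, so the fix only appends records and can be re-run safely on an already-fixed lookup.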

Long term fix: @ahcyip or someone else on the TEMPO team is going to update the next version of the data handoff with this county bug fix resolved.

ahcyip commented 2 years ago

Great, I can handle the short term fix by adding the missing rows to the parquet files.

ahcyip commented 2 years ago

The lookup with the short-term fix is now on Eagle at /shared-projects/dsgrid/tempo.lkup.parquet. The long-term fix will be addressed in issue #28.

mooneyme commented 2 years ago

Let's leave this open as a reminder of the need for the long term fix.

ahcyip commented 2 years ago

Update: the data is "missing" only for the smallest ~100 counties (~3% of counties), and only in the AEO Reference Case (no problems in EFS and LDV2035). I'm filling the lookups for these with id = NA/null. These counties total 50,000 households, 0.03% of the US total. The cause is that, with the lower sampling rate and low EV adoption, the stochastic simulation often does not pick up any EVs. In the future, I will look into increasing the sample rate so that a small number of EVs show up, but for now the load is configured to be ~0 in these counties.
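To make the sampling argument concrete, here is a small sketch of why a small county can end up with no EVs at all. It assumes each sampled household independently adopts an EV with the same probability, which is a simplification for illustration and not a description of how TEMPO actually samples; the function name and the numbers are hypothetical.

```python
def prob_no_ev_sampled(n_samples, adoption_rate):
    """Probability that none of n_samples independent draws is an EV household."""
    return (1.0 - adoption_rate) ** n_samples


# Hypothetical numbers: a small county with few sampled households and low adoption.
p_small = prob_no_ev_sampled(n_samples=20, adoption_rate=0.01)

# Raising the sample count lowers the chance of drawing zero EVs.
p_larger = prob_no_ev_sampled(n_samples=200, adoption_rate=0.01)
```

Under this assumption the chance of an empty county shrinks geometrically with the number of samples, which is why increasing the sample rate should make a small number of EVs appear.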