USEPA / standardizedinventories

Standardized Release and Waste Inventories
MIT License

Inconsistent facility IDs when applying secondary contexts #149

Open bl-young opened 8 months ago

bl-young commented 8 months ago

Comparing results by FacilityID shows a number of differences for individual facilities.

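import pandas as pd
import stewicombo

# Assumed definition (not shown in the original snippet): pd.merge suffixes
# the overlapping FlowAmount columns with _x and _y.
rename_flow_cols = {'FlowAmount_x': 'm1', 'FlowAmount_y': 'm2'}

sc1 = stewicombo.getInventory('NEI_TRI_air_2017_v1.1.1_ff2903a')
sc2 = stewicombo.getInventory('NEI_TRI_air_seccntx_2017_v1.1.1_ff2903a')
# Total flow per facility in each inventory
sc1 = sc1.groupby(['FacilityID']).agg({'FlowAmount':'sum'})
sc2 = sc2.groupby(['FacilityID']).agg({'FlowAmount':'sum'})
# Keep facilities whose totals differ by more than 1%
s = (pd.merge(sc1, sc2, on='FacilityID')
     .rename(columns=rename_flow_cols)
     .assign(Rel=lambda x: x['m1'] / x['m2'])
     .query('Rel >= 1.01 or Rel <= 0.99')
     )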

However, the same comparison by FlowName shows no differences.

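import pandas as pd
import stewicombo

# Assumed definition (not shown in the original snippet)
rename_flow_cols = {'FlowAmount_x': 'm1', 'FlowAmount_y': 'm2'}

sc1 = stewicombo.getInventory('NEI_TRI_air_2017_v1.1.1_ff2903a')
sc2 = stewicombo.getInventory('NEI_TRI_air_seccntx_2017_v1.1.1_ff2903a')
# Total flow per flow name in each inventory
sc1 = sc1.groupby(['FlowName']).agg({'FlowAmount':'sum'})
sc2 = sc2.groupby(['FlowName']).agg({'FlowAmount':'sum'})
# Keep flows whose totals differ by more than 1%
s = (pd.merge(sc1, sc2, on='FlowName')
     .rename(columns=rename_flow_cols)
     .assign(Rel=lambda x: x['m1'] / x['m2'])
     .query('Rel >= 1.01 or Rel <= 0.99')
     )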

Since totals by flow match but totals by facility do not, this suggests to me that the reported FacilityID is being handled differently when secondary contexts are applied. This has downstream ramifications for assigning sectors/NAICS in flowsa/USEEIO.
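To illustrate the pattern, here is a minimal sketch on toy data, not real stewicombo output. The facility IDs, flow names, amounts, and the `compare_totals` helper are all hypothetical. It assumes one emission record ends up under a different FacilityID in the second inventory, and shows that such a reassignment changes per-facility totals while leaving per-flow totals intact. An outer merge with `indicator=True` also surfaces facilities that appear in only one inventory, which the inner merge above would silently drop.

```python
import pandas as pd

# Toy stand-in for the inventory without secondary contexts (hypothetical values).
base = pd.DataFrame({
    'FacilityID': ['A', 'A', 'B', 'C'],
    'FlowName': ['NOx', 'SO2', 'NOx', 'SO2'],
    'FlowAmount': [10.0, 5.0, 20.0, 8.0],
})
# Same records, but the record for facility 'C' is now attributed to 'B'
# (an assumed failure mode, for illustration only).
seccntx = base.assign(FacilityID=base['FacilityID'].replace({'C': 'B'}))


def compare_totals(df1, df2, key):
    """Return rows where FlowAmount totals grouped by `key` differ by
    more than 1%, or where `key` is present in only one inventory."""
    t1 = (df1.groupby(key, as_index=False)['FlowAmount'].sum()
          .rename(columns={'FlowAmount': 'm1'}))
    t2 = (df2.groupby(key, as_index=False)['FlowAmount'].sum()
          .rename(columns={'FlowAmount': 'm2'}))
    merged = t1.merge(t2, on=key, how='outer', indicator=True)
    rel = merged['m1'] / merged['m2']
    # NaN ratios (key missing on one side) are caught by the _merge check.
    mask = (merged['_merge'] != 'both') | (rel >= 1.01) | (rel <= 0.99)
    return merged[mask]


by_facility = compare_totals(base, seccntx, 'FacilityID')
by_flow = compare_totals(base, seccntx, 'FlowName')
print(by_facility)
print(by_flow)
```

In this toy case the facility comparison flags 'B' (inflated total) and 'C' (missing from the second inventory), while the flow comparison comes back empty, matching the behavior described above.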