USEPA / standardizedinventories

Standardized Release and Waste Inventories

Changing INVENTORY_PREFERENCE_BY_COMPARTMENT yields different totals #129

Closed vlahm closed 1 year ago

vlahm commented 1 year ago

Hello,

Am I right in thinking the INVENTORY_PREFERENCE_BY_COMPARTMENT parameter controls which inventory takes precedence when there is overlap? If so, changing this parameter shouldn't change the total flow, only how that flow is allocated among inventories. The code and comments below demonstrate that changing INVENTORY_PREFERENCE_BY_COMPARTMENT currently causes a different total (summed) flow to be returned.

# in stewicombo/globals.py:

INVENTORY_PREFERENCE_BY_COMPARTMENT = {"air": ["eGRID", "GHGRP", "NEI", "TRI"],
                                       "water": ["DMR", "TRI"],
                                       "soil": ["TRI"],
                                       "waste": ["RCRAInfo", "TRI"],
                                       "output": ["eGRID"]}

---

# separate script:

from stewicombo import combineFullInventories

cmb = combineFullInventories({'TRI': 2015, 'NEI': 2015, 'DMR': 2015}, filter_for_LCI=False)
cmb['FlowAmount'].sum()

# result: 66912900555.14437

---

# back in stewicombo/globals.py:

INVENTORY_PREFERENCE_BY_COMPARTMENT = {"air": ["TRI", "NEI", "eGRID", "GHGRP"],
                                       "water": ["TRI", "DMR"],
                                       "soil": ["TRI"],
                                       "waste": ["TRI", "RCRAInfo"],
                                       "output": ["eGRID"]}

# reinstall StEWI-1.0.5

cmb = combineFullInventories({'TRI': 2015, 'NEI': 2015, 'DMR': 2015}, filter_for_LCI=False)
cmb['FlowAmount'].sum()

# result: 67048867763.67558

vlahm commented 1 year ago

Digging a bit deeper, it appears that in cases where flows are reported to multiple inventories for the same year and location, stewicombo is designed to return flows only from the inventory of highest preference. But what if a facility reports 100 kg of formaldehyde to TRI and 10 kg via DMR, and DMR takes precedence in INVENTORY_PREFERENCE_BY_COMPARTMENT? Then we would want stewicombo to return 10 kg for DMR and 90 kg for TRI.

Is there a way to achieve this result?
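To make the requested behavior concrete, here is a minimal sketch in plain Python. It is not stewicombo code: `allocate_by_preference` is a hypothetical helper, and it assumes the amounts have already been grouped for a single facility, flow, compartment, and year. It takes the largest reported amount as the total and assigns each inventory, in preference order, only the portion not already covered by a higher-preference inventory.

```python
def allocate_by_preference(reported, preference):
    """Split one facility/flow/compartment total across inventories.

    reported:   dict mapping inventory name -> reported amount (kg)
    preference: list of inventory names, highest preference first
    (Hypothetical helper for illustration; not part of stewicombo.)
    """
    allocated = {}
    covered = 0.0  # amount already attributed to higher-preference inventories
    for inventory in preference:
        if inventory not in reported:
            continue
        # Attribute only what exceeds the amount already covered.
        share = max(reported[inventory] - covered, 0.0)
        allocated[inventory] = share
        covered += share
    return allocated


reported = {"TRI": 100.0, "DMR": 10.0}

dmr_first = allocate_by_preference(reported, ["DMR", "TRI"])
tri_first = allocate_by_preference(reported, ["TRI", "DMR"])

print(dmr_first)  # {'DMR': 10.0, 'TRI': 90.0}
print(tri_first)  # {'TRI': 100.0, 'DMR': 0.0}
```

Under this scheme both orderings sum to 100 kg, so the preference list changes only the allocation, not the total.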

bl-young commented 1 year ago

> Digging a bit deeper, it appears that in cases where flows are reported to multiple inventories for the same year and location, stewicombo is designed to return flows only from the inventory of highest preference.

Yes, your interpretation is correct. Right now stewicombo is not able to do the manipulation you describe (essentially taking the max across datasets). So I am not surprised that different totals are returned depending on the inventory preference.
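A toy illustration of why the totals differ, using the 100 kg / 10 kg figures from the question above. This is a simplified model of the behavior described in this thread, not the actual overlap-handling code; both function names are hypothetical.

```python
def total_keep_preferred(reported, preference):
    """Keep only the amount from the highest-preference inventory present."""
    for inventory in preference:
        if inventory in reported:
            return reported[inventory]
    return 0.0


def total_keep_max(reported):
    """Keep the largest amount reported to any inventory."""
    return max(reported.values(), default=0.0)


# One facility reporting the same flow to two inventories (kg).
reported = {"TRI": 100.0, "DMR": 10.0}

dmr_first = total_keep_preferred(reported, ["DMR", "TRI"])
tri_first = total_keep_preferred(reported, ["TRI", "DMR"])
largest = total_keep_max(reported)

print(dmr_first, tri_first, largest)  # 10.0 100.0 100.0
```

Keeping only the preferred inventory makes the total depend on the preference order, whereas taking the max does not.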

I'll add that the overlaphandler is currently being refactored and should be updated in the next few weeks, though this feature is not planned to be included.

bl-young commented 1 year ago

Closing this issue in favor of enhancement in #130