USEPA / ElectricityLCI


NETL flowmapping - migrate to fedelemflowlist #66

Closed · WesIngwersen closed this issue 4 years ago

WesIngwersen commented 4 years ago

@jump2conclusionsmatt: I see there is a flow mapping file, Elementary_Flows_NETL.csv, that was being used in upstream_dict but is now commented out.

The way flow mapping works for the FEDEFL is that a mapping in a standard format is added to the flowmapping folder (e.g. NETL.csv), which makes it accessible via get_flowmapping(source='NETL'), and it is applied from there. So I propose moving the data in `Elementary_Flows_NETL.csv` to an NETL.csv mapping file in the standard FlowMapping format and adding it to the fedelemflowlist repository. I can ask for assistance in working with you on the mapping.
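
For reference, a minimal sketch of how such a mapping could be applied to an inventory once it is in the repository. This assumes get_flowmapping(source='NETL') returns a DataFrame in the standard FlowMapping format; the inventory column names used here (FlowName, Compartment) are placeholders.

```python
import fedelemflowlist

# Pull the NETL mapping (assumes it has been added to fedelemflowlist as NETL.csv)
netl_map = fedelemflowlist.get_flowmapping(source='NETL')

def apply_netl_mapping(inventory_df):
    """Left-merge the FEDEFL target flows onto an inventory keyed by NETL name and context.
    The inventory column names ('FlowName', 'Compartment') are placeholders."""
    return inventory_df.merge(
        netl_map[['SourceFlowName', 'SourceFlowContext', 'TargetFlowName',
                  'TargetFlowContext', 'TargetFlowUUID', 'TargetUnit']],
        left_on=['FlowName', 'Compartment'],
        right_on=['SourceFlowName', 'SourceFlowContext'],
        how='left',
    )
```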

WesIngwersen commented 4 years ago

As an update, I found that the mapping file actually in use is netl_fedelem_crosswalk.csv, so I assume this one is the latest. @jump2conclusionsmatt please confirm.

m-jamieson commented 4 years ago

Yeah, the netl_fedelem_crosswalk.csv is the latest.

It seems like translating the existing crosswalk into the fedefl format should be pretty straightforward.

WesIngwersen commented 4 years ago

@jump2conclusionsmatt: The converted netl_fedelem_crosswalk file in the standard flow mapping format is attached: fedmapping_for_netl.xlsx. There were some issues with this conversion. In the original file there are duplicate NETL flows with at least the same name and context. If the capitalization of all fields was identical, the duplicates were removed; if not, they stayed in. I have a feeling the duplicates exist because additional information is missing; for instance, there was no NETL UUID or NETL unit to use in the mapping.

WesIngwersen commented 4 years ago

Code I used for the conversion:

```python
import pandas as pd

import fedelemflowlist
from fedelemflowlist.globals import flowmapping_fields
from electricitylci.globals import data_dir

# Read the original NETL crosswalk and inspect its columns
netl_fedelem_crosswalk = pd.read_csv(data_dir + '/netl_fedelem_crosswalk.csv')
netl_fedelem_crosswalk.columns

# Tag the source list and rename the crosswalk columns to the standard FlowMapping fields
netl_fedelem_crosswalk['SourceListName'] = 'NETL'
mapping = {"FlowName_netl": "SourceFlowName", "Compartment_path_netl": "SourceFlowContext",
           "FlowName": "TargetFlowName", "Unit": "TargetUnit",
           "FlowUUID": "TargetFlowUUID", "Context": "TargetFlowContext"}
fedmapping_for_netl = netl_fedelem_crosswalk.rename(columns=mapping)

# Keep only the standard flow mapping fields, adding any that are missing, in the standard order
flowmapping_fields = list(flowmapping_fields.keys())
cols_not_in_flowmapping_fields = set(fedmapping_for_netl.columns) - set(flowmapping_fields)
fedmapping_for_netl = fedmapping_for_netl.drop(columns=cols_not_in_flowmapping_fields)
cols_to_be_added_to_mapping = set(flowmapping_fields) - set(fedmapping_for_netl.columns)
for c in cols_to_be_added_to_mapping:
    fedmapping_for_netl[c] = None
fedmapping_for_netl = fedmapping_for_netl[flowmapping_fields]

# Drop exact duplicate rows and check the final row count
fedmapping_for_netl = fedmapping_for_netl.drop_duplicates()
len(fedmapping_for_netl)
# 4673

# Write out the converted mapping
fedmapping_for_netl.to_excel('fedmapping_for_netl.xlsx', index=False)
fedmapping_for_netl.to_pickle('fedmapping_for_netl.pk')
fedmapping_for_netl.to_csv('fedmapping_for_netl.csv', index=False)
```
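
If we wanted to also collapse the duplicates that differ only in capitalization, a rough sketch (not part of the script above) could de-duplicate on lower-cased keys; column names follow the FlowMapping format used above.

```python
# Sketch: collapse rows whose source name/context differ only in capitalization,
# keeping the first occurrence.
dedup_key = (fedmapping_for_netl['SourceFlowName'].str.lower() + '|'
             + fedmapping_for_netl['SourceFlowContext'].str.lower())
fedmapping_for_netl = fedmapping_for_netl.loc[~dedup_key.duplicated(keep='first')]
```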

m-jamieson commented 4 years ago

Actually, the most likely reason for the duplicates is that I didn't check for them as I was dumping flows in. The list was constructed by taking the results from various models, which were either in spreadsheets or in the form of GaBi results, and pasting them in at the bottom. I figured it wouldn't be a big deal for whatever script used the crosswalk to handle the duplicates. The intention was to hopefully get some time toward the end of the project, or in the future, to have someone go back and take a look.

What may be a problem is if there are multiple flows with the same name that didn't get assigned to the same fedelem flow, though I would be surprised if that happened.
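
A quick sketch of how that case could be checked against the converted mapping written out above (the file name fedmapping_for_netl.csv comes from that script; only the standard FlowMapping column names are assumed):

```python
import pandas as pd

# Load the converted mapping produced by the conversion script
fm = pd.read_csv('fedmapping_for_netl.csv')

# Count distinct FEDEFL targets per (source name, source context) pair
targets_per_source = (
    fm.groupby(['SourceFlowName', 'SourceFlowContext'])['TargetFlowUUID']
      .nunique()
)

# Any pair mapped to more than one target flow is a potential conflict
conflicts = targets_per_source[targets_per_source > 1]
print(f"{len(conflicts)} source flows map to more than one FEDEFL flow")
```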

WesIngwersen commented 4 years ago

I believe we can close this, but it still needs confirmation. @bl-young

bl-young commented 4 years ago

With the updates to v1.0.1, it might be good to check that the upstream data you are using here still matches in the same way as is being done for the upstream models. But because the mapping I did was only for coal and gas, there may be other flows from other supply chains. @jump2conclusionsmatt we may want to follow up, as I don't recall the source data for the other supply chains.

m-jamieson commented 4 years ago

Seeking opinions on how to handle some of these crosswalk issues... I'm currently working through modifications to upstream_coal.py. For the mining emissions I'm pulling in fresh data from the fedcommons version of the coal model, with flows already mapped to the fedefl. The transportation emissions are not mapped. I can either rename the flows in Coal_model_Basin_and_transportation_inventory.xlsx directly (I could even keep the crosswalk in the file), or I could create a new fedefl mapping. If I do the mapping, merges would be required in both fedefl and here. I guess there's some value in catching all the ways people name their flows, but that's more to keep up to date.

On a related note, should I map flows within each module that generates an inventory (solarPV, geothermal, solar thermal, etc.) or do it all at one time, as it's currently done?

bl-young commented 4 years ago

In my mind I think it'd be easier to track things if each inventory source had its own mapping (e.g. nuclear, solar PV, solar thermal, construction, petroleum). Because the get_flowmapping function can pull multiple or even all flow mappings at once, it's not really an issue to separate them. This might make future updates easier as well.
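
For illustration, a sketch of what that could look like; the source names are placeholders, and the exact call form (passing a list, or no argument for all mappings) is an assumption based on the description above.

```python
import fedelemflowlist

# Pull several per-source mappings in one call (source names are placeholders)
combined = fedelemflowlist.get_flowmapping(source=['Coal', 'Solar_PV', 'Nuclear'])

# Or pull every available mapping at once
all_mappings = fedelemflowlist.get_flowmapping()
```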

bl-young commented 4 years ago

This is now resolved with the eLCI mapping file in FEDEFL