Integrate e-mission-common's CO2 footprint and energy emission calculation into Public Dashboard

iantei commented 3 weeks ago

Currently, custom labels make use of [label_options](https://github.com/e-mission/nrel-openpath-deploy-configs/tree/main/label_options) to extract the CO2 emission calculations, while there is no energy emission available.
Use the e-mission-common to extract the CO2 footprint and energy emission calculation.

shankari commented 3 weeks ago

I am not sure what you mean by "there is no energy emission available". We do in fact compute the energy consumed in the public dashboard.

iantei commented 3 weeks ago

Yes, we compute the energy consumed in the public dashboard, but we don't display the energy consumption for programs that use custom labels.

The "Timeseries of energy" metric is available for a study/program like nrel-commute, which uses default labels, i.e. does not have label_options. However, for a study/program like usaid-loas-ev-openpath, which uses custom labels, we do not list the "Timeseries of energy" metric, since each richMode entry looks like {"value":"walk", "baseMode":"WALKING", "met_equivalent":"WALKING", "kgCo2PerKm": 0} and carries no information for the energy calculation in kWh.

iantei commented 3 weeks ago

The current computation of the footprint (CO2 and energy) in the public dashboard uses distance as a parameter, while the computation of the footprint in e-mission-common requires a trip as a parameter: calc_footprint_for_trip(trip, mode_label_option) (source code). I am trying to understand how we can pass a trip as a parameter instead of distance, which is a column in the dataframe.

def CO2_footprint_default(df, distance, col):
    """ Inputs:
    df = dataframe with data
    distance = distance in miles
    col = Replaced_mode or Mode_confirm
    """

    conversion_lb_to_kilogram = 0.453592 # 1 lb = 0.453592 kg

    conditions_col = [(df[col+'_fuel'] =='gasoline'),
                       (df[col+'_fuel'] == 'diesel'),
                       (df[col+'_fuel'] == 'electric')]
    gasoline_col = (df[distance]*df['ei_'+col]*0.000001)* df['CO2_'+col]
    diesel_col   = (df[distance]*df['ei_'+col]*0.000001)* df['CO2_'+col]
    electric_col = (((df[distance]*df['ei_'+col])+df['ei_trip_'+col])*0.001)*df['CO2_'+col]

    values_col = [gasoline_col,diesel_col,electric_col]
    df[col+'_lb_CO2'] = np.select(conditions_col, values_col)
    df[col+'_kg_CO2'] = df[col+'_lb_CO2'] * conversion_lb_to_kilogram
    return df

For the default label mapping, we depend on energy_intensity.csv and mode_labels.csv, which do not have the baseMode required for the second parameter. Since https://github.com/JGreenlee/e-mission-common/blob/master/src/emcommon/resources/label-options.default.json has been added to the e-mission-common repo, would it be a good idea to use these label options even when label_options is not specified for the program/study in the config file?

iantei commented 3 weeks ago

We have trip information available in the column of the data frame.

Maybe we can create a dictionary in the required parameter format and pass it into e-mission-common for the footprint calculations. Sample trip format from test_footprint_calculations:

        fake_trip = {
            'distance': 10000,
            'start_fmt_time': '2022-01-01',
            'start_loc': {'coordinates': [-74.006, 40.7128]}
        }
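
A minimal sketch of that conversion, assuming each dataframe row exposes distance, start_fmt_time and start_loc under exactly those names (the column names are an assumption, and row_to_trip is a hypothetical helper, not part of e-mission-common). Rows are shown as plain dicts, such as the ones DataFrame.to_dict("records") returns. Any unit conversion the dashboard dataframe may need is left out.

```python
# Hypothetical helper: keep only the fields that the sample trip above contains.
TRIP_KEYS = ("distance", "start_fmt_time", "start_loc")


def row_to_trip(row):
    """Build a trip dict from one dataframe row given as a mapping."""
    missing = [key for key in TRIP_KEYS if key not in row]
    if missing:
        raise KeyError(f"row is missing trip fields: {missing}")
    return {key: row[key] for key in TRIP_KEYS}


# Rows as plain dicts, e.g. from df.to_dict("records"); column names are assumed.
rows = [
    {
        "distance": 10000,
        "start_fmt_time": "2022-01-01",
        "start_loc": {"coordinates": [-74.006, 40.7128]},
        "mode_confirm": "bus",
    },
]
trips = [row_to_trip(row) for row in rows]
```

Extra columns such as mode_confirm are dropped here, so the mode would still have to be passed separately as the second argument.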

iantei commented 3 weeks ago

Trying to integrate emcommon.metrics.footprint.footprint_calculations with the following changes in the environment26.dashboard.additions.yml file:

...
dependencies:
- pip:
  ...
  - git+https://github.com/JGreenlee/e-mission-common@master

Got the following error:

---> 73 async def get_egrid_region(coords: list[float, float], year: int) -> str | None:
     74     """
     75     Get the eGRID region at the given coordinates in the year.
     76     """
     77     if year < 2018:

TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

This is likely because the str | None annotation syntax requires Python 3.10, while the dashboard still uses Python 3.9.
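
For illustration: the X | None form is evaluated when the function is defined, so on Python 3.9 it fails before the function is ever called. typing.Optional expresses the same return type on 3.9. The function below is a hypothetical stand-in with a similar signature, not the e-mission-common implementation, and its return values are placeholders.

```python
from typing import List, Optional


# Hypothetical stand-in with a signature similar to the failing function.
# On Python 3.9, "-> str | None" raises TypeError at definition time,
# while Optional[str] is accepted on 3.9 and later.
def lookup_region(coords: List[float], year: int) -> Optional[str]:
    """Return a placeholder region name, or None for years before 2018."""
    if year < 2018:
        return None
    return f"region-at-{coords[0]},{coords[1]}"
```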

And while trying to use e-mission-common@0.5.5, I got the following error:

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/emcommon/metrics/footprint/footprint_calculations.py:63, in calc_footprint_for_trip(trip, mode_label_option)
     61 mode_footprint = rich_mode['footprint']
     62 if 'transit' in mode_footprint:
---> 63   mode_footprint = get_mode_footprint_for_transit(trip, mode_footprint['transit'])
     64 kwh_total = 0
     65 kg_co2_total = 0

NameError: name 'get_mode_footprint_for_transit' is not defined

This is strange because I pinned the previous tag, i.e. 0.5.5, which still defines the function as get_mode_footprint_for_transit(), while master uses get_transit_intensities_for_trip().

While this gets fixed, I will explore how to get access to the trip data and baseMode, which are the required parameters of the function calc_footprint_for_trip.

iantei commented 2 weeks ago

@JGreenlee Instead of using the @master tag of e-mission-common, I switched to git+https://github.com/louisg1337/e-mission-common@master, which resolved the TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'.

I incorporated the following code from the https://github.com/JGreenlee/e-mission-common/blob/master/test/metrics/test_footprint_calculations.py test into my Jupyter notebook:

fake_trip = {
'distance': 10000,
'start_fmt_time': '2022-01-01',
'start_loc': {'coordinates': [-74.006, 40.7128]}
}
fake_mode = {'base_mode': 'BUS'}
footprint_energy, footprint_co2 = await emffc.calc_footprint_for_trip(fake_trip, fake_mode)

I am getting the following error:

get_transit_intensities_for_uace(year, uace, modes, metadata):
     ---> 43 actual_year = intensities_data['metadata']['year']

TypeError: 'NoneType' object is not subscriptable

It seems to look up data for years earlier than 2022 and eventually fails after reaching 2018. Is there an issue with my approach, or should there be better error handling on the calculations side?

Details of the issue:

```
DEBUG:root:Getting footprint for trip: {'distance': 10000, 'start_fmt_time': '2022-01-01', 'start_loc': {'coordinates': [-74.006, 40.7128]}}, with mode option: {'base_mode': 'BUS'}
DEBUG:root:Getting rich mode for label_option: {'base_mode': 'BUS'}
DEBUG:root:Rich mode: {'icon': 'bus-side', 'color': '#9240a4', 'met': {'ALL': {'range': [0, inf]}}, 'footprint': {'transit': ['MB', 'RB', 'CB']}}
DEBUG:root:Getting mode footprint for transit modes ['MB', 'RB', 'CB'] in trip: {'distance': 10000, 'start_fmt_time': '2022-01-01', 'start_loc': {'coordinates': [-74.006, 40.7128]}}
DEBUG:root:Getting mode footprint for transit modes ['MB', 'RB', 'CB'] in year 2022 and coords [-74.006, 40.7128]
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): geocoding.geo.census.gov:443
DEBUG:urllib3.connectionpool:https://geocoding.geo.census.gov:443 "GET /geocoder/geographies/coordinates?x=-74.006&y=40.7128&benchmark=Public_AR_Current&vintage=Census2020_Current&layers=87&format=json HTTP/1.1" 200 4978
DEBUG:root:Getting mode footprint for transit modes ['MB', 'RB', 'CB'] in year 2022 and UACE 63217
WARNING:root:ntd data not available for 2022. Trying 2021.
WARNING:root:ntd data not available for 2021. Trying 2020.
WARNING:root:ntd data not available for 2020. Trying 2019.
WARNING:root:ntd data not available for 2019. Trying 2018.
ERROR:root:eGRID lookup failed for 2018.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 8
      2 fake_trip = {
      3 'distance': 10000,
      4 'start_fmt_time': '2022-01-01',
      5 'start_loc': {'coordinates': [-74.006, 40.7128]}
      6 }
      7 fake_mode = {'base_mode': 'BUS'}
----> 8 footprint_energy, footprint_co2 = await emffc.calc_footprint_for_trip(fake_trip, fake_mode)
      9 print(f"\n {footprint_energy}, {footprint_co2} \n")

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/emcommon/metrics/footprint/footprint_calculations.py:44, in calc_footprint_for_trip(trip, mode_label_option)
     42 mode_footprint = dict(rich_mode['footprint'])
     43 if 'transit' in mode_footprint:
---> 44     (mode_footprint, transit_metadata) = await emcmft.get_transit_intensities_for_trip(trip, mode_footprint['transit'])
     45     merge_metadatas(metadata, transit_metadata)
     46 kwh_total = 0

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/emcommon/metrics/footprint/transit.py:22, in get_transit_intensities_for_trip(trip, modes)
     20 year = util.year_of_trip(trip)
     21 coords = trip["start_loc"]["coordinates"]
---> 22 return await get_transit_intensities_for_coords(year, coords, modes)

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/emcommon/metrics/footprint/transit.py:30, in get_transit_intensities_for_coords(year, coords, modes, metadata)
     28 metadata.update({'requested_coords': coords})
     29 uace_code = await util.get_uace_by_coords(coords, year)
---> 30 return await get_transit_intensities_for_uace(year, uace_code, modes, metadata)

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/emcommon/metrics/footprint/transit.py:43, in get_transit_intensities_for_uace(year, uace, modes, metadata)
     40 Log.debug(
     41     f"Getting mode footprint for transit modes {modes} in year {year} and UACE {uace}")
     42 intensities_data = await util.get_intensities_data(year, 'ntd')
---> 43 actual_year = intensities_data['metadata']['year']
     44 metadata.update({
     45     "data_sources": [f"ntd{actual_year}"],
     46     "data_source_urls": intensities_data['metadata']['data_source_urls'],
   (...)
     51     "ntd_ids": [],
     52 })
     54 total_upt = 0

TypeError: 'NoneType' object is not subscriptable
```
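
On the error-handling question, one option is to raise a descriptive error when no year has data, rather than letting None reach the caller. The sketch below is not how e-mission-common implements the lookup: find_latest_available and the available dict are hypothetical, and the 2018 floor is taken only from the point where the log above stops.

```python
def find_latest_available(requested_year, available, min_year=2018):
    """Return (year, data) for the newest year <= requested_year found in `available`.

    `available` is a hypothetical dict mapping year -> data. A LookupError is
    raised when no year in the range has data, instead of returning None.
    """
    for year in range(requested_year, min_year - 1, -1):
        if year in available:
            return year, available[year]
    raise LookupError(
        f"no data for any year from {requested_year} down to {min_year}"
    )
```

With an empty dataset, as in the case above where the resource files were absent, this would fail with a message naming the years that were tried instead of a TypeError further down the call stack.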

JGreenlee commented 2 weeks ago

I have located the issue. It is because only *.py files are being included when emcommon is bundled as a package. Therefore, the resources folder and all its .json files are missing.

I think I need to adjust the pyproject.toml
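
A quick way to confirm whether non-.py files made it into an installed package is to walk it with importlib.resources (available from Python 3.9). This is a generic sketch: list_packaged_files is a hypothetical helper, and nothing about emcommon's internal layout is assumed beyond what is described above.

```python
from importlib import resources


def list_packaged_files(package, suffix):
    """Return sorted names of files ending in `suffix` anywhere inside an installed package."""
    found = []
    stack = [resources.files(package)]
    while stack:
        current = stack.pop()
        for entry in current.iterdir():
            if entry.is_dir():
                stack.append(entry)  # descend into sub-packages and resource folders
            elif entry.name.endswith(suffix):
                found.append(entry.name)
    return sorted(found)


# Usage (assumes emcommon is installed): an empty list would mean that no
# .json resources were bundled with the package.
# print(list_packaged_files("emcommon", ".json"))
```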

iantei commented 2 weeks ago

Update:

The calc_footprint_for_trip(trip, mode) is an async function.

Approaches tried for calling this async function:

Called await calc_footprint_for_trip(trip, mode) directly from the Jupyter notebook, which works perfectly fine.
We use the footprint calculation in the energy_calculations.ipynb notebook. It calls add_energy_impact() in scaffolding.py, which is a synchronous function. We need to call calc_footprint_for_trip(trip, mode) from there.
- We can't use await calc_footprint_for_trip(trip, mode) from within add_energy_impact() because it gives an error saying that await is only allowed within an async function.
- We can't use asyncio.run(calc_footprint_for_trip(trip, mode)) because it gives an error: asyncio.run() cannot be called from a running event loop.
- So I changed add_energy_impact() to async, then used await both to call add_energy_impact() from the Jupyter notebook and to call calc_footprint_for_trip(trip, mode) from within add_energy_impact(). This way we can call the async function calc_footprint_for_trip(trip, mode). Is there any concern with this approach?
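
The async approach can be sketched as follows. calc_footprint_stub is a hypothetical stand-in for the real async call (its intensities are made up and it assumes distance is in meters), add_energy_impact here takes a plain list of trips rather than the dashboard's dataframe, and run_from_sync is a hypothetical helper showing why asyncio.run() only works outside a running event loop.

```python
import asyncio


async def calc_footprint_stub(trip, mode):
    """Hypothetical stand-in for the async footprint call; returns (kWh, kg CO2)."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network lookup would
    km = trip["distance"] / 1000  # assumes distance is in meters
    return km * 0.5, km * 0.25  # made-up intensities


async def add_energy_impact(trips, mode):
    """Async version: it can now await the footprint call for each trip."""
    results = []
    for trip in trips:
        results.append(await calc_footprint_stub(trip, mode))
    return results


def run_from_sync(coro):
    """Run a coroutine from synchronous code when no event loop is running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running (plain script), so asyncio.run() is safe.
        return asyncio.run(coro)
    coro.close()  # avoid a "coroutine was never awaited" warning
    raise RuntimeError("an event loop is already running; use 'await' instead")
```

In a notebook cell, where a loop is already running, the call would simply be await add_energy_impact(trips, mode). In a plain script, run_from_sync(add_energy_impact(trips, mode)) does the same job.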

iantei commented 2 weeks ago

As discussed, changing add_energy_impact() to an async function makes it convenient to use await when calling it from the energy_calculations notebook, and this approach looks good. Next, I want to explore how to determine the baseMode associated with a particular mode of commute.

iantei commented 2 weeks ago

We currently have baseMode available only for the list of Mode, and not for Replaced Mode. However, when we compute the energy and CO2 footprint, we calculate the energy impact with df['Energy_Impact(kWH)'] = round((df['Replaced_mode_EI(kWH)'] - df['Mode_confirm_EI(kWH)']),3), and likewise for CO2_Impact. Even though the list of keys in Mode and Replaced Mode are identical, that's not always the case. Therefore, we need baseMode available for Replaced Mode as well, so that we can compute Energy_Impact and CO2_Impact for Replaced Mode too.

JGreenlee commented 2 weeks ago

> Even though the list of keys in Mode and Replaced Mode are identical, that's not always the case.

In what instances are there a Replaced Mode that does not have a Mode by the same key?

I thought that Replaced Modes were always a subset of Modes

iantei commented 2 weeks ago

> In what instances are there a Replaced Mode that does not have a Mode by the same key? I thought that Replaced Modes were always a subset of Modes

You're correct! There is only one Replaced Mode, No_travel, which is not in the list of Mode. No_travel does not need a footprint computation, so this should be fine.

Abby-Wheelis commented 3 days ago

One idea to cut down on the wait times when mapping from mode_confirm to baseMode: extract the mapping once (get the unique mode_confirm list and generate a local mapping), then apply the local mapping to the whole dataframe synchronously. That way we wait on the call to emcommon only once per unique mode_confirm, not once for every row (could be 1000s).

iantei commented 3 days ago

@Abby-Wheelis I think you'd posted a discussion note here. I am unable to see it.

Abby-Wheelis commented 3 days ago

re-writing from memory since GitHub seems to have eaten what I wrote yesterday, @iantei feel free to add if you remember any additional points

There are two general approaches that we could take here:

1) use the list of trips

- might be easier to pass to the function since the format is what is expected
- still need to extract the mode and lookup the base mode
- iterating over a list seems slow
- still need to convert to a dataframe for the plotting functions and ensure all the same filtering gets applied

2) use a dataframe of trips

- would need to massage the data structure into the format expected to pass into function
- need to extract the mode and look up the base mode (should add base mode as a column, since we want to use it for things like filtering AIR regardless)
- could use something like asyncio.gather() to speed up the iteration while applying the async footprint lookup
- allows for filtered dataframe to stay the same (just with more info) and be ready for plotting

Both @iantei and I are leaning towards option 2 at this point, but @shankari do you have any additional thoughts?

Abby-Wheelis commented 3 days ago

some pseudocode for my "local copy of base mode mapping" idea

mapping = {}
for mode in expanded_ct.mode_confirm.unique():
  mapping[mode] = await lookup_base_mode(mode)

which can then be used with .apply() to add the base_mode to the df quickly, and means we only await once per unique mode, and not once per row.
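
A runnable version of the pseudocode might look like the sketch below. lookup_base_mode is a hypothetical async lookup (a real one would call into e-mission-common, and the "OTHER" fallback is made up), and a plain list stands in for the mode_confirm column.

```python
import asyncio


async def lookup_base_mode(mode):
    """Hypothetical async lookup; a real version would call into e-mission-common."""
    await asyncio.sleep(0)
    return {"bus": "BUS", "walk": "WALKING"}.get(mode, "OTHER")  # made-up table


async def build_base_mode_mapping(modes):
    """Await the lookup once per unique mode rather than once per row."""
    mapping = {}
    for mode in dict.fromkeys(modes):  # de-duplicate, keeping first-seen order
        mapping[mode] = await lookup_base_mode(mode)
    return mapping


mode_confirm = ["bus", "walk", "bus", "bus", "walk"]  # stand-in for a dataframe column
mapping = asyncio.run(build_base_mode_mapping(mode_confirm))
# Synchronous step; with pandas this could be df["mode_confirm"].map(mapping).
base_mode = [mapping[mode] for mode in mode_confirm]
```

Inside a notebook, asyncio.run(...) would be replaced by a direct await, since a loop is already running there.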

shankari commented 2 days ago

@iantei and @Abby-Wheelis I think we discussed this in an earlier team meeting. I think we should go with (1).

To address your points:

- iterating over a list seems slow: As I pointed out, the data is stored in the database as trips and is read as a list of trip JSON objects in the server code. We already iterate over the list using _to_data_df to create the dataframe. Please see emission/storage/timeseries/builtin_timeseries.py to understand how the interfaces work under the hood. And although apply is a dataframe method, it essentially iterates over the rows under the hood; it is not a highly efficient vectorized operation.
- still need to convert to a dataframe for the plotting functions and ensure all the same filtering gets applied: I don't see this as a big win either way. Either you have to convert trips -> dataframe -> trips or trips -> dataframe. The second seems strictly better since there are fewer conversions.
- could use something like asyncio.gather() to speed up the iteration while applying the async footprint lookup: I don't see how this applies only to (2). You can perform operations on trips asynchronously as well.
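
As a sketch of the last point, asyncio.gather() can be applied directly to a list of trips. calc_footprint_stub is a hypothetical stand-in for the real async footprint call, with made-up intensities and distance assumed to be in meters.

```python
import asyncio


async def calc_footprint_stub(trip, mode_label_option):
    """Hypothetical stand-in for the async footprint call; returns (kWh, kg CO2)."""
    await asyncio.sleep(0)
    km = trip["distance"] / 1000  # assumes distance is in meters
    return km * 0.5, km * 0.25  # made-up intensities


async def footprints_for_trips(trips, mode_label_option):
    """Run the footprint coroutine for all trips concurrently; results keep input order."""
    return await asyncio.gather(
        *(calc_footprint_stub(trip, mode_label_option) for trip in trips)
    )


trips = [{"distance": 10000}, {"distance": 4000}]
results = asyncio.run(footprints_for_trips(trips, {"base_mode": "BUS"}))
```

gather() returns results in the same order as the input, so they can be zipped back onto the trips before building the dataframe.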

What am I missing here?

Abby-Wheelis commented 2 days ago

> Either you have to convert trips -> dataframe -> trips or trips -> dataframe. The second seems strictly better since there are fewer conversions

Particularly for this reason, and given the other points you made, I think it does make more sense to use the list. After @iantei and I poked through the server code together earlier this week, I think I see a relatively clear path to doing so. I'll move forward with implementing the data gathering piece while @iantei wraps up the other open PRs (#148, #145, #150), and then plan to pass it back off for the visualization piece!

e-mission / em-public-dashboard

Integrate e-mission-common's CO2 footprint and energy emission calculation into Public Dashboard #146