e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Unify energy and emission calculations and make them more principled #954

Open Abby-Wheelis opened 10 months ago

Abby-Wheelis commented 10 months ago

Currently, there is an error thrown when setting the Carbon Dataset, though it does not seem to halt the function of the app. However, noting the error during the recent migration of Profile raised a broader issue:

If it is not a regression, I am fine with moving this to production. It is not super important, and we can fix it while fixing the dashboard. We should also file a more generic issue for it, because arguably we want to automatically figure out the CO2 emissions based on the location of the trip, instead of having the user choose it. The grid in WA state has a lot of hydropower, for example, while the grid in the Southern states uses more coal. Note also that this is not just location dependent, it is also time dependent. We are currently using values from 2020, so are not taking into account the impact of efforts to decarbonize the grid by using wind and solar.

@shankari left this feedback on #e-mission-phone#994, and I'm pulling it into a dedicated issue for additional consideration and attention during the upcoming dashboard migrations.

Abby-Wheelis commented 7 months ago

@shankari @JGreenlee @the-bay-kay and I met yesterday to start talking about ways in which we could re-vamp the calculations for energy and CO2 in both the phone dashboard and online dashboards. There are several main points which we considered:

  1. The need for a unified library to handle these calculations, which can then be shared between the phone and the public (online) dashboard
  2. The desire to balance user burden and highly tuned calculations
    • look for ways to detect transit energy intensity by location, rather than asking users to know/find out how the bus/train they ride is powered
    • if we do allow them to label their "transit power" - maybe use our detection as a backup or predictor (like our yellow labels)
    • If using an electric mode, use data to tune calculations according to grid makeup in a location. As mentioned above, there's a wide range of grid makeups as regions adopt renewables at differing rates
    • Potentially allow users to label along the lines of "I charged my car at home with my rooftop solar" vs "I used a public charger on my local grid"

We also discussed different units of measure for energy (BTU vs. kWh). In terms of physics, all travel uses energy, but I'm guessing we're most interested in showing the non-human-powered energy use on the dashboards?

Next steps:

  1. looking for sources on transit energy / if there exists any kind of national database that has regional or city info on what proportion of public transit is electrified and/or decarbonized (@the-bay-kay)
  2. creating the 'common' repo and looking for sources for energy intensity information (@JGreenlee)
  3. looking for sources for region-specific grid makeup (@Abby-Wheelis)
Abby-Wheelis commented 7 months ago

I've begun looking into grid mixes, and found a good dashboard with relatively coarse regions (~25 in the US): epaDashboard. The source data behind it could be one option for our location-based grid tuning efforts.

the-bay-kay commented 7 months ago

While it does not have information on emissions, the National Transit Database (NTD) has a variety of data on transit agencies, sortable by city. Each profile entry (ex) does contain a breakdown of the fleet, though it's pretty general in its vehicle categories.

I think we'd have to make several compromises using this database - we'd need generalized emission calculations for each vehicle type, and agency "coverage" may not be obvious when querying the database. For example, you couldn't tell if a rural town is covered by a neighboring city's metro (e.g., Boulder Creek is covered by the Santa Cruz metro). Furthermore, some cities have multiple metro agencies in one region, which may cause some issues.

I'm adding this to the thread as a fallback option, if we don't find another dataset. The NTD is thorough in the number of agencies and fleets covered, it's just missing some of the data we're looking for!

EDIT: The American Public Transportation Association (APTA) has a great public transit vehicle database! This seems to be a coarser-grained study (150 agencies participating in the US, 10 in Canada), but they have details on each vehicle's make and model. We would still need to find / generate emission numbers for each vehicle type, but this feels like a step in the right direction!

the-bay-kay commented 7 months ago

It seems the Federal Transit Administration (FTA) does regular assessments of Greenhouse Gas (GHG) emissions, based on fleets. If we want to use the transit databases mentioned above, Table 2-3 in this report may be of interest. It seems there are talks to do another programmatic assessment of GHG emissions in the next year or two (link). Comments on this docket item closed a few weeks ago, but one memo did point to this Excel sheet used to calculate emission estimates, based on zone and vehicle category. I'll update this comment with more details - the hope is to find a version of these databases that is queryable by location / zone!

EDIT: This NTD database seems like a great resource for the amount and type of fuel consumed by various transit agencies. Unfortunately, there's no data on which modes of transportation were studied - hopefully I can find where the numbers came from here.

Abby-Wheelis commented 7 months ago

We also asked around within NREL, and were referred to the following resources:

Abby-Wheelis commented 5 months ago

From #1047's discussion, I ended up making this slide while I was waiting for a process to run to get my ideas on paper - throwing it here since we won't get to it until we tackle this! Screenshot 2024-02-06 at 4 39 27 PM

shankari commented 2 weeks ago

@nataliejschultz the last comment here: https://github.com/e-mission/e-mission-docs/issues/954#issuecomment-1841664884 has "EPA eGRID for domestic and IEA for intl" but there is also a WattTime (or similar) API that we have access to through the NREL library. We should get access to it, and investigate it to see how well it works.

Abby-Wheelis commented 1 week ago

From our discussion today -- handling hybrid vehicles:

As a bridge until we have the bandwidth to add custom vehicles, and as a default once we do, I think we can get an estimate of the fuel-use breakdown for hybrid vehicles from GREET, which I've used in my work with TEMPO and which models vehicle efficiencies and emissions (both operational and full fuel pathways).

There are 2 cases to consider - HEVs and PHEVs

Going forward, we need to:

  1. decide if our default is HEV or PHEV, or if we are going to increase the list to include both
  2. for HEV, can treat as a highly efficient gas car
  3. for PHEV, can use MPGGE values from GREET, split between "electric" and "gas" modes with a default ratio, and combine electricity and gas emissions accordingly

@shankari @JGreenlee for visibility

nataliejschultz commented 1 week ago

WattTime Data API: takes latitude + longitude as an input and “returns the details of the grid region serving that location, if known”.

I was able to create an account with the API, and by “details of the grid region”, I think they just mean the name of the grid serving that location.

For example, I made a request using a lat/lon in Steamboat Springs, CO.

url = "https://api.watttime.org/v3/region-from-loc"
params = {"latitude": "40.486250027897185", "longitude": "-106.83258785805664", "signal_type": "co2_moer"}

This was the response I got:

{'region': 'PSCO', 'region_full_name': 'Public Service Co of Colorado', 'signal_type': 'co2_moer'}

I tried to get more information through a “signal index” request:

url = "https://api.watttime.org/v3/signal-index"
params = {
    "region": "PSCO",
    "signal_type": "co2_moer",
}

And got this response:


{'data': [{'point_time': '2024-06-26T21:35:00+00:00', 'value': 51.0}], 'meta': {'data_point_period_seconds': 300, 'region': 'PSCO', 'signal_type': 'co2_moer', 'units': 'percentile', 'warnings': [], 'model': {'date': '2022-10-01'}}}

We may be able to get slightly more detailed information with the paid version, but the library hasn’t gotten back to me about this yet.

They also have an SDK that might be useful if we decide to use it.

JGreenlee commented 1 week ago

That seems promising! From what I just read it seems like MOER is basically a measure of CO2 (in pounds) per MWh, which is certainly something we can work with.

I think we may want to use "historical" rather than "index" because that would allow us to pass start/end times as parameters. So we'd get a measure of MOER at the specific time of travel rather than the time at which the calculation is being performed.
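As a rough sketch of that idea, the parameters for a "historical" request could be built from a trip's start and end times. The parameter names and timestamp format are copied from the requests shown elsewhere in this thread; the helper name `historical_params` and the choice to widen the window to 5-minute boundaries are assumptions for illustration only.

```python
import math
from datetime import datetime, timezone

def historical_params(region, trip_start, trip_end, signal_type="co2_moer"):
    """Build query params for a 'historical' request covering one trip.

    trip_start / trip_end are timezone-aware datetimes. The window is widened
    to 300-second boundaries because the responses in this thread report
    'data_point_period_seconds': 300 (an assumption about how to align it).
    """
    step = 300  # seconds between data points

    def to_utc_string(epoch_seconds):
        dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
        return dt.strftime("%Y-%m-%dT%H:%M+00:00")

    # Round the start down and the end up to the nearest data-point boundary
    start_ts = math.floor(trip_start.timestamp() / step) * step
    end_ts = math.ceil(trip_end.timestamp() / step) * step

    return {
        "region": region,
        "signal_type": signal_type,
        "start": to_utc_string(start_ts),
        "end": to_utc_string(end_ts),
    }
```

The returned dict could then be passed as `params` in the same way as the requests above.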

shankari commented 1 week ago

I also think that we want co2_aoer instead of co2_moer. The moer value is the marginal value caused by a change in load or generation.

shankari commented 1 week ago

wrt transit mix, here's a data source. It is definitely not as cool as WattTime; there is no API to query and I don't know if we have historical numbers.

It also looks like it might be exposed via an API through a third-party API provider (https://dev.socrata.com/). However, it also looks like the most recent data is from 2022. It is also not clear how to automatically determine the transit agency for a particular (lat, lon). But at least it is an online source.

Maybe we should build an API like WattTime that makes integrating with the energy/carbon of transit modes easier!

JGreenlee commented 1 week ago

Although I don't think the NTD will have the bounds of each transit agency, I do see that there is a full list where each entry is registered to a particular city. We already get the name of cities from OSM / Overpass so theoretically we should be able to look up the transit agency (or multiple transit agencies) by city name. (In the case of multiple agencies in the same city, which we cannot disambiguate, maybe we take a weighted average?) https://www.transit.dot.gov/ntd/transit-agency-profiles

image

However, this data could use some cleaning... just on the first page I noticed "Middleotwn Ohio" (misspelled) and "Academy Lines" (duplicated as both an LLC and an Inc.)

JGreenlee commented 1 week ago

Much better than city names, I found that the agencies are also listed by UACE (Urban Area codes, which I have learned are Census-designated and updated every 10 years). I didn't find any tools to look up UACE by lat/lon, but I believe that Shapefiles from https://www.census.gov/cgi-bin/geo/shapefiles/index.php would contain the bounds. However, I don't know how to work with Shapefiles. If that is the approach we need to take, perhaps we can consult one of the GIS experts in our group!

JGreenlee commented 1 week ago

I found mapshaper.org as a quick way to view shapefiles and confirmed that those files from the census contain the specific boundaries for the Urban areas, and that the codes used there correspond to the codes of transit agencies in the NTD.

image image

Since the shapefile provides those boundaries, I think a lat/lon lookup would be fairly clean, and could provide us with a decent estimate for the mix of propulsion types used in an entire metro area. But, notice in my example that the Cincinnati urban area has many transit agencies, which we'd have to aggregate. If we want to disambiguate, perhaps that is the point at which we'd look at the specific city names (ie Hamilton, Batavia, which are in the greater Cincy metro area but not in Cincinnati city limits)
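To illustrate what such a lat/lon lookup involves, here is a minimal sketch using a hand-rolled ray-casting test. The polygons and codes below are made up (simple rectangles, not real Census boundaries or UACE values), and a real implementation would more likely load the shapefile with a GIS library than rely on this function.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def lookup_uace(lon, lat, urban_areas):
    """Return the code of the first urban area containing the point, else None."""
    for uace, polygon in urban_areas.items():
        if point_in_polygon(lon, lat, polygon):
            return uace
    return None

# Hypothetical boundaries and codes, for illustration only
URBAN_AREAS = {
    "00001": [(-84.8, 39.0), (-84.2, 39.0), (-84.2, 39.4), (-84.8, 39.4)],
    "00002": [(-122.2, 37.2), (-121.7, 37.2), (-121.7, 37.5), (-122.2, 37.5)],
}
```

A point outside every polygon returns None, which is where a fallback default would be needed.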

shankari commented 1 week ago

But, notice in my example that the Cincinnati urban area has many transit agencies, which we'd have to aggregate. If we want to disambiguate, perhaps that is the point at which we'd look at the specific city names (ie Hamilton, Batavia, which are in the greater Cincy metro area but not in Cincinnati city limits)

As a first pass, we could aggregate the mileage across all the transit agencies in the urban area. We could also try to validate a matching of the city name, but I know that it will fail in my local area, where the city is shown as "San Jose", although the transit agency covers all of Santa Clara county, including my town of Mountain View.
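That first pass could look something like the sketch below. The agency rows and field names are invented for illustration; they are not the NTD's actual column names or figures.

```python
from collections import defaultdict

def fuel_mix_for_urban_area(agency_rows, uace):
    """Share of vehicle miles by fuel type, summed over all agencies in a UACE."""
    miles_by_fuel = defaultdict(float)
    for row in agency_rows:
        if row["uace"] == uace:
            for fuel, miles in row["miles_by_fuel"].items():
                miles_by_fuel[fuel] += miles
    total = sum(miles_by_fuel.values())
    if total == 0:
        return {}  # no reported mileage for this urban area
    return {fuel: miles / total for fuel, miles in miles_by_fuel.items()}

# Hypothetical rows; real NTD data would need cleaning and has a different schema
ROWS = [
    {"agency": "Agency A", "uace": "00001",
     "miles_by_fuel": {"diesel": 600.0, "electric": 200.0}},
    {"agency": "Agency B", "uace": "00001",
     "miles_by_fuel": {"diesel": 100.0, "gas": 100.0}},
    {"agency": "Agency C", "uace": "00002",
     "miles_by_fuel": {"electric": 50.0}},
]
```

Because the shares are weighted by mileage, a small agency with an unusual fleet does not skew the estimate for the whole urban area.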

Screenshot 2024-06-28 at 11 39 05 PM

EDIT: My daughter also suggested that we could look at the bus stops along the route in OSM and see which agency they are run by. But we just spot checked, and the OSM says "VTA" instead of "Valley Transportation Authority". https://www.openstreetmap.org/node/6703651466 or https://www.openstreetmap.org/node/2161940025

JGreenlee commented 1 week ago

For every mode we want to calculate two variables: energy (in kWh) and carbon (in kg of CO2). I see 4 clusters of modes to consider:

a) gas fuel type (ICE or HEV) including gas cars, traditional hybrids, motorcycles, mopeds

  gallons = distance_miles / mpge
  # each gallon contains 33.7 kWh and emits 8.89 kg CO2
  energy = gallons * 33.7
  carbon = gallons * 8.89

b) electric fuel type (BEV) including e-cars, e-bikes, e-scooters

  energy = distance_miles / mpkwh
  aoer = watttime_lookup( ... ) # lbs/MWh
  carbon = energy * (aoer / 1000) * 0.453592

c) plug-in hybrids (PHEV)

  # perform calculations using a default assumed proportion of gas-powered miles / total miles
  # e.g. if it was 0.6, perform gas calculations with distance * 0.6; electric calculations with
  # distance * 0.4; then add those together

d) transit

  # get uace by coordinates
  # get split of gas, diesel, and electric miles across all transit agencies in the uace
  # perform calculations proportionally

some pseudocode

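def calc_energy_and_carbon(distance, fuel_type, mpge=None, mpkwh=None, aoer=None):
  """Return [energy in kWh, carbon in kg CO2] for a trip of `distance` meters.

  mpge, mpkwh and aoer are passed in so the function runs on its own;
  aoer (lbs/MWh) would come from watttime_lookup( ... ).
  """
  distance_miles = distance / 1609.34
  if fuel_type == 'electric':
    energy = distance_miles / mpkwh
    # lbs/MWh -> lbs/kWh -> kg/kWh
    carbon = energy * (aoer / 1000) * 0.453592
    return [energy, carbon]
  gallons = distance_miles / mpge
  if fuel_type == 'gas':
    # each gallon contains 33.7 kWh and emits 8.89 kg CO2
    return [gallons * 33.7, gallons * 8.89]
  if fuel_type == 'diesel':
    # each gallon contains 38.1 kWh and emits 10.18 kg CO2
    return [gallons * 38.1, gallons * 10.18]
  raise ValueError(f"unknown fuel_type: {fuel_type}")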
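Case (c) is not covered by the pseudocode. A minimal sketch of the split, reusing the same gasoline constants, might look like this. The function name, the 0.6 default gas share, and any efficiency or emissions-rate values passed in are placeholders, not values taken from GREET or WattTime.

```python
KWH_PER_GALLON_GAS = 33.7
KG_CO2_PER_GALLON_GAS = 8.89
KG_PER_LB = 0.453592

def phev_energy_and_carbon(distance_miles, mpg_gas, miles_per_kwh,
                           grid_lbs_per_mwh, gas_share=0.6):
    """Return (energy_kwh, carbon_kg) for a plug-in hybrid trip.

    gas_share is the assumed fraction of miles driven on gasoline;
    the remainder is treated as electric miles.
    """
    gas_miles = distance_miles * gas_share
    electric_miles = distance_miles - gas_miles

    # Gasoline portion
    gallons = gas_miles / mpg_gas
    gas_energy = gallons * KWH_PER_GALLON_GAS
    gas_carbon = gallons * KG_CO2_PER_GALLON_GAS

    # Electric portion: lbs/MWh -> lbs/kWh -> kg/kWh
    electric_energy = electric_miles / miles_per_kwh
    electric_carbon = electric_energy * (grid_lbs_per_mwh / 1000) * KG_PER_LB

    return gas_energy + electric_energy, gas_carbon + electric_carbon
```

Keeping `gas_share` as a parameter would let a user-provided label override the default later without changing the calculation.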
JGreenlee commented 1 week ago

Rich modes, which are provided by a label_options config (e.g. example-program-label-options), currently have the property kgCo2PerKm, which specifies their carbon intensity. We need to decouple that from energy intensity.

Carbon intensity is basically a product of i) energy intensity, ii) the fuel type(s) of the mode, and iii) if applicable, the mix of the grid at the time and location of travel.

I think that to keep the platform as flexible as possible, (i) and (ii) should be properties of the rich mode, replacing kgCo2PerKm. As the unit for energy intensity, we could just use MPGe, but if we want to be less U.S.- and car-centric, we can use 'Wh per km'. (The conversion is fairly straightforward assuming 1 gallon = 33.7 kWh, which is what the EPA uses as the basis for MPGe.) Since some vehicles can have multiple fuel types (e.g. PHEVs), values will be given for each fuel type that the mode has:

  mpge: { gasoline: 52, electric: 127 }
  -or-
  wh_per_km: { gasoline: 403, electric: 165 }

EVs would only have 'electric'; ICE vehicles and HEVs (non-plug-in hybrids) would only have 'gasoline'. This would also give us an easy way to add other fuel types in the future like diesel or hydrogen cells.
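The conversion mentioned above can be sketched as follows, assuming 33.7 kWh per gallon and 1.609344 km per mile. The function names are illustrative only.

```python
WH_PER_GALLON = 33_700   # 33.7 kWh per gallon, the basis for MPGe
KM_PER_MILE = 1.609344

def mpge_to_wh_per_km(mpge):
    """Convert miles per gallon-equivalent to watt-hours per kilometer."""
    return WH_PER_GALLON / (mpge * KM_PER_MILE)

def wh_per_km_to_mpge(wh_per_km):
    """Inverse conversion; the formula is symmetric."""
    return WH_PER_GALLON / (wh_per_km * KM_PER_MILE)
```

With these assumptions, 52 MPGe comes out to roughly 403 Wh/km and 127 MPGe to roughly 165 Wh/km, matching the example values above.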

Transit needs to be handled differently altogether. Since we will attempt to localize on a per-trip basis, we will not know MPGe or Wh/km ahead of time. I think these modes should have something like transit_mode: 'MB'. The possible values here would correspond to the modes in the NTD (assuming that's what we will use), e.g. 'MB' (fixed route bus), 'CB' (commuter bus), 'SR' (streetcar rail), 'DR' (demand response), 'DT' (demand response taxi), etc.

shankari commented 6 days ago

Couple of high-level points:

  1. If we are going to use the NTD anyway, I think we can do a better job of the other important part of the transit calculations - the load factor. I believe that every transit agency is supposed to report ridership numbers (even if only as an estimate) to the NTD as well. We should be able to use the ridership numbers to get an agency-wide estimate of the passenger load per mile as well (e.g. https://www.transit.dot.gov/ntd/data-product/monthly-module-adjusted-data-release). We can then get the per-person energy/carbon intensity using (vehicle_miles * CO2/mile) / passenger_load
  2. we need to make sure to structure this as a plugin so that we can:
    • use similar values specified by our partners in other countries when they provide them, and
    • use defaults in case our partners are not able to specify values. We can continue to use current (non-NTD) values for the defaults although they are from the TEDB so are also US-specific. So this is really a very last resort.
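The load-factor formula in point 1 could be sketched as below. This assumes that "passenger_load" means the agency's total passenger miles over the same period as the vehicle miles; that reading, along with the function names, is an assumption rather than something confirmed in this thread.

```python
def per_passenger_mile(vehicle_miles, per_vehicle_mile, passenger_miles):
    """Spread an agency's total energy or carbon over its passenger miles.

    per_vehicle_mile can be kWh/mile or kg CO2/mile; the result is in the
    same unit per passenger mile.
    """
    if passenger_miles <= 0:
        raise ValueError("passenger_miles must be positive")
    return vehicle_miles * per_vehicle_mile / passenger_miles

def trip_footprint(trip_miles, vehicle_miles, per_vehicle_mile, passenger_miles):
    """Energy or carbon attributed to one rider for a trip of trip_miles."""
    return trip_miles * per_passenger_mile(
        vehicle_miles, per_vehicle_mile, passenger_miles)
```

If partners supply their own per-passenger values, a plugin could skip this step and use those directly.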
nataliejschultz commented 5 days ago

We may be able to get slightly more detailed information with the paid version, but the library hasn’t gotten back to me about this yet.

Update:

The library got back to me, saying:

" I'm confirming that your WattTime API account has been provisioned with access to real-time, historical, and forecasted marginal emissions data. "

However, when I tried to make a historical call for aoer with these params:

params = {
    "region": "CAISO_NORTH",
    "signal_type": "co2_aoer",
    "start": "2021-07-15T00:00+00:00",
    "end": "2021-07-15T00:05+00:00",
    "include_imputed_marker": True,
}

I got an error message: requests.exceptions.HTTPError: 400 Client Error: Bad Request for url

Running the same request and changing the params to request moer works just fine:

{'data': [{'point_time': '2021-07-15T00:00:00+00:00', 'value': 1002.0}, {'point_time': '2021-07-15T00:05:00+00:00', 'value': 1003.0}], 'meta': {'data_point_period_seconds': 300, 'region': 'CAISO_NORTH', 'signal_type': 'co2_moer', 'units': 'lbs_co2_per_mwh', 'warnings': [], 'model': {'date': '2023-03-01', 'type': 'binned_regression'}}}
shankari commented 5 days ago

@nataliejschultz that's because they have given you access to the marginal data, which is moer. aoer is average data, and maybe they can't give you access to it?
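Whichever signal ends up being available, a response shaped like the moer example above could be reduced to a single emissions rate for the trip roughly as follows. This is only a sketch: the function name is made up, it takes a plain average of the returned points, and it relies on the `units` field reading `lbs_co2_per_mwh` as in that example.

```python
KG_PER_LB = 0.453592

def mean_kg_co2_per_kwh(response):
    """Average the data points in a historical response, in kg CO2 per kWh."""
    units = response["meta"]["units"]
    if units != "lbs_co2_per_mwh":
        # e.g. the signal-index response reports 'percentile', which is not a rate
        raise ValueError(f"unexpected units: {units}")
    values = [point["value"] for point in response["data"]]
    if not values:
        raise ValueError("response contains no data points")
    mean_lbs_per_mwh = sum(values) / len(values)
    # lbs/MWh -> lbs/kWh -> kg/kWh
    return mean_lbs_per_mwh / 1000 * KG_PER_LB

# The moer response quoted in this thread, trimmed to the fields used here
example = {
    "data": [
        {"point_time": "2021-07-15T00:00:00+00:00", "value": 1002.0},
        {"point_time": "2021-07-15T00:05:00+00:00", "value": 1003.0},
    ],
    "meta": {"units": "lbs_co2_per_mwh"},
}
```

Multiplying the result by a trip's energy in kWh gives kg of CO2, which is the same arithmetic as the electric case earlier in the thread.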