Change dashboard to support user inputs

shankari commented 2 years ago

Halting work on https://github.com/e-mission/e-mission-docs/issues/680 to make a higher priority change. Feedback from the CanBikeCO program admins was that the dashboard would be more impactful than the current polar bear gamification in motivating change.

However, in order to use the dashboard for CanBikeCO, we need to actually take the user labels into account. This should handle some of the issues reported with https://github.com/e-mission/e-mission-docs/issues/476, notably https://github.com/e-mission/e-mission-docs/issues/476#issuecomment-860470624, which should allow La Rochelle to reintroduce the user labeling and the common trips!! (@PatGendre)

shankari commented 2 years ago

Recap of existing design and future design considerations on the phone:

data = {}, set in setMetrics - should rename for clarity
metrics retrieved in pre-grouped buckets in getUserMetricsForServer (needs server side changes to return confirmed_trip buckets)
mapping to CO2 and calories
- mapping to CO2 in CarbonDatasetHelper, which has a bunch of metrics for different countries
- mapping to calories in CalorieCal, but basically all motorized modes are zero
how to deal with custom labels
- can put CO2 and calorie values into the trip confirm options
  
  Pro vs. Con
  - pros
    - obvious solution
    - ensures that we will always have values
  - cons
    - have to be careful about when to use it
    - if the user isn't using labels (e.g. using a survey or leaving it off completely), then we don't want to use it
    - but how do we know whether the deployer is using labels or not? What if the deployer is using no user input or is using surveys?
      - presumably if they are using surveys, they don't care about the dashboard
      - if they don't have user input, we should fall back to the autodetected modes and the current carbon footprint calculation
- can put CO2 and calorie values into a separate config file.
  - The thought is that this may help with figuring out whether the user is actually using the confirm options; the other CO2 config file could be missing if we wanted to use the auto-detected values.
  - But we would want to have defaults for it as well, right? So not sure we can assume that the CO2 config file will be missing if we are not using user inputs.
- we need to figure out a way to deal with "other" entries. Since we don't currently have a system for creating such entries, we will return them as "Other" or N/A and ignore them on the client.
- if we can detect whether the user is using user inputs and what kind it is, we can know whether to use the custom calculation or the default for the locale.
  1. we can try to determine this dynamically by seeing the buckets that we get from the server. If the buckets contain only standard modes, then we know that there must be no user inputs.
  2. we finally build the "config file" functionality that we have been talking about
- option 1 seems easier to work with for now, although we do want to/have to eventually bite the bullet for option 2.

NOTE: If the deployer supports user inputs but the user hasn't labeled anything, we should make sure to return one N/A bucket from the server for this to work.
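
A minimal sketch of the dynamic detection in option 1, assuming each bucket from the server can be reduced to a dict keyed by mode or label name. `STANDARD_MODES` and `uses_user_inputs` are hypothetical names, and the mode list is a placeholder rather than the actual server vocabulary.

```python
# Hypothetical set of auto-detected (sensed) mode names; the real list
# comes from the server and may differ.
STANDARD_MODES = {
    "WALKING", "BICYCLING", "IN_VEHICLE", "BUS", "TRAIN", "CAR",
    "AIR_OR_HSR", "UNKNOWN",
}


def uses_user_inputs(buckets):
    """Return True if any bucket has a key outside the standard modes.

    `buckets` is a list of dicts mapping a mode or label name to a value.
    An "N/A" key also counts as evidence of user inputs, which is why the
    server has to return at least one N/A bucket for unlabeled users.
    """
    seen = set()
    for bucket in buckets:
        seen.update(bucket.keys())
    return bool(seen - STANDARD_MODES)
```

With this check, a response that contains only sensed modes falls back to the default calculation for the locale, while any custom label or N/A bucket switches to the label-based calculation.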

shankari commented 2 years ago

Server design considerations

Let's start with the server to see if the rest of this is feasible.

The metrics code currently is already configurable wrt the key that it reads for the analysis results.

    section_df = esda.get_data_df(eac.get_section_key_for_analysis_results(),

However, it assumes the retrieved entries are sections and that we need to use the sensed_mode field from the section for the grouping.

        mode_grouped_df = section_group_df.groupby('sensed_mode')

Since we are never going to call this new field sensed_mode, one obvious change is to make the grouping field also configurable.

But what kinds of entries should we retrieve and how should we specify the sensed mode?

1. One potential fix is to create a confirmed_section for each confirmed_trip that has the confirmed mode user input.
2. Another would be to continue consuming confirmed_trip, and to expand the user inputs using expand_userinputs in emission/storage/decorations/trip_queries.py

(2) seems like an easier fix, given that we can then kick the can of "how to create a confirmed section" down the road a bit more and not deal with it while making a high-priority fix.
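
To illustrate the "configurable grouping field" idea without pandas, here is a rough sketch using plain dicts. `group_by_field` is a hypothetical helper, and the `mode_confirm` field name in the usage note is an assumption about what the expanded user inputs look like.

```python
from collections import defaultdict


def group_by_field(entries, group_field):
    """Group entry dicts by the value of `group_field`.

    Entries without the field (e.g. unlabeled trips) go into "N/A".
    """
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.get(group_field, "N/A")].append(entry)
    return dict(grouped)
```

The same function would then serve both cases: `group_by_field(sections, "sensed_mode")` for the existing behaviour and `group_by_field(expanded_trips, "mode_confirm")` for confirmed trips.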

shankari commented 2 years ago

@PatGendre @jf87 any comments or thoughts on the design? I'm going to implement server changes first and the phone changes in the rest of the week.

PatGendre commented 2 years ago

@shankari this will be a great feature.

Actually, in La Rochelle they are developing a separate "coachCO2" app hosted in the Cozy Cloud personal cloud service, and the e-mission "tracemob" app is only a data collection tool (which could possibly be complemented with other data collection tools such as train ticket sales, for instance). So the dashboard will be in "coachCO2", not in "tracemob"; still, the feature will be interesting for tracemob, as we could have other use cases than La Rochelle.
The coachCO2 app is currently under development, with one iteration every 6 weeks or so. What is intended is that the user can label the mode at the section level (and the app can also take into account labeled trips from e-mission), and the CO2/calorie metrics will be computed in the coachCO2 app. The code is here: https://github.com/cozy/coachCO2/blob/master/src/constants/const.js As the car model has a major influence on CO2 emissions, the intent is to let the user configure the car model in the app.
Also, it is intended to take the elevation into account if we manage to get the z data.
I write "intended" as the development is iterative and the priorities are not fixed in advance...

As for your questions,

on the server side, I agree that it is not vital to have confirmed_section to begin with; it should suffice to label the longest section of the trip with the user-entered mode for the trip (although there will be some corner cases)
on the client side, one important question is whether the user can define custom modes or not (like car1, car2, e-scooter or whatever); it seems simpler to propose a pre-defined mode list, also because otherwise you'd have to define how to compute aggregates across several users; if the app is really a "self data" app, custom modes would be nice to have, but if the app is more of a mobility survey tool, then custom modes are not very appealing.
also, I did not completely understand your discussion above about the phone side, but I would also favour a behaviour where the metrics take the user labels into account dynamically. I think defining this in a config file could be more complicated. So a first implementation should be as simple as possible.

shankari commented 2 years ago

@PatGendre both the features that you have outlined above:

"label at the section level" (which also implies trip editing) and
"select car mode"

would be very interesting to the core. Is there any hope of contributing them back?

I do raise the custom modes question above. For now, I am going to ignore them. But later, when we design an "other-handling" system, we should have a process for somebody to find the CO2 equivalent for new modes and enter them.

PatGendre commented 2 years ago

@shankari I am not sure it can be contributed back as the app is in React, not Cordova (but I don't know much about mobile dev ;-). The section-level labelling will just add an attribute, so it won't be a true trip editing feature (which could also e.g. merge 2 sections into one, etc.). Anyway, it will be open source and maybe it can nurture interesting discussions about functionalities?

> I do raise the custom modes question above. For now, I am going to ignore them.

Ok, this is what I understood, and it seems reasonable! I don't know yet how custom modes will be handled in coachCO2 in future iterations but I can keep you informed of course.

shankari commented 2 years ago

@PatGendre for a Cordova app, the UI is in JavaScript. Any JavaScript. Most of our UI is currently in Angular/Ionic, but we can interoperate with (basic) React components using ngReact. We experimented with this as part of the Itinerum integration and it looks like @kafitz got it to work: https://github.com/e-mission/e-mission-docs/issues/643#issuecomment-903133273

So you/they could theoretically integrate the same codebase into the existing phone UI as well...

shankari commented 2 years ago

Server changes done! Remember to configure conf/analysis/debug.conf.json to set the analysis.result.section.key to analysis/confirmed_trip if you want to use this feature!

Phone changes next; ETA early next week.

jf87 commented 2 years ago

Sorry to join this discussion a bit late. I could not completely follow everything you wrote @shankari, but in alignment with @PatGendre I would also prefer a simple solution. I am happy to have a chat if it makes sense, just ping me :-)

shankari commented 2 years ago

Started work on the phone side; found that the median_speed metric doesn't work with confirmed trips. Fixing that first.

shankari commented 2 years ago

We need to generalize the CA 2030 and 2050 goals now that we are no longer focused on California. Ideally we would use a global number, but the NDC and population estimates globally are more complicated. Let's start with the US and move to global soon.

I'm recording the calculations here for the record. For these calculations, we will rely largely on the US NDC instead of following up on primary sources.

Current US goals, per "The Long Term Strategy of the United States" are:

50% reduction below 2005 levels by 2030. 2005 levels are ~ 2Gt (Figure 4). So the 2030 transportation sector goal is 1 Gt
100% net zero by 2050. The net zero goal relies on around 1Gt of negative emissions (page 6, figure ES-2). So the actual emissions target is 1Gt in 2050. The transportation sector accounts for 29% of current emissions (page 30). Assuming that all sectors decarbonize proportionally, the transportation sector emissions should be ~ 0.3 Gt in 2050.

US population estimates and projections are from the International Population Database (https://www.census.gov/programs-surveys/international-programs/about/idb.html). The "Tables" view is particularly useful to obtain the granular estimates that we need.

We estimate the per capita values using:

per capita kg CO2e/wk = (1,000,000,000 * 1000 * GTCO2e) / (population * 52)

| Time | Yearly GT CO2e | Population | Per Capita kgCO2e/wk |
|---|---|---|---|
| Baseline (2005) | 2 | 295,516,599 | (1000000000 * 1000 * 2) / (295516599 * 52) = 130 |
| Short-term goal (2030) | 1 | 355,100,730 | (1000000000 * 1000 * 1) / (355100730 * 52) = 54 |
| Long-term goal (2050) | 0.3 | 388,922,201 | (1000000000 * 1000 * 0.3) / (388922201 * 52) = 14 |
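
The per capita formula can be checked with a few lines of Python; this is only a sketch of the arithmetic above, with hypothetical names. Note that the 2050 value comes out at roughly 14.8, so the 14 recorded in the table corresponds to truncating rather than rounding.

```python
KG_PER_GT = 1_000_000_000 * 1000  # 1 Gt = 1e9 tonnes = 1e12 kg
WEEKS_PER_YEAR = 52


def per_capita_kg_per_week(gt_co2e, population):
    """Weekly per capita emissions in kg CO2e for a yearly total in Gt."""
    return (KG_PER_GT * gt_co2e) / (population * WEEKS_PER_YEAR)


for label, gt, pop in [
    ("Baseline (2005)", 2, 295_516_599),
    ("Short-term goal (2030)", 1, 355_100_730),
    ("Long-term goal (2050)", 0.3, 388_922_201),
]:
    print(f"{label}: {per_capita_kg_per_week(gt, pop):.1f} kg CO2e/wk")
```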

shankari commented 2 years ago

Tried to push this to staging and ran into several problems:

The hack to find the median_speed https://github.com/e-mission/e-mission-server/pull/843 makes things Very Slow. In particular, if users select even a month, the aggregate call on staging times out. Currently commented out.
We still haven't fixed the code to ensure that we count trips that are not displayed for labeling because their inferred labels are so good.
We need to handle optimal and max carbon footprint stuff

shankari commented 2 years ago

Feedback from Sandee: Change to "US goals"

shankari commented 2 years ago

"My footprint"
"Average for your group"
"How it compares"

Change colors: Green + Thumbs up if you meet the goal, Yellow + Sweating if you don't

shankari commented 2 years ago

Picking up where this left off...

shankari commented 2 years ago

The hack to find the median_speed https://github.com/e-mission/e-mission-server/pull/843 makes things very slow, since it has to make multiple database queries for each trip. One potential workaround is to use the mean speed instead, since we have distance and duration for both sections and trips.

Looking back at the calorie calculations in https://github.com/e-mission/e-mission-docs/issues/139, we use METs from the Compendium of Physical Activities: https://sites.google.com/site/compendiumofphysicalactivities/home

The compendia for bicycling, for example, have values similar to "bicycling, 10-11.9 mph, leisure, slow, light effort". There is no indication of whether the speed is median or mean. Let's go with mean since that is easier from a code perspective.

For the record, another option is to compute the median_speed and store it as a field in both the section and the trip. However, we would not have that field for older trips and would still have to use the mean as a workaround. Since we don't have a specific need for median v/s mean, let's just go with mean.
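
A small sketch of why the mean is cheaper, assuming distances in meters and durations in seconds (both function names are hypothetical). The mean only needs the two values already stored on each section and trip, while the median needs the per-point speeds, which is what forces the extra database queries.

```python
from statistics import median


def mean_speed(distance_m, duration_s):
    """Mean speed in m/s from the stored distance and duration."""
    return distance_m / duration_s if duration_s > 0 else 0.0


def median_speed(point_speeds):
    """Median speed in m/s; needs the per-point speeds of the trip."""
    return median(point_speeds) if point_speeds else 0.0
```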

shankari commented 2 years ago

Comparing limits:

for CA, I had 51 (2030 goal) and 8 (2050 goal). Note that I only considered passenger vehicles (cars and light duty trucks)
for CO, I had 45 (2030 goal) and 7 (2050 goal). I did not consider a subset of transportation emissions in this case.
for the US, I have 54 (2030 goal) and 14 (2050 goal). Maybe I should consider passenger vehicles + air only here as well?

If I want to, US DOT has some transportation trends (https://www.transportation.gov/sustainability/climate/transportation-ghg-emissions-and-trends) going back to 2005, but they do not seem to be consistent with each other. The emissions total, split by type of gas, is 2133 Tg = 2133 MMT (since "one million metric ton is equal to one teragram") = 2.133 Gt. But the emissions per sector are only 1.969 Gt. Adding up the individual entries by mode, we get 0.68 + 0.544 + 0.407 + 0.183 + 0.139 + 0.01 + 0.154 = 2.117 Gt. We should really pull out heavy duty trucks from this mix, since those emissions are incorporated into the products that we consume and not into our travel directly. But I am not sure how to pull that out, since the trends seem to lump heavy duty trucks (which are not relevant for passenger travel) along with buses (which are).
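
For the record, the three 2005 totals quoted above can be lined up with a short check. The per-mode list is unlabeled here because the comment does not say which value belongs to which mode.

```python
TG_PER_GT = 1000  # 1 Gt = 1000 Tg = 1000 MMT

total_by_gas_gt = 2133 / TG_PER_GT
total_by_sector_gt = 1.969
by_mode_gt = [0.68, 0.544, 0.407, 0.183, 0.139, 0.01, 0.154]
total_by_mode_gt = round(sum(by_mode_gt), 3)

# Three different totals for the same year and sector.
print(total_by_gas_gt, total_by_sector_gt, total_by_mode_gt)
```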

shankari commented 2 years ago

The color coding for the goals, which took so much time, seems to be still broken for non-default values.

[Screenshots not reproduced: (1) works for default values; (2) incorrectly shows all green if range is large; (3) incorrectly shows all red if range is small]

shankari commented 2 years ago

This is because both the userCarbon and the us2050 values are strings.

$scope.carbonData.us2050 = Math.round(14 / 7 * days) + ' kg CO₂';
$scope.carbonData.userCarbon    = FootprintHelper.readableFormat(FootprintHelper.getFootprintForMetrics(userCarbonData));

Let's move the formatting into the HTML throughout the code, as per best practices, so that the comparisons operate on numbers.
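
The failure mode is easy to reproduce. This is a Python analogue of the JavaScript behaviour, not the phone code itself: when both operands are formatted strings, the comparison is lexicographic, so the result depends on the leading digit rather than the magnitude. `readable` is a hypothetical stand-in for the formatter.

```python
def readable(kg):
    """Format a footprint for display, as the original code did up front."""
    return f"{round(kg)} kg CO₂"


user_kg, goal_kg = 9, 28

# Comparing formatted strings is lexicographic: "9..." sorts after "2...".
print(readable(user_kg) > readable(goal_kg))  # string comparison
print(user_kg > goal_kg)                       # numeric comparison
```

Keeping the raw numbers in the scope and formatting only at display time avoids the problem.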

shankari commented 2 years ago

Let's apply https://github.com/shankari/e-mission-phone/commit/a3518fee4d6f5f7243a604712c549eddfee2050f elsewhere as well and see if we can remove the code simplification that we had postponed in https://github.com/e-mission/e-mission-phone/pull/805/commits/6850bd49402c2a4d3cf506bb35aea95d5b973832

shankari commented 2 years ago

Let's think about all the places where we need formatted data:

Footprint card
Calorie card
Summary card
Charts

The first two are easy, so let's fix them first

Edit: fixed in https://github.com/e-mission/e-mission-phone/pull/805/commits/cc404654bafe5b7ce68ac6c179581c1b43615e3c and https://github.com/e-mission/e-mission-phone/pull/805/commits/44baf2214ed5dd743ae97c435dc47a0a493c7747

shankari commented 2 years ago

Making sure that we don't recompute values over and over again in different functions. Simplifying this should improve the performance.

shankari commented 2 years ago

The issue with the last two is that both need to be formatted, but the summary card wants one value per mode while the charts want multiple values per mode. So the format code needs to work on an array in one case and on individual values in the other.

There are a couple of design fixes for this:

1. pull out the formatters such that they work on one value at a time, which will also simplify the giant bolus of code that is the formatXXX/getSummaryData. We can then call the formatters as the inner value in the loop, or maybe even call them directly from the HTML
2. first format (to get the values for the chart) and then summarize, i.e. call the method to summarize the data on the formatted values

The second option is also likely to run into issues with the list v/s list of lists, and the first option has the ability to both (a) simplify the code, and (b) support direct invocation from HTML. So let's go with option 1 to begin with.
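
A rough sketch of option 1, written in Python for brevity although the phone code is JavaScript. `format_distance` and the data layout are hypothetical; the point is only that a formatter that takes one value works for both the chart (many values per mode) and the summary card (one value per mode).

```python
def format_distance(meters):
    """Format one distance value; hypothetical single-value formatter."""
    if meters >= 1000:
        return f"{meters / 1000:.2f} km"
    return f"{meters:.0f} m"


# Hypothetical chart data: several values per mode.
chart_data = {"walk": [400, 1200], "bike": [3000, 2500]}

# Charts: apply the formatter to each inner value.
chart_labels = {mode: [format_distance(v) for v in values]
                for mode, values in chart_data.items()}

# Summary card: summarize the raw numbers first, then format one value.
summary = {mode: format_distance(sum(values))
           for mode, values in chart_data.items()}
```

Summarizing before formatting also sidesteps the list versus list-of-lists issue noted for option 2.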

shankari commented 2 years ago

Next, we disable the filtering queries because:

they don't actually work for the goals since we cannot figure out the number of days
they are confusing for users since most people who selected Nov 10 to Dec 11, would not expect only 1 day's worth of data
they make it so that we can't display the current range on the summary page

shankari commented 2 years ago

Finished re-enabling the "worst" calculation, which was fairly straightforward. Moving on to the "optimal" calculation, which is trickier.

The previous implementation of optimal was:

for all motorized mode trips, sum the distances for trips > 5km
- optimal for trips < 5km is to walk or bike, with footprint = 0
- find lowest footprint, multiply by distance sum

This is now complicated by the introduction of the pilot e-bikes.

The overall goal is now:

trips < 5km: walk or bike
trips < 20 miles = 32km: e-bike
trips > 32 km but < 600km = lowest footprint
trips > 600km = air

shankari commented 2 years ago

Note that we were already not handling air properly although we claimed in the calculations that we were. It seems like we need a couple of checks:

whether a mode is motorized or not (we can hardcode this for now, or check if the CO2 is 0)
what the cutoffs would be. The cutoffs could actually depend on the range of the specific "range-limited motorized vehicle" that the user had access to. Or we could say that the user could go out and get a "range-limited motorized vehicle". But what is an e-car then?

I can't think of a way to do this generically without getting significantly more information, potentially from the MITIE project. For now, we leave out the optimal calculation, or make it CanBikeCO specific.

Let's leave out the optimal calculation for now.

shankari commented 2 years ago

Although, the optimal calculation would really be useful to show people how low they can go.

Ok, here's an alternate proposal. We allow one "range-limited motorized vehicle", with the understanding that it would be an MITIE-class vehicle. If it exists, it should specify the max range. If it exists, we break at that range, otherwise we don't. This should work transparently for all evaluations of individual modes, but will break if there are multiple range-limited modes. But let's deal with that later.

shankari commented 2 years ago

Actually, this won't even work, and the optimal footprint was wrong all along. Basically, on the client, we have the summary (at a minimum, per day). We don't have individual trip values. So if we had 10 trips of 1 km each, we would see car: 10 km on the client. Those 1 km car trips should really be replaced by walk/bike trips, but we won't see that because car has a value > 5 km. This was already a bug on the client, and it is only going to be a bigger bug going forward, where we have more divisions.

Sadly, abandoning the optimal calculation again because it requires way more refactoring than I want to do.
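
The bug can be shown in a few lines. This sketch uses only the 5 km walk/bike cutoff from the earlier comment; the function name is hypothetical.

```python
WALK_BIKE_LIMIT_KM = 5


def optimal_distance_to_count(trip_distances_km):
    """Sum only the distances too long to walk or bike (footprint 0)."""
    return sum(d for d in trip_distances_km if d > WALK_BIKE_LIMIT_KM)


car_trips_km = [1] * 10  # ten 1 km car trips

# With per-trip values, every trip is replaceable by walk/bike.
per_trip = optimal_distance_to_count(car_trips_km)

# With only the aggregate, the client sees a single 10 km "trip".
aggregated = optimal_distance_to_count([sum(car_trips_km)])

print(per_trip, aggregated)
```

The per-trip view counts nothing as motorized, while the aggregated view counts the full 10 km, so any threshold-based optimal calculation needs per-trip distances.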

shankari commented 2 years ago

Another issue with this approach is that the most efficient mode after the e-bike is scootershare. But using the scootershare intensity as the optimal for a range from 30 km to 600 km is clearly incorrect. If we want to avoid that, we need to actually have additional metadata for each mode (is it long-range or range-limited, etc). This seems like a task for ... the long-term "other" mode server.

shankari commented 2 years ago

Having abandoned the optimal calculation for now, let's move on to the range calculations. Our goal here is to come up with a range of values for the footprint. The range comes from modes for which we don't have a mapping. The lower end assumes a mapping of 0 for the unknown modes. The higher end assumes the highest footprint (taxi) for the unknown modes.
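
A minimal sketch of the range calculation, assuming distances in km and intensities in kg CO2 per km. `footprint_range` is a hypothetical name, and the highest known intensity stands in for "taxi".

```python
def footprint_range(distance_km_by_mode, kg_per_km_by_mode):
    """Return (low, high) footprint in kg CO2 for the given distances.

    Modes missing from `kg_per_km_by_mode` contribute 0 to the low end
    and the highest known intensity to the high end.
    """
    highest = max(kg_per_km_by_mode.values())
    low = high = 0.0
    for mode, km in distance_km_by_mode.items():
        if mode in kg_per_km_by_mode:
            low += km * kg_per_km_by_mode[mode]
            high += km * kg_per_km_by_mode[mode]
        else:
            high += km * highest
    return low, high
```

When every mode has a mapping, low and high are equal and the range collapses to a single value.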

shankari commented 2 years ago

Almost done, but need to figure out how the comparison code will work. If we have a range of values, which arrow do we display? How do we format the %?

I'd hoped that the signs on both low and high would be the same, but alas, a fairly early test indicated that they are not always. In this case, the low range was a potential decrease (17 to 17) while the high range was an increase (17 to 28). Might have to redo the entire HTML around the greater/lesser to show both options. There's going to be some super complex ng-if code...

Note also that

<div id="arrow-color" class="icon ion-arrow-up-a"></div>
<div id="arrow-color" class="icon ion-arrow-up-a"></div>

results in arrows one below the other.

shankari commented 2 years ago

The cases we need to handle are:

high == low
- low is +ve: single up arrow, single "greater" message
- low is -ve: single down arrow, single "lesser" message
high != low
- low is +ve, high is +ve, two up arrows, double "greater" message
- low is +ve, high is -ve, one up arrow, one down arrow, one "greater" message, one "lesser" message
- low is -ve, high is +ve, one down arrow, one up arrow, one "lesser" message, one "greater" message
- low is -ve, high is -ve, two down arrows, double "lesser" message
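
The case table reduces to a small function. This is a sketch in Python rather than the actual directive; `comparison_display` is a hypothetical name, and treating a change of exactly zero as "lesser" is an assumption, since the list above does not cover it.

```python
def comparison_display(low_change, high_change):
    """Return a list of (arrow, message) pairs for a change range.

    A positive change means the footprint went up ("greater"); anything
    else is treated as "lesser". Zero handling is an assumption.
    """
    def one(change):
        return ("up", "greater") if change > 0 else ("down", "lesser")

    if low_change == high_change:
        return [one(low_change)]
    return [one(low_change), one(high_change)]
```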

shankari commented 2 years ago

Implemented this as a separate directive, and it is not that bad, except for the "or" and "last week" bits in the final screen. They are in the next row somehow.

shankari commented 2 years ago

Range changes are complete (calorie was https://github.com/e-mission/e-mission-phone/pull/805/commits/980bbb4aca7ce265e7887f14de5b5a88bdfd88d6)

With this, all UI changes are complete. Pending potential issues:

optimal footprint is turned off
text for cookie/ice cream etc could be better
Or / vs. previous week text could be improved

Moving on to the final server changes (handling the trips with the expectation.to_label == false by including them in the analyses)

shankari commented 2 years ago

We're going to deal with the expectation.to_label issue by coming up with a new field called final_labels. The algorithm to fill in the final_labels is as follows:

if there is a user_input, final_labels = user_input, with confidence 100%
else if the expectation.to_label = false, final_labels = inferred_labels with the confidence of the inferred label
else final_labels is empty

We will then change the metrics and the leaderboard to use final_labels instead of user_input, including changing the code for expand_userinputs
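
A sketch of the algorithm above. The field layout is assumed for illustration (`user_input` as a dict, `expectation.to_label` as a boolean, `inferred_labels` as a list of `{"labels": ..., "p": ...}` entries), and picking the highest-probability inferred entry is also an assumption.

```python
def compute_final_labels(trip):
    """Return (labels, confidence) for a confirmed trip dict.

    The field layout is assumed for this sketch and may not match the
    server's data model exactly.
    """
    user_input = trip.get("user_input")
    if user_input:
        return user_input, 1.0  # user labels win, with 100% confidence
    expectation = trip.get("expectation", {})
    inferred = trip.get("inferred_labels", [])
    if expectation.get("to_label") is False and inferred:
        # Assumption: use the inferred entry with the highest probability.
        best = max(inferred, key=lambda entry: entry["p"])
        return best["labels"], best["p"]
    return {}, 0.0
```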

shankari commented 2 years ago

Couple of notes:

the final_labels approach is the one that we will need to use when we merge user inputs, inferred labels and sensed labels. We can do an initial version of this to get started on that change, or we can say that we want to avoid over-engineering the solution prematurely and hold off until the final design.
If we want to hold off, we can:
- change expand_userinputs, or create a new expand_final_labels function that basically implements the algorithm above in real-time. This assumes that the performance hit won't be too significant, since the computation is fairly straightforward.
- this is a bit trickier for the leaderboard where we don't want to expand the user labels, only ensure that they exist, but again, the algorithm is fairly simple

Let's start with this approach, and then move on to the new field if these don't work for some reason. It will be much harder (wrt backwards compat) to go and rewrite fields for all trips when we do the final design.

shankari commented 2 years ago

For the record, MET source for e-bikes is: https://journals.lww.com/acsm-tj/Fulltext/2021/04150/Metabolic_and_Cardiovascular_Responses_to_a.5.aspx?context=LatestArticles

Alessio, Helaine M., et al. "Metabolic and Cardiovascular Responses to a Simulated Commute on an E-Bike." Translational Journal of the American College of Sports Medicine 6.2 (2021): e000155.

shankari commented 2 years ago

Since we are NREL, we want to also support car vs. e-car. Estimates for the average e-car:

From Andrew Kotz: 3 - 4 miles/kWH = 333 - 250 WH/mile
From Andy's Leaf: 244 WH/mile
From my Leaf (4.2 miles/kWH average, probably because I am in the SF Bay Area): 238 WH/mile

Let's go with 250 WH/mile (0.25 kWH/VMT) since most of the numbers seem to be clustered around there, and 250 is a nice round number 😄

So we will add entries for:

e_car_drove_alone: 250 WH/mile (versus e-bike at 22 WH/mile)
e_car_shared_ride: 125 WH/mile (versus e-bike at 22 WH/mile)

conversions (based on methodology from minipilot paper)

We will continue to use 1166 lb per MWH, consistent with egrid estimate for CO at the time of the CEO mini-pilot https://www.epa.gov/egrid/power-profiler#/RMPA

1,000,000 WH = 1166 lb
250 WH = (250 * 1166) / 1000000 = 0.2915 lb
125 WH = (125 * 1166) / 1000000 = 0.1458 lb

e_car_drove_alone: 0.2915 lb/mile
e_car_shared_ride: 0.1458 lb/mile

converting to SI units to be consistent with other values:

We want the values in kg/PkmT

0.2915 lb = 1 mile
0.1322 kg = 1.609 km
(0.1322 / 1.609) = 0.08216 kg/PkmT (versus 0.00728 for e-bike, so the magnitude seems right)

0.1458 lb = 1 mile
0.0661 kg = 1.609 km
(0.0661 / 1.609) = 0.04108 kg/PkmT (half of drove_alone, so the magnitude seems right)

e_car_drove_alone: 0.08216 kg/km
e_car_shared_ride: 0.04108 kg/km
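
The whole conversion chain can be reproduced as a short script. This is a sketch with hypothetical names; it uses slightly more precise unit factors than the hand calculation, so the last digit may differ.

```python
LB_PER_MWH = 1166          # eGRID estimate used in the thread
WH_PER_MWH = 1_000_000
KG_PER_LB = 0.45359237
KM_PER_MILE = 1.609344


def kg_co2_per_km(wh_per_mile):
    """Convert an energy intensity in WH/mile to kg CO2 per km."""
    lb_per_mile = wh_per_mile * LB_PER_MWH / WH_PER_MWH
    return lb_per_mile * KG_PER_LB / KM_PER_MILE


for mode, wh in [("e_car_drove_alone", 250), ("e_car_shared_ride", 125)]:
    print(mode, round(kg_co2_per_km(wh), 4))
```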
