Usage of both Imperial system and Metric systems in displaying metric related information for configs which has label_options available

iantei commented 9 months ago

With the incorporation of the metric system for displaying the emissions timeseries in charts whose config has label_options available (dynamic config), a few other metrics have become confusing, because their charts are still shown in the imperial system.

The emissions timeseries charts, such as:

"Timeseries of emissions per kilometer"
"Timeseries of emissions"

display distance and weight in the metric system (since the Laos config is a dynamic config), i.e. kg/km,

while other charts, such as:

"Trip count under 10 miles",
"Trip count under 10 miles (sensed) All data",
"Trip miles by mode All data Default",
"Trip miles by mode (sensed) All Data Default",
"Trip miles by land mode (sensed) All data Default",
"Average trip length All data Default"

still use the imperial system (the existing implementation), showing distance in miles.

This could confuse the end user.

Possible solution: use the useImperial tag from the config file to decide whether all metric-related information is displayed in the imperial or the metric system.

Abby-Wheelis commented 8 months ago

I'm picking this up as my introduction to working with the dashboard itself and the interaction with the config (work until this point has been limited to notebooks used to generate charts for a paper, and mostly from TSDC data).

I had the chance to chat about this issue this afternoon with @iantei, who was kind enough to give me a bit of a crash course in dashboard development. We identified the display_config.use_imperial field of the config files as the best criterion for choosing which paradigm we'll use for the whole dashboard.
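A minimal Python sketch of reading that flag, assuming the config has already been loaded from its JSON into a dict. The helper name `read_use_imperial` is hypothetical, and falling back to imperial when the field is absent is an assumption chosen to match the existing miles-based charts.

```python
import json


def read_use_imperial(config: dict) -> bool:
    """Return True if distances/weights should be shown in miles/lbs.

    Assumption (hypothetical default): if display_config or use_imperial is
    missing, fall back to imperial, matching the pre-existing miles-based charts.
    """
    display_config = config.get("display_config") or {}
    return bool(display_config.get("use_imperial", True))


# Example: an illustrative config fragment for a metric deployment.
metric_config = json.loads('{"display_config": {"use_imperial": false}}')
print(read_use_imperial(metric_config))  # False
```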

In looking at the code, I noted that the chart names (in the html files) are hard-coded to include the use of "miles" in their titles. My first thought was that I would need to create a second set of html files, but that seems less than ideal, so I will instead work towards a more dynamic solution, where labels are set based on some variables.
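One way to make the titles dynamic, sketched in Python for consistency with the notebooks: store each title once as a template with unit placeholders and fill it from a lookup keyed on use_imperial. The placeholder names, `UNIT_WORDS` and `chart_title` are illustrative assumptions, not the dashboard's actual code; the html files would need an equivalent substitution on the front end.

```python
# Hypothetical unit vocabulary; keys are the possible values of use_imperial.
UNIT_WORDS = {
    True: {"distance": "miles", "distance_singular": "mile", "weight": "lbs"},
    False: {"distance": "kilometers", "distance_singular": "kilometer", "weight": "kg"},
}


def chart_title(template: str, use_imperial: bool) -> str:
    """Fill unit placeholders in a chart-title template (unused keys are ignored)."""
    return template.format(**UNIT_WORDS[use_imperial])


print(chart_title("Trip {distance} by mode", use_imperial=True))    # Trip miles by mode
print(chart_title("Trip {distance} by mode", use_imperial=False))   # Trip kilometers by mode
```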

Miles are also ingrained in the way that charts are generated, so I will need to handle this more dynamically as well. If I can find an elegant way to do so, I will extend the current chart generation rather than duplicate it (code for imperial charts and code for metric charts), but I would resort to the latter if need be.

First step is to set up a branch and development environment to get a workflow going, and then I will move forward with making the necessary changes so that imperial vs metric is in line with what is dictated by the configuration file.

Abby-Wheelis commented 7 months ago

There are many charts that display trips "less than 10 miles". In coming up with an appropriate replacement, I'm considering showing trips "less than 15 kilometers" to get a round number, but a more direct translation would be 16.1 km, which would let me keep the math and only change the labels.
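The arithmetic behind the candidate cutoffs, as a quick Python check using the exact definition of the international mile (the helper names are just for illustration):

```python
KM_PER_MILE = 1.609344  # exact: the international mile is 1609.344 m


def miles_to_km(miles: float) -> float:
    return miles * KM_PER_MILE


def km_to_miles(km: float) -> float:
    return km / KM_PER_MILE


print(round(miles_to_km(10), 1))  # 16.1  (direct translation of "10 miles")
print(round(km_to_miles(15), 2))  # 9.32  (the round-number alternative)
print(round(km_to_miles(10), 2))  # 6.21  (a 10 km cutoff would be much shorter)
```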

Abby-Wheelis commented 7 months ago

For each of the notebooks, I have been combing through and ensuring that any distances can be handled in either km or miles. For "cutoff" charts, such as "trips under 10 miles" I only changed the text, so the same metrics are displayed regardless of the units, just with labels that make sense in context. For other charts, I leveraged the addition of a distance_kms column to expanded_ct in order to calculate and display distance metrics in kilometers.
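A minimal pandas sketch of this approach, assuming `expanded_ct` carries the trip length in meters in a `distance` column plus a mode column; the `distance_miles` and `mode_confirm` column names and both helpers are assumptions for illustration. Computing both unit columns once lets each chart pick its column from use_imperial instead of duplicating chart code.

```python
import pandas as pd

METERS_PER_MILE = 1609.344
METERS_PER_KM = 1000.0


def add_distance_columns(expanded_ct: pd.DataFrame) -> pd.DataFrame:
    """Add miles and kilometers columns, assuming 'distance' is in meters."""
    out = expanded_ct.copy()
    out["distance_miles"] = out["distance"] / METERS_PER_MILE
    out["distance_kms"] = out["distance"] / METERS_PER_KM
    return out


def distance_by_mode(expanded_ct: pd.DataFrame, use_imperial: bool) -> pd.Series:
    """Total distance per mode, in the unit chosen by the config."""
    column = "distance_miles" if use_imperial else "distance_kms"
    return expanded_ct.groupby("mode_confirm")[column].sum()


trips = pd.DataFrame({
    "mode_confirm": ["walk", "e-bike", "e-bike"],
    "distance": [1609.344, 5000.0, 3046.72],
})
trips = add_distance_columns(trips)
print(distance_by_mode(trips, use_imperial=False))
```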

The images below show the first 3 notebooks that I have updated, run on the snapshot of Laos data (the other data I have loaded is CO e-bike data, which is extremely slow to work with on my computer), in both imperial and metric "modes", as controlled by the use_imperial variable.

[Figure: Generic Metrics, imperial vs. metric]

[Figure: Generic Metrics Sensed, imperial vs. metric]

[Figure: Generic Timeseries, imperial vs. metric]

These images show that both modes are consistent with each other; only the units differ. Next steps:

[x] update the "program" notebooks, which contribute 7 more distance charts (this might require loading some different program data)
[x] implement changes on the front end (I have started this, but struggled to test)
[ ] test changes on the front end

Abby-Wheelis commented 7 months ago

> There are many charts that display trips "less than 10 miles". In coming up with an appropriate replacement, I'm considering showing trips "less than 15 kilometers" to get a round number, but a more direct translation would be 16.1 km, which would let me keep the math and only change the labels.

I've been discussing this a bit offline with @iantei and wanted to bring the discussion into the issue. The concern with choosing 16.1 km is that it caters to the imperial system; on the other hand, it maintains a consistent threshold across all the dashboards. I think the cutoff might lose some meaning (and would certainly lose some trips) if we made it 10 km.

I think we should either stick to 16.1 km = 10 miles, or, if we wanted to be able to present a round number to metric system users, select 15km as a middle ground. @shankari what do you think about the "cutoff" distance for metric system charts?

iantei commented 7 months ago

That's a good point. I just have a slightly different view: the end users of a metric-system dashboard would likely be different from those of an imperial-system dashboard (e.g., a program or study run by a non-US research group that uses the metric system exclusively). Therefore, a 16.1 km cutoff doesn't seem intuitive for metric-system users of the public dashboard. But as @Abby-Wheelis mentioned earlier, if there is research or a study on which the 10-mile cutoff was based, we might as well stick with the 16.1 km cutoff.

shankari commented 7 months ago

> With the incorporation of the metric system for displaying the emissions timeseries in charts whose config has label_options available (dynamic config)

For the record, the imperial versus metric distinction has been around since the first iteration of the dynamic config in summer 2022.

> I think we should either stick to 16.1 km = 10 miles, or, if we wanted to be able to present a round number to metric system users, select 15km as a middle ground. @shankari what do you think about the "cutoff" distance for metric system charts?

That's a good question. I agree that we are unlikely to have both miles and km displayed at the same time. The partners are going to pick one, and the config only supports one. The 10-mile cutoff was fairly arbitrary; the CEO requested it for the original statistics, based on statistics like this: https://www.energy.gov/eere/vehicles/articles/fotw-1230-march-21-2022-more-half-all-daily-trips-were-less-three-miles-2021

I don't even know why 10 instead of 3, although I can speculate that it is a round number that is at the limit of what an e-bike would travel so it would capture the travel of interest. Given that rationale, I vote for 15 km. Round number, at around the distance where the results would be meaningful.

I would not be opposed to changing to, or also including, 3 miles ≈ 5 km, to be consistent with the "half of trips" theme.

shankari commented 7 months ago

From some prior work that we did for CanBikeCO, here's the distribution of trips by distance in the last NHTS. 10 miles covers ~80% of trips. An alternative method that is more data driven and not as hard-coded would be to display trips below the 80% distance. That would work for both miles and km, and would be justifiable because it covers most of the trips (the 80:20 rule) while excluding outlier trips (e.g., "I'm driving cross-country"). And it would work even in rural areas where the median distance might be longer.
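A minimal pandas sketch of the 80th-percentile cutoff, assuming one row per trip with a distance column already in whichever unit the config selects; the helper names and sample data are hypothetical. Because the cutoff is computed from the data, the same code works for miles and kilometers, and the long outlier trip in the sample is excluded.

```python
import pandas as pd


def percentile_cutoff(distances: pd.Series, q: float = 0.8) -> float:
    """Distance below which roughly a fraction q of trips fall.

    Series.quantile interpolates linearly between data points by default.
    """
    return float(distances.quantile(q))


def trips_under_cutoff(trips: pd.DataFrame, column: str, q: float = 0.8):
    """Return (trips shorter than the q-quantile distance, the cutoff itself)."""
    cutoff = percentile_cutoff(trips[column], q)
    return trips[trips[column] < cutoff], cutoff


trips = pd.DataFrame(
    {"distance_kms": [0.5, 1.2, 2.0, 3.5, 4.1, 5.0, 6.3, 7.7, 12.0, 250.0]}
)
short_trips, cutoff = trips_under_cutoff(trips, "distance_kms")
print(round(cutoff, 2), len(short_trips))
```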

Abby-Wheelis commented 7 months ago

> An alternative method that is more data driven and not as hard-coded would be to display trips below the 80% distance.

I like the idea of using the 80% mark; that seems the most principled to me. I wonder how it would play into different contexts, like a study in a really densely populated area (or in a much less densely populated one). In any case, we would accomplish the goal of excluding the outlier / "cross-country" trips. I'll plan on this approach; I think it works toward a solution that is context-dependent and still maintains a consistent meaning.

shankari commented 7 months ago

> I wonder how it would play into different contexts, like a study in a really densely populated area (or in a much less densely populated one).

I actually think that is a bonus: "And it would work even in rural areas where the median distance might be longer." I would display what that 80% distance is (e.g., 9.5 miles), and potentially also how we got that number, if you can figure out how to represent that correctly.
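A sketch of a chart title that surfaces both the cutoff and how it was derived, assuming the cutoff has already been computed in the chosen unit; the wording and the helper name are hypothetical.

```python
def cutoff_title(cutoff: float, use_imperial: bool) -> str:
    """Title stating the data-driven cutoff and its derivation.

    Hypothetical wording; assumes `cutoff` is the 80th-percentile trip distance.
    """
    unit = "miles" if use_imperial else "km"
    return f"Trip count under {cutoff:.1f} {unit} (80th percentile of trip distance)"


print(cutoff_title(9.46, use_imperial=True))
# Trip count under 9.5 miles (80th percentile of trip distance)
```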

Abby-Wheelis commented 7 months ago

As it turns out, this Laos snapshot data is a good example of typically shorter trip lengths! FYI, it's very easy to get the desired percentile of a column with pandas quantile. If we compare to above, only ~60 trips labeled "motorcycle" were longer than 6.8 km but shorter than 16.1 km!
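The comparison in the last sentence can be reproduced with a boolean mask. In this sketch the column names, the sample rows and the helper itself are hypothetical, and 6.8 km stands in for the quantile computed from the Laos snapshot:

```python
import pandas as pd


def count_mode_between(trips, mode_col, dist_col, mode, low, high):
    """Count trips of one mode strictly longer than low and shorter than high."""
    in_band = (trips[mode_col] == mode) & (trips[dist_col] > low) & (trips[dist_col] < high)
    return int(in_band.sum())


trips = pd.DataFrame({
    "mode_confirm": ["motorcycle", "motorcycle", "motorcycle", "walk"],
    "distance_kms": [3.0, 9.5, 14.0, 10.0],
})
print(count_mode_between(trips, "mode_confirm", "distance_kms", "motorcycle", 6.8, 16.1))  # 2
```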

Abby-Wheelis commented 7 months ago

I'm still working off of the CO e-bike data, which seems broken at this point (a lot of trips seem to be missing), but since it is consistent, I can still use it to compare how the notebooks handle imperial vs. metric data.

[Figure: Energy Calculations, imperial vs. metric]

1 kg ≈ 2.2 lbs, which at a glance matches this chart; 1 → 2.1 could be accounted for by rounding.

[Figure: Mode Specific Metrics, imperial vs. metric]

1 mile ≈ 1.6 km: 82 × 1.6 ≈ 132 and 5 × 1.6 = 8, so this matches at a glance.
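These at-a-glance checks can be made explicit with `math.isclose`. In this sketch the helper name and the 5% tolerance (loose enough to absorb one-decimal rounding, such as 1 kg shown next to 2.1 lbs) are assumptions:

```python
import math

KM_PER_MILE = 1.609344    # exact by definition
KG_PER_LB = 0.45359237    # exact by definition


def units_agree(imperial: float, metric: float, metric_per_imperial: float,
                rel_tol: float = 0.05) -> bool:
    """True if converting the imperial value lands close to the metric value."""
    return math.isclose(imperial * metric_per_imperial, metric, rel_tol=rel_tol)


print(units_agree(82, 132, KM_PER_MILE))  # True
print(units_agree(5, 8, KM_PER_MILE))     # True
print(units_agree(2.1, 1, KG_PER_LB))     # True
```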

[Figure: Mode Specific Timeseries, imperial vs. metric]

These charts are one of the main reasons I think something is wrong with my data, but the comparison between imperial and metric measurements behaves as expected: the charts have the same shapes, just with different y-axis scales.

I've updated each of the notebooks now, so I'm going to move to working on the dashboard/frontend changes!

Abby-Wheelis commented 7 months ago

I've started with the html changes, which involve making the chart labels adaptable, since many of them were hard-coded. I've updated all 4 html files now and checked the lists by putting different configs in the url parameter. Things look good so far, and I think I'm ready to move on to testing/debugging my changes.

There are 8 different scenarios that could happen on the dashboard now, from the combinations of:

program or study
dynamic or default labels
imperial or metric
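Enumerating the combinations makes it easy to track which ones have been tested; a small Python sketch (the strings are just labels, not actual config field values):

```python
from itertools import product

# The three independent config choices listed above.
deployment_types = ["program", "study"]
label_types = ["dynamic", "default"]
unit_systems = ["imperial", "metric"]

scenarios = list(product(deployment_types, label_types, unit_systems))
for scenario in scenarios:
    print(", ".join(scenario))
print(len(scenarios))  # 8
```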

Since my changes rely on the config, I need to check a few different scenarios. I currently have a snapshot of the laos-ev data, which is a study with dynamic labels configured to use imperial measurements. To keep coverage as high as possible, I think it would help to also test with a dump of data from a program with default labels and metric measurements; stage-program fits this description. @shankari, would it be possible for me to get a mongodump of a couple of months of stage-program data to test with? Or, now that I think about it, data from any program with default labels and e-bike as the mode of interest could be run against that config file to cover the same scenario.

Abby-Wheelis commented 7 months ago

Front-end testing (hosted locally):

Summary:

Config type	Metric	Imperial
Program	staging data; as expected, e-bike charts shown by default and in the dropdown, energy charts present	did not run data, but correct dropdown labels and default charts shown
Study	staging data; as expected, study with energy charts, kilometers & kg	Laos data; as expected, study without energy charts, miles & lbs (used `usaid-laos-ev`, so custom labels present)

For the two runs with the usaid-laos-ev study data, both showed the same charts (with different units) for all of the sensed charts; for the user-labeled charts, the absence of custom labels when I ran it against the stage-study config led to some mismatch.

For the staging data, I ran it once against the program config, and then switched between stage-program and stage-study in the url parameter. Since the only real difference between the configs is the program/study distinction, I chose this route to save time re-running all the notebooks; note the changes in which charts are displayed by default. I also tested this data against the smart-commute-ebike config, since it is a program with imperial units. I didn't re-run the notebooks, so everything was still in kilometers, but the correct dropdown labels and default charts showed up. The broken emissions chart has now been fixed.

Screenshots:

Laos data (laos config; stage-study config): [screenshots: laosData_laosConfig_default, lasoData_stageStudyConfig_default]

Stage data: [screenshots taken 2024-01-29 at 12:39:44 PM and 1:01:05 PM]

Frontend changes (dropdown menus): [grid of screenshots comparing Study and Program configs for metric/default, metric/custom, imperial/default and imperial/custom; no Program example for metric/custom yet ("Don't have one now, Laos will be")]

Abby-Wheelis commented 7 months ago

I ran the notebooks from the command line with the staging snapshot and the stage-program configuration one more time to iron out any issues, but I'm now hitting a bug in the energy notebook in particular, where it says that there is no Replaced Mode. I'm unsure how this bug popped up. Resolving it should be the last step before this PR is ready!

Abby-Wheelis commented 7 months ago

Digging into this issue more, I'm finding a few things:

I'm seeing 0 e-bike trips, and therefore 0 trips with a replaced mode, in the staging data snapshot that I have. But I was able to generate charts with e-bike data in them for the screenshots from the front-end testing I did, and I have no idea what I could have done wrong. I have cleared and restored the database a couple of times with the same results.

@shankari do you happen to know for sure if there should be e-bike trips with replaced modes in the data you sent me?
And @iantei could you share any recommendations for deleting/restoring data so I can ensure I have a clean database?

In mode_specific_timeseries, where we preprocess the data to create mode_counts and mode_counts_of_interest, there was an issue where it generated all user-day-mode combinations, so rows were created even when no trips matched them. This led to things like "600 walk trips of 300 confirmed trips" in the quality text. I believe I have resolved this bug.
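A sketch of counting only the user-day-mode combinations that actually occur, rather than building every combination up front. The column names are hypothetical; `observed=True` matters when a grouping column is categorical, since pandas may otherwise add rows for unobserved category combinations. The sanity check mirrors the quality text: the counts should add up to the number of trips.

```python
import pandas as pd


def mode_counts(trips: pd.DataFrame) -> pd.DataFrame:
    """Trips per (user, day, mode), using only combinations present in the data."""
    return (
        trips.groupby(["user_id", "date", "mode_confirm"], observed=True)
        .size()
        .reset_index(name="trip_count")
    )


trips = pd.DataFrame({
    "user_id": ["a", "a", "b"],
    "date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "mode_confirm": pd.Categorical(["walk", "walk", "e-bike"]),
})
counts = mode_counts(trips)
# Sanity check: the counts must add up to the number of trips.
assert counts["trip_count"].sum() == len(trips)
print(counts)
```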

Abby-Wheelis commented 7 months ago

Program with e-bike data and replaced mode working on the dashboard!

I did see one error when running the energy-impacts notebook from the console, but the charts still showed up as expected, and the console printed several rows of the expected output before the error, so my guess is that the final month in the set did not have any replaced-mode labels. The chart makes it clear that e-bike trips dropped off over time, and the charts still appeared in the dashboard as expected and matched what was in the notebooks.

Abby-Wheelis commented 6 months ago

Update from this being on staging:

- now showing trips under the 80th percentile instead of under 10 miles
- set of charts available:
  - excludes energy for custom label cases
  - switches between miles and km based on use_imperial in the config
  - includes mode-of-interest charts for programs
- made sure that charts were different month-to-month
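The chart-selection rules in that list can be summarized as a small function. In this sketch the chart IDs and the function are hypothetical, and the choice of miles vs. km is left to the use_imperial lookup:

```python
from typing import List


def available_charts(is_program: bool, custom_labels: bool) -> List[str]:
    """Chart IDs to show for a config (IDs are hypothetical placeholders)."""
    charts = ["trips_under_80th_percentile", "distance_by_mode", "emissions_timeseries"]
    if is_program:
        charts.append("mode_of_interest")  # e.g. e-bike charts for programs
    if not custom_labels:
        charts.append("energy_impact")     # energy is excluded for custom labels
    return charts


print(available_charts(is_program=True, custom_labels=False))
print(available_charts(is_program=False, custom_labels=True))
```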

One issue was found and fixed: #121

Abby-Wheelis commented 5 months ago

This change is now on production and working! @iantei, I think we can close this issue as completed now.

e-mission / em-public-dashboard

Usage of both Imperial system and Metric systems in displaying metric related information for configs which has label_options available #108