Change from pie charts to 100% stacked bar chart

iantei commented 8 months ago

Changes to transform single pie charts into collective 100% Stacked Bar Charts.
Merge generic_metrics_sensed and mode_specific_metrics into generic_metrics notebook. Delete generic_metrics_sensed and mode_specific_metrics notebook, also from crontab.

shankari commented 6 months ago

Unable to generate
Bar chart of Number of trips for each purpose (selected by users).
Reason: Number of trips is 0. Participant_with_at_least_one_labeled_trip is 0. Participants_with_at_least_one_trip is 0. Registered_participants is 15. Trips_with_at_least_one_label is 0. Trips_with_mode_confirm_label is 0. Trips_with_trip_purpose_label is 0. month is 11. year is 2020.

After

        <html>
        <body>
        <h2>Unable to generate
Bar chart of Number of trips for each purpose (selected by users). Reason:</h2>

    <table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Number of trips</th>
      <td>0</td>
    </tr>
    <tr>
      <th>Participant_with_at_least_one_labeled_trip</th>
      <td>0</td>
    </tr>
    <tr>
      <th>Participants_with_at_least_one_trip</th>
      <td>0</td>
    </tr>
    <tr>
      <th>Registered_participants</th>
      <td>15</td>
    </tr>
...

Screenshot:

shankari commented 6 months ago

The solution to have the legend items side by side is to use nCols: https://stackoverflow.com/a/54870776

@Abby-Wheelis I'm thinking that we might want to keep the legend for the regular bar charts in the same place, but just increase the number of columns. For surveys, we may want to put the legend below the bar (to accommodate longer options) but still have only one column. We could specify that as an option to the plot function.

Thoughts?

shankari commented 6 months ago

Here's the option with 'lower right'

Here's the option with 'lower left' and the anchor at (1,0) which is not too bad

Here's the option with the ncols

Since this will likely change as we include survey results, and then the inferred results, I am going to punt on this. I will leave in the n_cols calculation, but switch to lower left and (1,0) to unblock us now.

shankari commented 6 months ago

We can't use pandas' barh because stacked doesn't see to work easily

>>> expanded_ct.groupby("Mode_confirm").agg({distance_col: 'count'}).reset_index().set_axis(["label", "value"], axis="columns").plot.barh(x="label", y="value", stacked=True)

generates

In lieu of investigating this further, let's just try to fix the gray value. This is displayed to avoid RuntimeError: Unknown return type

I tried to set the xlim to 100, because we know that it will never be more than that. However, 100, 100.01 and 100.1 and 101 all returned the same error. Leaving this for the pandas investigation...

shankari commented 6 months ago

One final fix, because the quality text annoys me and is confusing IMHO. Saying "For Labeled & Sensed: Based on X confirmed trips from Y users of X' trips from Y' users" is technically correct but requires cogitive load to determine which set of numbers goes with which bar, and what is labeled and what is confirmed. Going to try to fix that before declaring that this is done.

Abby-Wheelis commented 6 months ago

The quality text is updated in the inferred changes! That puts it on each bar, where it says something like "Confirmed trips, 100 trips from 10 users 15%" or similar like this: comment - I agree that this quality text is confusing, I like what ended up on the inferred trips more, I think we had decided to hold off on that change in this PR to limit the changes here - but that was a while ago when the timeline wasn't as crunched

shankari commented 6 months ago

I don't want to change the base quality_text methods since they are used in other plots as well. Let's hack together a method to pull out only the fields that we want for now. The correct fix would be to return the quality text as a tuple and format it in the notebook, which is also why one should not put presentation layer logic in the libraries.

Used some regular expression magic hacking to pull these out. I now have

While testing this with include_test_users=True I also found a regression in the scaffolding code.

def get_quality_text_sensed(df, cutoff_text="", include_test_users=False):

now has a cutoff_text argument. However, the invocation from scaffolding was still

    quality_text = get_quality_text_sensed(expanded_ct, include_test_users)

So the boolean True was interpreted as the cutoff_text and so the text that was generated was "Based on 2728 trips (True) from 13 users".

After fixing that, this option seems to work.

shankari commented 6 months ago

Made the appropriate changes to all the plots, although I kept some single-bar plots unchanged

Note that I had to remove the y-axis label "Trip Types" because: a) it didn't add anything beyond what the axis label already had b) it was overlapping with some of the label text, particular while including testers

shankari commented 6 months ago

Also does not crash for missing data

shankari commented 6 months ago

I am going to declare that this PR is done, at least to the extent that I am able to spend time on it at this time. Unless @iantei or @Abby-Wheelis has any objections, I plan to merge and push a release to staging

shankari commented 6 months ago

https://github.com/e-mission/em-public-dashboard/pull/128#issuecomment-2043336596 does look cool! We should revisit the quality text construction code when we implement inferred modes. I am not super happy with that code.

shankari commented 6 months ago

I have now tested with three different configurations:

NREL commute (study, default config)

USAID Laos (study, custom config)

Ride2own (program, default config)

All run without errors. @iantei has already tested the core code extensively before. Since I only restructured the code, I have tested that my changes generate the same results as his. I think this is ready to merge.

shankari commented 6 months ago

I have been debating a bit on whether to squash merge this or just merge it with a merge commit. Squash merging will have a cleaner merge history, but will also result in a signficantly more complex giant PR and will also lose some important back and forth around the discussions in the structure.

Let's go ahead and non-squash merge (we didn't squash merge for the UI rewrite either). But we should really be more careful about the plethora of commits in the future.

e-mission / em-public-dashboard

Change from pie charts to 100% stacked bar chart #123