e-mission / em-public-dashboard

A simple and stupid public dashboard prototype.
BSD 3-Clause "New" or "Revised" License
1 stars 23 forks source link

🐛 Mode of Interest "quality text" has errors #115

Open Abby-Wheelis opened 7 months ago

Abby-Wheelis commented 7 months ago

I discovered this while testing my metric vs imperial changes, but have realized it's a bug in the whole dashboard, not just my branch, example of what I mean (from production): Screenshot 2024-02-01 at 9 57 38 AM

The mode_specific_timeseries in particular is creating this issue with the quality text. It stems from the fact that pandas groupby is now creating ALL combinations not just those with nonzero values, but then in scaffolding it counts the length of the dataframe rather than summing the trip counts ...

ex:

a dataset has 30 labeled trips from 5 users across 10 days and there are 3 possible modes

when we call groupby every possible combination in generated = 5103 = 150 even narrowing to 1 mode is still 50 trips, so when we compare 50 with 30 we're getting > 100%

One way to help this is to drop the day/user/mode combinations that have a trip count of 0, but I'm not sure that fixes the whole issue

A more principled fix would be to get the total mode of interest count by adding up rows rather than taking the length of the dataframe, this seems like it would be more accurate. I'm not sure how we got here, but it might have to do with pandas updates to the ways that groupby works, indicated by these warnings:

FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.

Abby-Wheelis commented 7 months ago

More about the code in mode-specific-timeseries, with the fix I've added so far:

mode_counts = data.groupby(['user_id','date_time','mode_confirm'], as_index=False).size()
mode_counts.rename(columns={'size':'trip_count'}, inplace=True)
#the scaffolding code counts the length! So even if 0 trips of that mode it was counting!
mode_counts = mode_counts[mode_counts['trip_count'] != 0]

in scaffolding.py:

def get_quality_text(before_df, after_df, mode_of_interest=None, include_test_users=False):
    """ Inputs:
    before_df = dataframe prior to filtering (usually participant_ct_df)
    after_df = dataframe after filtering (usually expanded_ct)
    mode_of_interest = optional detail to include in the text string
    """
    # CASE 1 of https://github.com/e-mission/em-public-dashboard/issues/69#issuecomment-1256835867
    after_pct = (len(after_df) * 100) / len(before_df) if len(before_df) != 0 else np.nan #compares lengths!
    cq = (len(after_df), unique_users(after_df), len(before_df), unique_users(before_df),
        after_pct, )
    interest_str = mode_of_interest + ' ' if mode_of_interest is not None else ''
    total_str = 'confirmed' if mode_of_interest is not None else ''
    user_str = 'testers and participants' if include_test_users else 'users'
    quality_text = f"Based on %s confirmed {interest_str}trips from %d {user_str}\nof %s total {total_str} trips from %d users (%.2f%%)" % cq
    print(quality_text)
    return quality_text

Actually, looking at these, maybe passing in what we do in mode_specific_timeseries is just the wrong move - what get_quality_text expects is a dataframe with all the trips (all the labeled trips in this case) and all the (labeled) trips of a given mode

Abby-Wheelis commented 7 months ago

quality_text = scaffolding.get_quality_text(expanded_ct, expanded_ct[expanded_ct['mode_confirm'] == mode_of_interest], mode_of_interest, include_test_users) is a call that gets the proper quality text as proven by the print statements below:

the length is the length of mode_counts_of_interest (mode_counts from the code above filtered to just the mode of interest, the sum is that of all the trip count values in that same dataframe. The increase is explained by the fact that users can take more than one walk trip in a day (and often will - if you walk somewhere you usually walk back)

Screenshot 2024-02-01 at 10 31 18 AM

I'll plan to roll this change into my ongoing PR and watch out for other places we could be having this problem!

shankari commented 7 months ago

@Abby-Wheelis Great catch! This is why we need better testing of the public dashboard, and a principled way to update pandas versions. We have found multiple such issues while upgrading the server code, for which we do indeed have proper tests.

@MukuFlash03 @nataliejschultz for visibiility