e-mission / em-public-dashboard

A simple and stupid public dashboard prototype.
BSD 3-Clause "New" or "Revised" License

Add a simple notebook to get trip characteristics by program #47

Closed shankari closed 1 year ago

shankari commented 2 years ago

Hopefully, many of the papers can reuse this to motivate the need for their work

shankari commented 2 years ago

Here are the results if people want to use them directly.

[5 images]

shankari commented 2 years ago

Also added some textual descriptions to make it easier to add statements like "XXX trips from YYY users" to the text.
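
A minimal sketch of how summary lines of this kind could be generated, in plain Python. The record layout and the field names `user_id`, `program` and `user_label` are hypothetical, chosen for illustration; they are not taken from the notebook itself.

```python
from collections import defaultdict

# Hypothetical toy trips; the field names are assumptions for illustration.
trips = [
    {"user_id": "u1", "program": "cc", "user_label": "bike"},
    {"user_id": "u1", "program": "cc", "user_label": None},
    {"user_id": "u2", "program": "stage", "user_label": None},
    {"user_id": "u3", "program": "pc", "user_label": "walk"},
    {"user_id": "u3", "program": "pc", "user_label": "car"},
    {"user_id": "u3", "program": "pc", "user_label": None},
]

def summarize(trips):
    """Count trips and unique users overall, for labeled trips, and per program."""
    labeled = [t for t in trips if t["user_label"] is not None]
    by_program = defaultdict(lambda: {"trips": 0, "users": set()})
    for t in trips:
        by_program[t["program"]]["trips"] += 1
        by_program[t["program"]]["users"].add(t["user_id"])
    return {
        "trips": len(trips),
        "users": len({t["user_id"] for t in trips}),
        "labeled_trips": len(labeled),
        "labeled_users": len({t["user_id"] for t in labeled}),
        "program_trips": {p: c["trips"] for p, c in sorted(by_program.items())},
        "program_users": {p: len(c["users"]) for p, c in sorted(by_program.items())},
    }

summary = summarize(trips)
print(f"Total number of trips {summary['trips']} "
      f"from {summary['users']} unique users")
print(f"Number of trips with at least one label {summary['labeled_trips']} "
      f"from {summary['labeled_users']} unique users")
print("Program specific counts: ",
      {"trips": summary["program_trips"], "unique_users": summary["program_users"]})
```

Keeping the counts in a dictionary rather than only printing them makes it straightforward to drop the numbers into sentences later.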

Outputs for me are:

Total number of trips 135471 from 219 unique users
Number of trips with at least one label 57729 from 200 unique users
Trips without user specified labels 77742 from 219 users
Trips without user label but with inferred label 33550
Trips without user label or inferred label 44192
Number of trips in stage = 23821 from 70 unique users
Number of trips in real programs = 111650 from 149 unique users
Program specific counts:  {'trips': {'4c': 10121, 'cc': 40612, 'fc': 19954, 'pc': 27337, 'sc': 10805, 'stage': 23821, 'vail': 2821}, 'unique_users': {'4c': 14, 'cc': 48, 'fc': 29, 'pc': 39, 'sc': 13, 'stage': 70, 'vail': 6}}

Total number of trips 57729 from 200 unique users
Number of trips with at least one label 57729 from 200 unique users
Trips without user specified labels 0 from 0 users
Trips without user label but with inferred label 0
Trips without user label or inferred label 0
Number of trips in stage = 7334 from 54 unique users
Number of trips in real programs = 50395 from 146 unique users
Program specific counts:  {'trips': {'4c': 4382, 'cc': 19546, 'fc': 7856, 'pc': 11253, 'sc': 5318, 'stage': 7334, 'vail': 2040}, 'unique_users': {'4c': 14, 'cc': 47, 'fc': 29, 'pc': 37, 'sc': 13, 'stage': 54, 'vail': 6}}

shankari commented 2 years ago

@hlu109 Expanded to support prepilot graphs as well for those who have included the prepilot data.

The same results, including prepilot, are:

[5 images]

Total number of trips 138963 from 232 unique users
Number of trips with at least one label 60154 from 212 unique users
Trips without user specified labels 78809 from 232 users
Trips without user label but with inferred label 33550
Trips without user label or inferred label 45259
Number of trips in stage = 23821 from 70 unique users
Number of trips in real programs = 115142 from 162 unique users
Program specific counts:  {'trips': {'4c': 10121, 'cc': 40612, 'fc': 19954, 'pc': 27337, 'prepilot': 3492, 'sc': 10805, 'stage': 23821, 'vail': 2821}, 'unique_users': {'4c': 14, 'cc': 48, 'fc': 29, 'pc': 39, 'prepilot': 13, 'sc': 13, 'stage': 70, 'vail': 6}}

Total number of trips 60154 from 212 unique users
Number of trips with at least one label 60154 from 212 unique users
Trips without user specified labels 0 from 0 users
Trips without user label but with inferred label 0
Trips without user label or inferred label 0
Number of trips in stage = 7334 from 54 unique users
Number of trips in real programs = 52820 from 158 unique users
Program specific counts:  {'trips': {'4c': 4382, 'cc': 19546, 'fc': 7856, 'pc': 11253, 'prepilot': 2425, 'sc': 5318, 'stage': 7334, 'vail': 2040}, 'unique_users': {'4c': 14, 'cc': 47, 'fc': 29, 'pc': 37, 'prepilot': 12, 'sc': 13, 'stage': 54, 'vail': 6}}

shankari commented 1 year ago

Results for the full dataset (with minipilot)

@hlu109 Here's the summary of the full dataset

With minipilot

[5 images]

Total number of trips 241123 from 261 unique users
Number of trips with at least one label 92446 from 235 unique users
Trips without user specified labels 148677 from 260 users
Trips without user label but with inferred label 43404
Trips without user label or inferred label 105273
Number of trips in stage = 37302 from 78 unique users
Number of trips in real programs = 203821 from 183 unique users
Program specific counts:  {'trips': {'4c': 14417, 'cc': 75184, 'fc': 32429, 'pc': 51182, 'prepilot': 3492, 'sc': 17984, 'stage': 37302, 'vail': 9133}, 'unique_users': {'4c': 15, 'cc': 52, 'fc': 30, 'pc': 39, 'prepilot': 13, 'sc': 22, 'stage': 78, 'vail': 12}}
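
The counts above can be cross-checked against each other. The sketch below copies the numbers from the "With minipilot" output and verifies that the partitions add up. The user-level check assumes that each user belongs to exactly one program, which the thread does not state explicitly.

```python
# Counts copied from the "With minipilot" output above.
total_trips, total_users = 241123, 261
labeled_trips, unlabeled_trips = 92446, 148677
inferred_only, no_label_at_all = 43404, 105273
stage_trips, stage_users = 37302, 78
real_trips, real_users = 203821, 183
program_trips = {'4c': 14417, 'cc': 75184, 'fc': 32429, 'pc': 51182,
                 'prepilot': 3492, 'sc': 17984, 'stage': 37302, 'vail': 9133}
program_users = {'4c': 15, 'cc': 52, 'fc': 30, 'pc': 39,
                 'prepilot': 13, 'sc': 22, 'stage': 78, 'vail': 12}

checks = {
    "labeled + unlabeled == total":
        labeled_trips + unlabeled_trips == total_trips,
    "inferred + no label == unlabeled":
        inferred_only + no_label_at_all == unlabeled_trips,
    "stage + real programs == total trips":
        stage_trips + real_trips == total_trips,
    "per-program trips sum to total":
        sum(program_trips.values()) == total_trips,
    # Assumes every user is enrolled in exactly one program.
    "per-program users sum to total":
        sum(program_users.values()) == total_users,
    "stage + real program users == total users":
        stage_users + real_users == total_users,
}

for name, ok in checks.items():
    print(f"{'OK  ' if ok else 'FAIL'} {name}")
```

All six checks pass for these numbers, so the reported totals are internally consistent.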

Without minipilot

[5 images]

Total number of trips 237631 from 248 unique users
Number of trips with at least one label 90021 from 223 unique users
Trips without user specified labels 147610 from 247 users
Trips without user label but with inferred label 43404
Trips without user label or inferred label 104206
Number of trips in stage = 37302 from 78 unique users
Number of trips in real programs = 200329 from 170 unique users
Program specific counts:  {'trips': {'4c': 14417, 'cc': 75184, 'fc': 32429, 'pc': 51182, 'sc': 17984, 'stage': 37302, 'vail': 9133}, 'unique_users': {'4c': 15, 'cc': 52, 'fc': 30, 'pc': 39, 'sc': 22, 'stage': 78, 'vail': 12}}

shankari commented 1 year ago

I got a bit paranoid when I was running the analysis for the label assist, because we were at user 274 and hadn't finished yet, although I had recorded only 261 users here. However, it turns out that 284 people signed up, and only 261 of them provided at least one trip.

Phew!

Name: user_email, Length: 284, dtype: object

hlu109 commented 1 year ago

@shankari Does our final run for the paper include or exclude minipilot?

In addition, are you able to get a count of how many users have fewer than 5 labeled trips (partially or completely labeled)? It would be helpful to include the number of users (and the total number of trips) that we ended up using in our 5-fold cross-validation.

shankari commented 1 year ago

@hlu109 The final run for the paper includes minipilot. This is what I see in the classification notebook:

148677 (0.62%) unlabeled, 92446 (0.38%) labeled, 241123 total trips
63/284 (0.22%) users have less than 5 labeled trips and cannot do cross-validation
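
The values in parentheses look like fractions printed with a percent sign, since 148677 of 241123 trips is roughly 62%, not 0.62%. This reading is an assumption based on the counts themselves. A small sketch that recomputes the shares from the reported counts, using Python's `%` format specifier (which multiplies by 100):

```python
# Counts copied from the classification notebook output above.
total_trips = 241123
unlabeled, labeled = 148677, 92446
users_below_threshold, signed_up_users = 63, 284

frac_unlabeled = unlabeled / total_trips
frac_labeled = labeled / total_trips
frac_users_below = users_below_threshold / signed_up_users

# The ".0%" format spec multiplies by 100 and appends the percent sign.
print(f"{unlabeled} ({frac_unlabeled:.0%}) unlabeled, "
      f"{labeled} ({frac_labeled:.0%}) labeled, {total_trips} total trips")
print(f"{users_below_threshold}/{signed_up_users} ({frac_users_below:.0%}) "
      f"users have fewer than 5 labeled trips")

# Users remaining once those below the threshold are set aside.
users_remaining = signed_up_users - users_below_threshold
print(f"{users_remaining} users remain")
```

With these counts the shares come out to 62% unlabeled, 38% labeled, and 22% of users below the threshold, leaving 221 of the 284 signed-up users.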