Closed shankari closed 1 year ago
Here are the results if people want to use them directly.
I also added some textual descriptions so that it is easier to write statements like "XXX trips from YYY users" in the text.
Outputs for me are:
Total number of trips 135471 from 219 unique users
Number of trips with at least one label 57729 from 200 unique users
Trips without user specified labels 77742 from 219 users
Trips without user label but with inferred label 33550
Trips without user label or inferred label 44192
Number of trips in stage = 23821 from 70 unique users
Number of trips in real programs = 111650 from 149 unique users
Program specific counts: {'trips': {'4c': 10121, 'cc': 40612, 'fc': 19954, 'pc': 27337, 'sc': 10805, 'stage': 23821, 'vail': 2821}, 'unique_users': {'4c': 14, 'cc': 48, 'fc': 29, 'pc': 39, 'sc': 13, 'stage': 70, 'vail': 6}}
Total number of trips 57729 from 200 unique users
Number of trips with at least one label 57729 from 200 unique users
Trips without user specified labels 0 from 0 users
Trips without user label but with inferred label 0
Trips without user label or inferred label 0
Number of trips in stage = 7334 from 54 unique users
Number of trips in real programs = 50395 from 146 unique users
Program specific counts: {'trips': {'4c': 4382, 'cc': 19546, 'fc': 7856, 'pc': 11253, 'sc': 5318, 'stage': 7334, 'vail': 2040}, 'unique_users': {'4c': 14, 'cc': 47, 'fc': 29, 'pc': 37, 'sc': 13, 'stage': 54, 'vail': 6}}
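For anyone who wants to regenerate numbers of this shape on their own copy of the data, here is a minimal sketch of how such a summary could be computed. It is not the notebook code: the record keys (`user`, `program`, `user_labeled`, `inferred_labeled`) and the function names are hypothetical, and the toy records are made up purely to exercise the function. It assumes, as the counts above suggest, that every program other than `stage` counts as a real program.

```python
from collections import defaultdict


def n_users(trips):
    """Number of distinct users appearing in a list of trip records."""
    return len({t["user"] for t in trips})


def summarize_trips(trips):
    """Summarize trips given as dicts with hypothetical keys:
    'user', 'program', 'user_labeled' (bool), 'inferred_labeled' (bool)."""
    trips = list(trips)
    labeled = [t for t in trips if t["user_labeled"]]
    unlabeled = [t for t in trips if not t["user_labeled"]]
    inferred = [t for t in unlabeled if t["inferred_labeled"]]
    stage = [t for t in trips if t["program"] == "stage"]
    real = [t for t in trips if t["program"] != "stage"]

    program_trips = defaultdict(int)
    program_users = defaultdict(set)
    for t in trips:
        program_trips[t["program"]] += 1
        program_users[t["program"]].add(t["user"])

    return {
        "total": (len(trips), n_users(trips)),
        "labeled": (len(labeled), n_users(labeled)),
        "unlabeled": (len(unlabeled), n_users(unlabeled)),
        "inferred_only": len(inferred),
        "no_label": len(unlabeled) - len(inferred),
        "stage": (len(stage), n_users(stage)),
        "real": (len(real), n_users(real)),
        "by_program": {
            "trips": dict(sorted(program_trips.items())),
            "unique_users": {p: len(u) for p, u in sorted(program_users.items())},
        },
    }


def trips_from_users(count_pair):
    """Render a (trips, users) pair as 'XXX trips from YYY users'."""
    n_trips, n_unique = count_pair
    return f"{n_trips} trips from {n_unique} users"


# Made-up toy records, only to show the call pattern.
toy = [
    {"user": "u1", "program": "cc", "user_labeled": True, "inferred_labeled": False},
    {"user": "u1", "program": "cc", "user_labeled": False, "inferred_labeled": True},
    {"user": "u2", "program": "stage", "user_labeled": False, "inferred_labeled": False},
]
summary = summarize_trips(toy)
print("Total:", trips_from_users(summary["total"]))
```

Restricting the input to labeled trips before calling `summarize_trips` would give a labeled-only view, in which the unlabeled counts are all zero.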
@hlu109 Expanded to support prepilot graphs as well for those who have included the prepilot data.
The same results, including prepilot, are:
Total number of trips 138963 from 232 unique users
Number of trips with at least one label 60154 from 212 unique users
Trips without user specified labels 78809 from 232 users
Trips without user label but with inferred label 33550
Trips without user label or inferred label 45259
Number of trips in stage = 23821 from 70 unique users
Number of trips in real programs = 115142 from 162 unique users
Program specific counts: {'trips': {'4c': 10121, 'cc': 40612, 'fc': 19954, 'pc': 27337, 'prepilot': 3492, 'sc': 10805, 'stage': 23821, 'vail': 2821}, 'unique_users': {'4c': 14, 'cc': 48, 'fc': 29, 'pc': 39, 'prepilot': 13, 'sc': 13, 'stage': 70, 'vail': 6}}
Total number of trips 60154 from 212 unique users
Number of trips with at least one label 60154 from 212 unique users
Trips without user specified labels 0 from 0 users
Trips without user label but with inferred label 0
Trips without user label or inferred label 0
Number of trips in stage = 7334 from 54 unique users
Number of trips in real programs = 52820 from 158 unique users
Program specific counts: {'trips': {'4c': 4382, 'cc': 19546, 'fc': 7856, 'pc': 11253, 'prepilot': 2425, 'sc': 5318, 'stage': 7334, 'vail': 2040}, 'unique_users': {'4c': 14, 'cc': 47, 'fc': 29, 'pc': 37, 'prepilot': 12, 'sc': 13, 'stage': 54, 'vail': 6}}
@hlu109 Here's the summary of the full dataset
Total number of trips 241123 from 261 unique users
Number of trips with at least one label 92446 from 235 unique users
Trips without user specified labels 148677 from 260 users
Trips without user label but with inferred label 43404
Trips without user label or inferred label 105273
Number of trips in stage = 37302 from 78 unique users
Number of trips in real programs = 203821 from 183 unique users
Program specific counts: {'trips': {'4c': 14417, 'cc': 75184, 'fc': 32429, 'pc': 51182, 'prepilot': 3492, 'sc': 17984, 'stage': 37302, 'vail': 9133}, 'unique_users': {'4c': 15, 'cc': 52, 'fc': 30, 'pc': 39, 'prepilot': 13, 'sc': 22, 'stage': 78, 'vail': 12}}
Total number of trips 237631 from 248 unique users
Number of trips with at least one label 90021 from 223 unique users
Trips without user specified labels 147610 from 247 users
Trips without user label but with inferred label 43404
Trips without user label or inferred label 104206
Number of trips in stage = 37302 from 78 unique users
Number of trips in real programs = 200329 from 170 unique users
Program specific counts: {'trips': {'4c': 14417, 'cc': 75184, 'fc': 32429, 'pc': 51182, 'sc': 17984, 'stage': 37302, 'vail': 9133}, 'unique_users': {'4c': 15, 'cc': 52, 'fc': 30, 'pc': 39, 'sc': 22, 'stage': 78, 'vail': 12}}
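Before copying these counts into a paper, it may be worth checking that they are internally consistent. The sketch below uses the full-dataset numbers (including prepilot) from this thread; the dictionary layout and the `check_summary` helper are hypothetical, not part of the notebook. Only trip counts are checked, since user counts overlap across categories (the same user can have both labeled and unlabeled trips).

```python
# Full-dataset trip counts (including prepilot), copied from the output above.
full = {
    "total": 241123,
    "labeled": 92446,
    "unlabeled": 148677,
    "inferred_only": 43404,
    "no_label": 105273,
    "stage": 37302,
    "real": 203821,
    "by_program": {
        "4c": 14417, "cc": 75184, "fc": 32429, "pc": 51182,
        "prepilot": 3492, "sc": 17984, "stage": 37302, "vail": 9133,
    },
}


def check_summary(s):
    """Return the names of the consistency checks that fail (empty if all pass)."""
    checks = {
        "labeled + unlabeled == total":
            s["labeled"] + s["unlabeled"] == s["total"],
        "inferred_only + no_label == unlabeled":
            s["inferred_only"] + s["no_label"] == s["unlabeled"],
        "stage + real == total":
            s["stage"] + s["real"] == s["total"],
        "sum(by_program) == total":
            sum(s["by_program"].values()) == s["total"],
        "by_program['stage'] == stage":
            s["by_program"]["stage"] == s["stage"],
    }
    return [name for name, ok in checks.items() if not ok]


failed = check_summary(full)
print("all checks pass" if not failed else f"failed: {failed}")
```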
I got a bit paranoid when I was running the analysis for the label assist, because we were at user 274 and hadn't finished yet, although I recorded only 261 users here. However, it turns out that 284 people signed up, but only 261 of them provided at least one trip.
Phew!
Name: user_email, Length: 284, dtype: object
@shankari Does our final run for the paper include or exclude minipilot?
In addition, are you able to get a count of how many users have fewer than 5 labeled trips (partially or completely labeled)? It would be helpful to include the number of users (and the total number of trips) that we ended up using in our 5-fold cross-validation.
@hlu109 The final run for the paper includes minipilot. This is what I see in the classification notebook:
148677 (62%) unlabeled, 92446 (38%) labeled, 241123 total trips
63/284 (22%) users have less than 5 labeled trips and cannot do cross-validation
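The shares in this output are ratios such as 148677/241123 ≈ 0.62, which is about 62%. As a rough sketch of how the shares and the per-user threshold could be computed, assuming a hypothetical list holding one user id per labeled trip plus the full list of signed-up users (neither name comes from the notebook):

```python
from collections import Counter


def share(part, whole):
    """Format 'part/whole (NN%)'; the % format spec multiplies the ratio by 100."""
    return f"{part}/{whole} ({part / whole:.0%})"


def users_below_threshold(labeled_trip_users, all_users, min_labeled=5):
    """Users with fewer than `min_labeled` labeled trips.

    labeled_trip_users: one user id per labeled trip (hypothetical input).
    all_users: every signed-up user, so users with zero labeled trips are counted too.
    """
    counts = Counter(labeled_trip_users)
    return [u for u in all_users if counts[u] < min_labeled]


# Shares recomputed from the counts reported in this thread.
print(share(148677, 241123), "unlabeled")
print(share(92446, 241123), "labeled")
print(share(63, 284), "users below the cross-validation threshold")
```

Passing the full sign-up list as `all_users` matters here: the denominator of 284 appears to be the number of sign-ups, so users with no trips at all also fall below the threshold.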
Hopefully, many of the papers can reuse this to motivate the need for their work