Open shankari opened 2 years ago
Hm, the user does not appear to have any transitions for the past week
>>> start_ts = arrow.get("2022-02-01").timestamp
>>> end_ts = arrow.get("2022-02-09").timestamp
>>> transition_df = ts.get_data_df("statemachine/transition", time_query=estt.TimeQuery("data.ts", startTs=start_ts, endTs=end_ts))
Returns an empty dataframe.
Searching backwards, we find that the last transition was from 2021-12-08T17:08:52.017765-07:00
confirmed_trip_df = ts.get_data_df("analysis/confirmed_trip", time_query=estt.TimeQuery("data.start_ts", startTs=start_ts, endTs=end_ts))
confirmed_trip_df.tail()
shows us that the last trip is indeed from
2022-02-08T17:05:22.999877-07:00
Need to investigate why we stopped getting transitions, and how our algorithm behaves when they are not present. This is likely the root cause.
Focusing on trips from the 7th of Feb, we see a clear spike at around 7k
which persists while zooming in
There also appears to be an issue where the durations for those trips vary widely.
Doing an initial pass at classifying good vs. bad:
potential_bad_trips = feb_7_confirmed_trip_df[np.logical_and(feb_7_confirmed_trip_df.distance > 6500, feb_7_confirmed_trip_df.distance < 7500)]
potential_good_trips = feb_7_confirmed_trip_df[np.logical_or(feb_7_confirmed_trip_df.distance < 6500, feb_7_confirmed_trip_df.distance > 7500)]
And plotting the trips, they are indeed in a straight line across town (maps redacted for privacy reasons). Interestingly, while trying to plot the non resampled locations, it looks like there are none.
Found 0 features from 0 points
Found 0 features from 0 points
Found 0 features from 0 points
...
Found 0 features from 0 points
Found 0 features from 0 points
Found 0 features from 0 points
Checking to see if this is a characteristic of all potential bad trips and of any potential good trips.
There are no location points for the bad trips.
>>> pd.Series([len(ts.get_data_df("background/location", time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t))))
for t in potential_bad_trips.to_dict(orient="records")]).unique()
array([0])
There are no location points for the good trips as well.
>>> pd.Series([len(ts.get_data_df("background/location", time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t))))
for t in potential_good_trips.to_dict(orient="records")]).unique()
array([0])
There are apparently no location points for the entire month of Feb.
>>> ts.get_data_df("background/location", time_query=estt.TimeQuery("data.start_ts", startTs=start_ts, endTs=end_ts))
(empty dataframe)
The last location point was also from December: 2021-12-08. Wait - maybe we stopped storing the values after December because we hit the query limit.
It also turned out that we hadn't filtered for the 7th correctly. After fixing this, we now have:
>>> (len(potential_good_trips), len(potential_bad_trips))
(15, 27)
But every single trip seems to be a straight line, although they don't always have the same endpoints. The main difference between the "good" and "bad" trips seems to be that the endpoints sometimes double back.
But given that they are straight lines, the distance between the endpoints and the distance of the trip are likely to be the same. Let's see if that helps.
Ah, they are straight lines there and back. The actual O-D distance, even for the "bad" trips, is very small:
>>> potential_bad_trips[["distance", "od_distance"]]
distance | od_distance |
---|---|
7073.937531 | 1.853653e+01 |
6868.309384 | 1.127387e+01 |
7100.056180 | 1.170578e+01 |
6710.867451 | 1.207159e+01 |
7123.187586 | 1.223378e+01 |
7079.736764 | 2.990119e+01 |
6735.314398 | 3.331056e+00 |
7096.913321 | 3.156922e+00 |
7119.615165 | 5.456963e-01 |
Unfortunately, that means that we can't actually use the O-D distance alone, since this could happen legitimately for a round trip. But maybe for this user, for the immediate use case, it can be a good check?
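This there-and-back geometry is easy to reproduce standalone. In the sketch below, `haversine_m` is a hypothetical helper standing in for the project's `ecc.calDistance`, and the coordinates are made up: a round trip ends a few meters from where it started, so the O-D distance is tiny even though the traveled distance can be kilometers.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points."""
    r = 6371000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# A round trip: the end point is only a few meters from the start point,
# so the O-D distance is near zero regardless of the distance traveled.
start = (-105.0, 40.0)         # made-up (lon, lat)
end = (-105.00005, 40.00002)   # a few meters away
od = haversine_m(*start, *end)
assert od < 100  # would be flagged by the od_distance < 100 check
```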
Looking at the potential good trips, we have
Zooming in on trips with O-D distance below 3000, they are all round trips.
So if we categorize further:
>>> potential_bad_in_good = potential_good_trips[potential_good_trips.od_distance < 3000]
>>> potential_good_in_good = potential_good_trips[potential_good_trips.od_distance > 3000]
>>> len(potential_bad_trips), len(potential_bad_in_good), len(potential_good_in_good)
(27, 11, 4)
Visualizing those 4 trips, we get what appear to be one-way trips. But we can probably start with this for now and let the user mark the 4/(27+11+4) ≈ 10% of bad one-way trips manually.
Let's see how many trips from the beginning of Feb would be affected.
start_ld = ecwld.LocalDate(year=2022, month=2, day=1)
end_ld = ecwld.LocalDate(year=2022, month=2, day=28)
all_jan_feb_confirmed_trip_df = ts.get_data_df("analysis/confirmed_trip", time_query=esttc.TimeComponentQuery("data.start_local_dt", start_ld, end_ld))
all_jan_feb_confirmed_trip_df["od_distance"] = all_jan_feb_confirmed_trip_df.apply(lambda r: ecc.calDistance(r.start_loc["coordinates"], r.end_loc["coordinates"], coordinates=False), axis=1)
all_feb_potential_bad_trips = all_jan_feb_confirmed_trip_df[all_jan_feb_confirmed_trip_df.od_distance < 100]
len(all_feb_potential_bad_trips), len(all_jan_feb_confirmed_trip_df)
Result: (58, 75)
The majority are from the 6th, 7th, and 8th; one is from the 1st. A scatter plot shows vertical lines at various distances.
Durations range from 1000 secs (1000/60 ≈ 17 mins) to 4000 secs (4000/60 ≈ 67 mins ≈ 1.1 hours).
No clear signal in speeds either
To recap, at this point we have a pretty good check (O-D distance < 100m). Any false negatives (the trip was spurious but we didn't catch it) can be handled by the user; this would be a max of 17. Any false positives might be a problem, and we might want to come up with an additional check. This is likely to involve the actual location points.
Let's plot the trip from the first since it is most likely to be the false positive (if one exists). The first is a false positive.
Checking the other fields, it is a lot more than 7k in distance. Let's plot the other trips with > 7k in distance and see if they are spurious.
So there are 8 trips > 7k in distance
index | start_local_dt_month | start_local_dt_day | end_local_dt_month | end_local_dt_day | duration | distance | od_distance | mean_speed |
---|---|---|---|---|---|---|---|---|
3 | 2 | 1 | 2 | 1 | 2308.166839 | 14184.026960 | 1.678697e+01 | 6.145148 |
11 | 2 | 6 | 2 | 6 | 1481.346913 | 14066.098733 | 5.955038e-03 | 9.495479 |
35 | 2 | 7 | 2 | 7 | 1571.072589 | 13915.663279 | 2.087384e-01 | 8.857429 |
37 | 2 | 7 | 2 | 7 | 1552.858794 | 12220.480719 | 4.796685e-01 | 7.869666 |
44 | 2 | 7 | 2 | 7 | 2387.434461 | 20542.246691 | 7.900875e-10 | 8.604319 |
47 | 2 | 7 | 2 | 7 | 1806.244204 | 13965.648219 | 4.733766e-01 | 7.731872 |
51 | 2 | 7 | 2 | 7 | 3133.604467 | 14063.391410 | 7.117096e+00 | 4.487928 |
71 | 2 | 8 | 2 | 8 | 4443.580903 | 31850.950049 | 2.713432e+00 | 7.167856 |
On mapping them, the first and last entries (3 and 71) are valid round trips. The others are not.
Plotting the various trip level metrics, we don't see a clear separation between valid and invalid.
Re-exported data for only the year 2022.
We now see transitions, and all the transitions for the 7th seem to be visit-only, without a corresponding geofence exit. That might be a potential discriminant.
fmt_time | transition_name | state_name |
---|---|---|
2022-02-07T00:24:46.908138-07:00 | TransitionType.NOP | State.WAITING_FOR_TRIP_START |
2022-02-07T00:24:46.916240-07:00 | TransitionType.VISIT_ENDED | State.WAITING_FOR_TRIP_START |
2022-02-07T00:35:28.095417-07:00 | TransitionType.VISIT_STARTED | State.ONGOING_TRIP |
-- | -- | -- |
2022-02-07T00:37:06.316291-07:00 | TransitionType.VISIT_ENDED | State.WAITING_FOR_TRIP_START |
2022-02-07T00:37:08.928151-07:00 | TransitionType.VISIT_STARTED | State.ONGOING_TRIP |
-- | -- | -- |
2022-02-07T00:50:28.613245-07:00 | TransitionType.VISIT_ENDED | State.WAITING_FOR_TRIP_START |
2022-02-07T00:50:29.307538-07:00 | TransitionType.VISIT_STARTED | State.ONGOING_TRIP |
-- | -- | -- |
2022-02-07T01:03:43.421707-07:00 | TransitionType.VISIT_ENDED | State.WAITING_FOR_TRIP_START |
2022-02-07T01:04:02.212470-07:00 | TransitionType.VISIT_STARTED | State.ONGOING_TRIP |
-- | -- | -- |
2022-02-07T01:28:44.901669-07:00 | TransitionType.VISIT_ENDED | State.WAITING_FOR_TRIP_START |
2022-02-07T01:29:51.414304-07:00 | TransitionType.VISIT_STARTED | State.ONGOING_TRIP |
-- | -- | -- |
Re-running the rest of the analysis, we now have 79 trips, so it looks like the issue resolved itself after the 8th?
start_ld = ecwld.LocalDate(year=2022, month=2, day=1)
end_ld = ecwld.LocalDate(year=2022, month=2, day=28)
all_jan_feb_confirmed_trip_df = ts.get_data_df("analysis/confirmed_trip", time_query=esttc.TimeComponentQuery("data.start_local_dt", start_ld, end_ld))
all_jan_feb_confirmed_trip_df["od_distance"] = all_jan_feb_confirmed_trip_df.apply(lambda r: ecc.calDistance(r.start_loc["coordinates"], r.end_loc["coordinates"], coordinates=False), axis=1)
all_feb_potential_bad_trips = all_jan_feb_confirmed_trip_df[all_jan_feb_confirmed_trip_df.od_distance < 100]
len(all_feb_potential_bad_trips), len(all_jan_feb_confirmed_trip_df)
Result: (58, 79)
start_local_dt_day | end_local_dt_day |
---|---|
8 | 8 |
8 | 8 |
8 | 8 |
8 | 8 |
8 | 8 |
8 | 8 |
11 | 11 |
12 | 12 |
12 | 12 |
13 | 13 |
Looking at these last four trips, one has a clearly defined trajectory. The others are little groups of points, similar to some trips on the 8th.
But the number of locations seems like a potential discriminator.
pd.Series([len(ts.get_data_df("background/location", time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t))))
for t in all_feb_potential_bad_trips.to_dict(orient="records")]).unique()
Result: array([2227, 32, 24, 21, 41, 22, 15, 16, 55, 19, 12,
20, 14, 34, 23, 28, 44, 67, 39, 69, 25, 35,
49, 68, 38, 46, 75, 4032])
pd.Series([len(ts.get_data_df("background/filtered_location", time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t))))
for t in all_feb_potential_bad_trips.to_dict(orient="records")]).unique()
Result: array([2227, 15, 7, 25, 14, 8, 31, 5, 11, 9, 13,
20, 6, 38, 39, 28, 27, 22, 16, 37, 34, 4032])
We still have the same potentially bad trips that are actually good. Plotting this, we get
pd.Series([len(ts.get_data_df("background/filtered_location", time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t))))
for t in all_feb_potential_bad_trips_actually_good.to_dict(orient="records")]).plot(kind="bar")
So it looks like that will work!
Double checking by mapping some known bad trips from the morning of the 7th
The last few trips on the 8th+ have one trip that looks like that, and others that just look like a cluster of points at the destination.
So the big gap/sparse points seems like a good check, at least for this user at this time. Need to think about whether we want to incorporate it into the regular pipeline.
Double checking...
potential_bad_trips["n_locations"] = potential_bad_trips.apply(lambda t: len(ts.get_data_df("background/filtered_location",
time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t)))), axis=1)
potential_bad_trips.distance/potential_bad_trips.n_locations
1 1010.562504
2 490.593527
3 1014.293740
4 447.391163
5 890.398448
7 1415.947353
8 612.301309
9 788.545925
10 889.951896
11 532.434733
13 468.722233
14 890.146060
15 525.736338
16 1182.026649
17 507.363236
18 1153.445392
19 473.915800
20 996.316469
21 529.151662
24 879.441002
26 1013.337456
28 985.787600
30 550.428395
31 1011.751036
34 989.617100
36 1009.464400
37 460.970361
dtype: float64
And for the mixed dataset
all_feb_potential_bad_trips_actually_good["n_locations"] = all_feb_potential_bad_trips_actually_good.apply(lambda t: len(ts.get_data_df("background/filtered_location",
time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t)))), axis=1)
all_feb_potential_bad_trips_actually_good.distance / all_feb_potential_bad_trips_actually_good.n_locations
3 6.369119
11 562.643949
35 993.975949
37 321.591598
44 760.823952
47 634.802192
51 878.961963
71 7.899541
dtype: float64
Note that our filter distance is supposed to be 1 meter. https://github.com/e-mission/e-mission-data-collection/blob/master/src/ios/Wrapper/LocationTrackingConfig.m#L26
{'is_duty_cycling': True, 'filter_distance': 1, 'simulate_user_interaction': False, 'accuracy_threshold': 200, 'filter_time': -1, 'geofence_radius': 100, 'ios_use_visit_notifications_for_detection': True, 'ios_use_remote_push_for_sync': True, 'accuracy': 100, 'trip_end_stationary_mins': 10, 'android_geofence_responsiveness': -1}
So a possible threshold could be 100x that, i.e. a density of > 100 m between successive points.
To summarize, our check for an "invalid trip" is:

    od_distance < 100
    and not (distance > 7500 and mean_speed < 10)
    and distance / n_locations > 100

Let's see how many of these show up for this user overall.
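The combined filter can be sketched as a single pandas operation. This is a minimal sketch assuming a confirmed-trip dataframe in which `od_distance`, `mean_speed`, and `n_locations` have already been computed as in the snippets above; the column names follow this analysis, not any fixed schema.

```python
import pandas as pd

def flag_invalid_trips(df: pd.DataFrame) -> pd.Series:
    """Boolean mask: True for trips matching all of the spurious-trip checks."""
    od_check = df.od_distance < 100                                  # ends where it started
    not_long_slow = ~((df.distance > 7500) & (df.mean_speed < 10))   # exempt long, slow round trips
    density_check = (df.distance / df.n_locations) > 100             # sparse points: > 100 m per location
    return od_check & not_long_slow & density_check

# Made-up trips: one spurious, one valid long slow round trip, one valid one-way trip.
trips = pd.DataFrame({
    "od_distance": [5.0, 12.0, 3000.0],
    "distance": [7000.0, 14000.0, 7000.0],
    "mean_speed": [12.0, 7.0, 12.0],
    "n_locations": [20, 200, 2000],
})
print(flag_invalid_trips(trips).tolist())  # [True, False, False]
```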
all_confirmed_trip_df = ts.get_data_df("analysis/confirmed_trip")
all_confirmed_trip_df["od_distance"] = all_confirmed_trip_df.apply(lambda r: ecc.calDistance(r.start_loc["coordinates"], r.end_loc["coordinates"], coordinates=False), axis=1)
all_confirmed_trip_df["mean_speed"] = all_confirmed_trip_df.distance / all_confirmed_trip_df.duration
first_three_checks_overall = all_confirmed_trip_df.query("od_distance < 100 and not (distance > 7500 and mean_speed < 10)")
len(first_three_checks_overall), len(all_confirmed_trip_df), (len(first_three_checks_overall)/len(all_confirmed_trip_df))
first_three_checks_overall[["start_local_dt_month", "start_local_dt_day", "end_local_dt_month", "end_local_dt_day", "od_distance", "distance", "mean_speed"]]
start_local_dt_month | start_local_dt_day | end_local_dt_month | end_local_dt_day | od_distance | distance | mean_speed |
---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 6.986333e+01 | 156.678443 | 0.327779 |
1 | 1 | 1 | 1 | 1.740171e+01 | 1084.129455 | 0.798355 |
1 | 5 | 1 | 5 | 8.995684e+01 | 363.770437 | 0.478790 |
1 | 5 | 1 | 5 | 4.107239e-02 | 1990.706388 | 1.316896 |
1 | 5 | 1 | 5 | 5.988431e+01 | 1132.902350 | 3.496865 |
1 | 18 | 1 | 18 | 6.652086e+01 | 1597.639593 | 2.206089 |
2 | 6 | 2 | 6 | 1.850196e-01 | 7103.825879 | 1.944089 |
2 | 6 | 2 | 6 | 2.342991e+00 | 7039.310841 | 1.698978 |
2 | 6 | 2 | 6 | 0.000000e+00 | 7112.693053 | 1.702703 |
2 | 7 | 2 | 7 | 1.853653e+01 | 7073.937531 | 11.576851 |
2 | 7 | 2 | 7 | 1.127387e+01 | 6868.309384 | 27.608938 |
2 | 7 | 2 | 7 | 1.170578e+01 | 7100.056180 | 5.205115 |
2 | 7 | 2 | 7 | 1.207159e+01 | 6710.867451 | 24.672423 |
2 | 7 | 2 | 7 | 1.223378e+01 | 7123.187586 | 12.798619 |
2 | 7 | 2 | 7 | 3.101800e+01 | 20965.968319 | 15.490884 |
2 | 7 | 2 | 7 | 2.990119e+01 | 7079.736764 | 11.872148 |
2 | 7 | 2 | 7 | 3.331056e+00 | 6735.314398 | 17.452288 |
2 | 7 | 2 | 7 | 3.156922e+00 | 7096.913321 | 11.676411 |
2 | 7 | 2 | 7 | 5.456963e-01 | 7119.615165 | 4.834291 |
2 | 7 | 2 | 7 | 1.297290e+01 | 6921.651532 | 27.754705 |
2 | 7 | 2 | 7 | 1.340543e+01 | 13834.872888 | 12.779537 |
2 | 7 | 2 | 7 | 8.192925e+00 | 7030.833489 | 10.756582 |
2 | 7 | 2 | 7 | 9.255376e+00 | 7121.168479 | 5.306681 |
2 | 7 | 2 | 7 | 9.398279e+00 | 6834.572396 | 18.271643 |
2 | 7 | 2 | 7 | 1.310545e+00 | 7092.159891 | 5.376771 |
2 | 7 | 2 | 7 | 6.030290e+00 | 7103.085299 | 17.716728 |
2 | 7 | 2 | 7 | 1.227530e+01 | 6920.672352 | 12.350568 |
2 | 7 | 2 | 7 | 5.145222e-01 | 7108.737007 | 8.657235 |
2 | 7 | 2 | 7 | 1.862717e+00 | 6974.215285 | 17.912286 |
2 | 7 | 2 | 7 | 8.986348e+00 | 6878.971601 | 18.440336 |
2 | 7 | 2 | 7 | 2.661333e-09 | 7035.528018 | 11.337731 |
2 | 7 | 2 | 7 | 4.796685e-01 | 7093.362191 | 2.808319 |
2 | 7 | 2 | 7 | 1.556720e+00 | 21974.470798 | 15.933950 |
2 | 7 | 2 | 7 | 1.363708e+00 | 6900.513198 | 30.312781 |
2 | 7 | 2 | 7 | 6.431294e-01 | 13970.753169 | 11.931122 |
2 | 7 | 2 | 7 | 8.361405e-01 | 7155.569139 | 4.667499 |
2 | 7 | 2 | 7 | 9.374204e-07 | 7082.257252 | 11.380397 |
2 | 7 | 2 | 7 | 8.591278e-01 | 6312.254022 | 13.618016 |
2 | 7 | 2 | 7 | 1.332504e+00 | 6927.319698 | 6.867789 |
2 | 7 | 2 | 7 | 1.445264e-09 | 7066.250800 | 4.685370 |
2 | 7 | 2 | 7 | 2.455396e+01 | 6914.555419 | 39.392666 |
2 | 7 | 2 | 7 | 8.173925e+00 | 21090.861869 | 13.985404 |
2 | 8 | 2 | 8 | 2.209519e-01 | 7153.275185 | 15.427448 |
2 | 8 | 2 | 8 | 3.200963e+00 | 6740.749776 | 14.768179 |
2 | 8 | 2 | 8 | 8.386498e+00 | 6807.663936 | 16.221843 |
2 | 8 | 2 | 8 | 1.137082e+01 | 7067.945941 | 1.764125 |
2 | 8 | 2 | 8 | 1.990353e-09 | 7089.220503 | 1.893732 |
2 | 8 | 2 | 8 | 2.814614e+01 | 7107.139223 | 12.469921 |
2 | 8 | 2 | 8 | 2.996777e+01 | 7136.545569 | 4.744430 |
2 | 8 | 2 | 8 | 1.526732e+01 | 7088.501464 | 19.577850 |
2 | 8 | 2 | 8 | 1.440534e+01 | 7028.174815 | 4.781001 |
2 | 8 | 2 | 8 | 5.645250e-01 | 14152.376098 | 10.298519 |
2 | 8 | 2 | 8 | 6.583064e-08 | 20947.048458 | 11.481887 |
2 | 8 | 2 | 8 | 7.913714e-08 | 7017.704820 | 8.595545 |
2 | 8 | 2 | 8 | 1.583436e-04 | 19993.302120 | 17.437462 |
2 | 8 | 2 | 8 | 3.959867e-04 | 12421.697166 | 13.026074 |
Recomputing in a different way, we get the same result:
first_check_overall = all_confirmed_trip_df.query("od_distance < 100")
next_two_checks_good = first_check_overall.query("distance > 7500 and mean_speed < 10")
next_two_checks_good[["start_local_dt_month", "start_local_dt_day", "end_local_dt_month", "end_local_dt_day", "od_distance", "distance", "mean_speed"]]
start_local_dt_month | start_local_dt_day | end_local_dt_month | end_local_dt_day | od_distance | distance | mean_speed |
---|---|---|---|---|---|---|
1 | 4 | 1 | 4 | 5.124098e+01 | 11516.539891 | 8.191983 |
2 | 1 | 2 | 1 | 1.678697e+01 | 14184.026960 | 6.145148 |
2 | 6 | 2 | 6 | 5.955038e-03 | 14066.098733 | 9.495479 |
2 | 7 | 2 | 7 | 2.087384e-01 | 13915.663279 | 8.857429 |
2 | 7 | 2 | 7 | 4.796685e-01 | 12220.480719 | 7.869666 |
2 | 7 | 2 | 7 | 7.900875e-10 | 20542.246691 | 8.604319 |
2 | 7 | 2 | 7 | 4.733766e-01 | 13965.648219 | 7.731872 |
2 | 7 | 2 | 7 | 7.117096e+00 | 14063.391410 | 4.487928 |
2 | 8 | 2 | 8 | 2.713432e+00 | 31850.950049 | 7.167856 |
first_three_checks_overall = first_check_overall[np.logical_not(np.logical_and(first_check_overall.distance > 7500, first_check_overall.mean_speed < 10))]
len(first_check_overall), len(next_two_checks_good), len(first_three_checks_overall), len(all_confirmed_trip_df), (len(first_three_checks_overall)/len(all_confirmed_trip_df))
(65, 9, 56, 140, 0.4)
Visualizing the maps before 6th Feb, we get a bunch of valid trips. We need to add the density check as well.
After adding the density check, it looks good.
first_three_checks_overall["n_locations"] = first_three_checks_overall.apply(lambda t: len(ts.get_data_df("background/filtered_location",
time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t)))), axis=1)
first_three_checks_overall["loc_density"] = first_three_checks_overall.distance / first_three_checks_overall.n_locations
all_four_checks = first_three_checks_overall[first_three_checks_overall.loc_density > 100]
len(first_check_overall), len(next_two_checks_good), len(first_three_checks_overall), len(all_four_checks), len(all_confirmed_trip_df), (len(first_three_checks_overall)/len(all_confirmed_trip_df)), (len(all_four_checks)/len(all_confirmed_trip_df))
Result: (65, 9, 56, 50, 140, 0.4, 0.35714285714285715)
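As a quick sanity check, the funnel counts in the result above are internally consistent: the 65 trips passing the O-D check split into 9 exempted long-slow (round) trips and 56 remaining, and the density check drops 6 more.

```python
# Funnel arithmetic for the checks (numbers copied from the results above).
first_check = 65          # od_distance < 100
exempt_long_slow = 9      # distance > 7500 and mean_speed < 10 (valid round trips)
after_three = 56          # first_check minus the exemption
after_density = 50        # after also requiring distance / n_locations > 100
total = 140

assert first_check == exempt_long_slow + after_three
assert after_three / total == 0.4
print(round(after_density / total, 4))  # 0.3571
```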
start_local_dt_month | start_local_dt_day | end_local_dt_month | end_local_dt_day | od_distance | distance | mean_speed |
---|---|---|---|---|---|---|
2 | 6 | 2 | 6 | 1.850196e-01 | 7103.825879 | 1.944089 |
2 | 6 | 2 | 6 | 2.342991e+00 | 7039.310841 | 1.698978 |
2 | 6 | 2 | 6 | 0.000000e+00 | 7112.693053 | 1.702703 |
2 | 7 | 2 | 7 | 1.853653e+01 | 7073.937531 | 11.576851 |
2 | 7 | 2 | 7 | 1.127387e+01 | 6868.309384 | 27.608938 |
2 | 7 | 2 | 7 | 1.170578e+01 | 7100.056180 | 5.205115 |
2 | 7 | 2 | 7 | 1.207159e+01 | 6710.867451 | 24.672423 |
2 | 7 | 2 | 7 | 1.223378e+01 | 7123.187586 | 12.798619 |
2 | 7 | 2 | 7 | 3.101800e+01 | 20965.968319 | 15.490884 |
2 | 7 | 2 | 7 | 2.990119e+01 | 7079.736764 | 11.872148 |
2 | 7 | 2 | 7 | 3.331056e+00 | 6735.314398 | 17.452288 |
2 | 7 | 2 | 7 | 3.156922e+00 | 7096.913321 | 11.676411 |
2 | 7 | 2 | 7 | 5.456963e-01 | 7119.615165 | 4.834291 |
2 | 7 | 2 | 7 | 1.297290e+01 | 6921.651532 | 27.754705 |
2 | 7 | 2 | 7 | 1.340543e+01 | 13834.872888 | 12.779537 |
2 | 7 | 2 | 7 | 8.192925e+00 | 7030.833489 | 10.756582 |
2 | 7 | 2 | 7 | 9.255376e+00 | 7121.168479 | 5.306681 |
2 | 7 | 2 | 7 | 9.398279e+00 | 6834.572396 | 18.271643 |
2 | 7 | 2 | 7 | 1.310545e+00 | 7092.159891 | 5.376771 |
2 | 7 | 2 | 7 | 6.030290e+00 | 7103.085299 | 17.716728 |
2 | 7 | 2 | 7 | 1.227530e+01 | 6920.672352 | 12.350568 |
2 | 7 | 2 | 7 | 5.145222e-01 | 7108.737007 | 8.657235 |
2 | 7 | 2 | 7 | 1.862717e+00 | 6974.215285 | 17.912286 |
2 | 7 | 2 | 7 | 8.986348e+00 | 6878.971601 | 18.440336 |
2 | 7 | 2 | 7 | 2.661333e-09 | 7035.528018 | 11.337731 |
2 | 7 | 2 | 7 | 4.796685e-01 | 7093.362191 | 2.808319 |
2 | 7 | 2 | 7 | 1.556720e+00 | 21974.470798 | 15.933950 |
2 | 7 | 2 | 7 | 1.363708e+00 | 6900.513198 | 30.312781 |
2 | 7 | 2 | 7 | 6.431294e-01 | 13970.753169 | 11.931122 |
2 | 7 | 2 | 7 | 8.361405e-01 | 7155.569139 | 4.667499 |
2 | 7 | 2 | 7 | 9.374204e-07 | 7082.257252 | 11.380397 |
2 | 7 | 2 | 7 | 8.591278e-01 | 6312.254022 | 13.618016 |
2 | 7 | 2 | 7 | 1.332504e+00 | 6927.319698 | 6.867789 |
2 | 7 | 2 | 7 | 1.445264e-09 | 7066.250800 | 4.685370 |
2 | 7 | 2 | 7 | 2.455396e+01 | 6914.555419 | 39.392666 |
2 | 7 | 2 | 7 | 8.173925e+00 | 21090.861869 | 13.985404 |
2 | 8 | 2 | 8 | 2.209519e-01 | 7153.275185 | 15.427448 |
2 | 8 | 2 | 8 | 3.200963e+00 | 6740.749776 | 14.768179 |
2 | 8 | 2 | 8 | 8.386498e+00 | 6807.663936 | 16.221843 |
2 | 8 | 2 | 8 | 1.137082e+01 | 7067.945941 | 1.764125 |
2 | 8 | 2 | 8 | 1.990353e-09 | 7089.220503 | 1.893732 |
2 | 8 | 2 | 8 | 2.814614e+01 | 7107.139223 | 12.469921 |
2 | 8 | 2 | 8 | 2.996777e+01 | 7136.545569 | 4.744430 |
2 | 8 | 2 | 8 | 1.526732e+01 | 7088.501464 | 19.577850 |
2 | 8 | 2 | 8 | 1.440534e+01 | 7028.174815 | 4.781001 |
2 | 8 | 2 | 8 | 5.645250e-01 | 14152.376098 | 10.298519 |
2 | 8 | 2 | 8 | 6.583064e-08 | 20947.048458 | 11.481887 |
2 | 8 | 2 | 8 | 7.913714e-08 | 7017.704820 | 8.595545 |
2 | 8 | 2 | 8 | 1.583436e-04 | 19993.302120 | 17.437462 |
2 | 8 | 2 | 8 | 3.959867e-04 | 12421.697166 | 13.026074 |
Note also that the user does not have any motion activity data.
$ zgrep background /tmp/tmp/emission_ind_....gz | sort | uniq
"key": "background/battery",
"key": "background/filtered_location",
"key": "background/location",
I wonder if that is the reason why our spurious trip detection code is not catching this automatically.
If we were incorporating this into the pipeline, we would reset the pipeline to before the 6th and then re-run.
Since we are not planning to do that right now, we will instead insert the `mode_confirm`, `purpose_confirm`, ... objects through a script. Then when we run the pipeline again, the input matching will find the corresponding match.
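The insertion script itself is not shown; below is a hedged sketch of the payloads such a script might build. The `manual/mode_confirm`-style keys, the flat dict format, and the `make_user_inputs` helper are assumptions for illustration, not the project's actual storage schema or API.

```python
# Hypothetical sketch: build synthetic user-input entries marking one spurious
# trip as not_a_trip, keyed to the trip's time range so that the pipeline's
# input matching can associate them with the confirmed trip on the next run.
def make_user_inputs(trip):
    """Return the three user-input payloads for one spurious trip (assumed format)."""
    base = {"start_ts": trip["start_ts"], "end_ts": trip["end_ts"]}
    return [
        dict(base, key="manual/mode_confirm", label="not_a_trip"),
        dict(base, key="manual/purpose_confirm", label="not_a_trip"),
        dict(base, key="manual/replaced_mode", label="not_a_trip"),
    ]

trip = {"start_ts": 1644192000, "end_ts": 1644195600}  # made-up Feb 2022 range
entries = make_user_inputs(trip)
print([e["key"] for e in entries])
```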
Before inserting the entries, the user inputs are as below. It looks like the user confirmed several trips before stopping. Need to confirm with them whether those trips are indeed inaccurate.
mode_confirm | purpose_confirm | replaced_mode |
---|---|---|
error | not_accurate | not_accurate |
pilot_ebike | work | drove_alone |
pilot_ebike | work | drove_alone |
pilot_ebike | work | drove_alone |
pilot_ebike | work | drove_alone |
pilot_ebike | work | drove_alone |
pilot_ebike | work | drove_alone |
pilot_ebike | work | drove_alone |
pilot_ebike | personal_med | drove_alone |
pilot_ebike | work | drove_alone |
pilot_ebike | work | drove_alone |
pilot_ebike | work | drove_alone |
pilot_ebike | work | drove_alone |
pilot_ebike | work | drove_alone |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
NaN | NaN | NaN |
not_a_trip | error | not_accurate |
After manually inserting entries on a copy of the database and then re-running the pipeline, we get
mode_confirm | purpose_confirm | replaced_mode |
---|---|---|
not_a_trip | not_a_trip | not_accurate |
not_a_trip | not_a_trip | drove_alone |
not_a_trip | not_a_trip | drove_alone |
not_a_trip | not_a_trip | drove_alone |
not_a_trip | not_a_trip | drove_alone |
not_a_trip | not_a_trip | drove_alone |
not_a_trip | not_a_trip | drove_alone |
After configuring the analysis pipeline to include replaced_mode, we get
mode_confirm | purpose_confirm | replaced_mode |
---|---|---|
not_a_trip | not_a_trip | not_a_trip |
not_a_trip | not_a_trip | not_a_trip |
not_a_trip | not_a_trip | not_a_trip |
not_a_trip | not_a_trip | not_a_trip |
not_a_trip | not_a_trip | not_a_trip |
not_a_trip | not_a_trip | not_a_trip |
We are now ready to change this on the production server once we get confirmation from the user that the trips are in fact spurious.
"Thank you. Trips are showing a straight line across town. "