e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

User reports lots of spurious trips on iOS #704

Open shankari opened 2 years ago

shankari commented 2 years ago

"Thank you. Trips are showing a straight line across town. "

shankari commented 2 years ago

Hm, the user does not appear to have any transitions for the past week

>>> start_ts = arrow.get("2022-02-01").timestamp
>>> end_ts = arrow.get("2022-02-09").timestamp
>>> transition_df = ts.get_data_df("statemachine/transition", time_query=estt.TimeQuery("data.ts", startTs=start_ts, endTs=end_ts))

Returns an empty dataframe.

shankari commented 2 years ago

Searching backwards, we find that the last transition was from 2021-12-08T17:08:52.017765-07:00

shankari commented 2 years ago
confirmed_trip_df = ts.get_data_df("analysis/confirmed_trip", time_query=estt.TimeQuery("data.start_ts", startTs=start_ts, endTs=end_ts))
confirmed_trip_df.tail()

shows us that the last trip is indeed from

2022-02-08T17:05:22.999877-07:00

Need to investigate why we stopped getting transitions, and how our algorithm works when they are not present. This is likely the root cause.

shankari commented 2 years ago

Focusing on trips from the 7th of Feb, we see a clear spike at around 7k

image

which persists while zooming in

image

The durations for those trips also seem to be all over the map.

image

shankari commented 2 years ago

Doing an initial pass at classifying good vs. bad:

potential_bad_trips = feb_7_confirmed_trip_df[np.logical_and(feb_7_confirmed_trip_df.distance > 6500, feb_7_confirmed_trip_df.distance < 7500)]
potential_good_trips = feb_7_confirmed_trip_df[np.logical_or(feb_7_confirmed_trip_df.distance < 6500, feb_7_confirmed_trip_df.distance > 7500)]

And plotting the trips, they are indeed in a straight line across town (maps redacted for privacy reasons). Interestingly, when trying to plot the non-resampled locations, it looks like there are none.

Found 0 features from 0 points
Found 0 features from 0 points
Found 0 features from 0 points
...
Found 0 features from 0 points
Found 0 features from 0 points
Found 0 features from 0 points

Checking to see if this is a characteristic of all potential bad trips and of any potential good trips.
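The good/bad split above is just a distance-band filter around the ~7 km spike. A minimal, self-contained sketch of the same logic on made-up data (the 6500/7500 m thresholds come from the histogram in this thread; the DataFrame here is a toy stand-in, not the user's data):

```python
import numpy as np
import pandas as pd

# Toy stand-in for feb_7_confirmed_trip_df; only the distance column matters here.
trips = pd.DataFrame({"distance": [500.0, 6800.0, 7100.0, 7400.0, 9000.0, 14000.0]})

# Trips whose length falls inside the ~7 km spike are flagged as potentially bad.
bad_mask = np.logical_and(trips.distance > 6500, trips.distance < 7500)
potential_bad = trips[bad_mask]
potential_good = trips[~bad_mask]

print(len(potential_bad), len(potential_good))  # 3 3
```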

shankari commented 2 years ago

There are no location points for the bad trips.

>>> pd.Series([len(ts.get_data_df("background/location", time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t))))
    for t in potential_bad_trips.to_dict(orient="records")]).unique()
array([0])

There are no location points for the good trips as well.

>>> pd.Series([len(ts.get_data_df("background/location", time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t))))
    for t in potential_good_trips.to_dict(orient="records")]).unique()
array([0])

There are apparently no location points for the entire month of Feb.

>>> ts.get_data_df("background/location", time_query=estt.TimeQuery("data.ts", startTs=start_ts, endTs=end_ts))
(empty dataframe)

The last location point was also from December, 2021-12-08. Wait: maybe we stopped storing the values after December because we hit the query limit.

shankari commented 2 years ago

It also turned out that we hadn't filtered for the 7th correctly. After fixing this, we now have:

>>> (len(potential_good_trips), len(potential_bad_trips))
(15, 27)

But every single trip seems to be a straight line, although they don't always have the same endpoints. The main difference between the "good" and "bad" trips seems to be that the endpoints sometimes double back.

But given that they are straight lines, the distance between the endpoints and the distance of the trip are likely to be the same. Let's see if that helps.
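The endpoint-to-endpoint ("O-D") distance can be computed with a standard haversine. The codebase has its own helper (`ecc.calDistance`, used later in this thread), so the function below is only an illustrative sketch:

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points."""
    R = 6371000.0  # mean earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

# For a round trip the start and end essentially coincide, so the O-D distance
# is tiny even though the traveled distance may be several kilometers.
print(haversine_m(-105.0, 40.0, -105.0001, 40.0001))  # ~14 m
```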

shankari commented 2 years ago

Ah, they are straight lines there and back. The actual O-D distance, even for the "bad" trips, is very small:

>>> potential_bad_trips[["distance", "od_distance"]]
distance od_distance
7073.937531 1.853653e+01
6868.309384 1.127387e+01
7100.056180 1.170578e+01
6710.867451 1.207159e+01
7123.187586 1.223378e+01
7079.736764 2.990119e+01
6735.314398 3.331056e+00
7096.913321 3.156922e+00
7119.615165 5.456963e-01

image

Unfortunately, that means that we can't actually use the O-D distance alone, since a small O-D distance could happen legitimately for a round trip. But maybe for this user, for the immediate use case, it can be a good check?

shankari commented 2 years ago

Looking at the potential good trips, we have

image

Zooming in on the trips with od_distance below 3000, they are all round trips

image

shankari commented 2 years ago

So if we categorize further:

>>> potential_bad_in_good = potential_good_trips[potential_good_trips.od_distance < 3000]
>>> potential_good_in_good = potential_good_trips[potential_good_trips.od_distance > 3000]
>>> len(potential_bad_trips), len(potential_bad_in_good), len(potential_good_in_good)
(27, 11, 4)

Visualizing those 4 trips, we get what appear to be one-way trips. But we can probably start with this for now and let the user manually mark the 4/(27+11+4) ≈ 10% of bad one-way trips.

Let's see how many trips from the beginning of Feb would be affected.

shankari commented 2 years ago
start_ld = ecwld.LocalDate(year=2022, month=2, day=1)
end_ld = ecwld.LocalDate(year=2022, month=2, day=28)
all_jan_feb_confirmed_trip_df = ts.get_data_df("analysis/confirmed_trip", time_query=esttc.TimeComponentQuery("data.start_local_dt", start_ld, end_ld))
all_jan_feb_confirmed_trip_df["od_distance"] = all_jan_feb_confirmed_trip_df.apply(lambda r: ecc.calDistance(r.start_loc["coordinates"], r.end_loc["coordinates"], coordinates=False), axis=1)
all_feb_potential_bad_trips = all_jan_feb_confirmed_trip_df[all_jan_feb_confirmed_trip_df.od_distance < 100]
len(all_feb_potential_bad_trips), len(all_jan_feb_confirmed_trip_df)
Result: (58, 75)

The majority are from the 6th, 7th, and 8th; one is from the 1st. A scatter plot shows vertical lines at various distances.

image

shankari commented 2 years ago

Durations range from 1000 secs (1000/60 ≈ 17 mins) to 4000 secs (4000/60 ≈ 67 mins ≈ 1.1 hours).

image

No clear signal in speeds either

image

shankari commented 2 years ago

To recap, at this point we have a pretty good check (od_distance < 100 m). Any false negatives (the trip was spurious but we didn't catch it) can be handled by the user; this would be a max of 17 trips. Any false positives might be a problem, and we might want to come up with an additional check. This is likely to involve the actual location points.

Let's plot the trip from the 1st, since it is the most likely false positive (if one exists). It is indeed a false positive.
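The combined check can be sketched on a toy DataFrame. The thresholds (od_distance < 100 m, the 7500 m distance band, 10 m/s speed) are the ones developed in this thread; the data below is made up:

```python
import pandas as pd

# Toy confirmed-trip table: od_distance is start-to-end distance in meters,
# distance is traveled distance in meters, duration in seconds.
trips = pd.DataFrame({
    "od_distance": [12.0, 50.0, 2500.0, 8.0],
    "distance":    [7100.0, 14000.0, 7000.0, 200.0],
    "duration":    [1500.0, 2300.0, 900.0, 400.0],
})
trips["mean_speed"] = trips.distance / trips.duration

# Flag trips whose endpoints nearly coincide; a long, slow trip
# (likely a genuine round trip) is excluded to limit false positives.
flagged = trips.query("od_distance < 100 and not (distance > 7500 and mean_speed < 10)")
print(len(flagged))  # 2
```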

shankari commented 2 years ago

Checking the other fields, it is a lot more than 7k in distance. Let's plot the other trips with > 7k in distance and see if they are spurious.

shankari commented 2 years ago

So there are 8 trips > 7k in distance

index  start_local_dt_month start_local_dt_day end_local_dt_month end_local_dt_day duration distance od_distance mean_speed
3 2 1 2 1 2308.166839 14184.026960 1.678697e+01 6.145148
11 2 6 2 6 1481.346913 14066.098733 5.955038e-03 9.495479
35 2 7 2 7 1571.072589 13915.663279 2.087384e-01 8.857429
37 2 7 2 7 1552.858794 12220.480719 4.796685e-01 7.869666
44 2 7 2 7 2387.434461 20542.246691 7.900875e-10 8.604319
47 2 7 2 7 1806.244204 13965.648219 4.733766e-01 7.731872
51 2 7 2 7 3133.604467 14063.391410 7.117096e+00 4.487928
71 2 8 2 8 4443.580903 31850.950049 2.713432e+00 7.167856

On mapping them, the first and last entries (3 and 71) are valid round trips. The others are not.

Plotting the various trip level metrics, we don't see a clear separation between valid and invalid.

image image

shankari commented 2 years ago

Re-exported data for only the year 2022.

We now see transitions, and all the transitions for the 7th seem to be visit-only, without a corresponding geofence exit. That might be a potential discriminant.

fmt_time transition_name state_name
2022-02-07T00:24:46.908138-07:00 TransitionType.NOP State.WAITING_FOR_TRIP_START
2022-02-07T00:24:46.916240-07:00 TransitionType.VISIT_ENDED State.WAITING_FOR_TRIP_START
2022-02-07T00:35:28.095417-07:00 TransitionType.VISIT_STARTED State.ONGOING_TRIP
-- -- --
2022-02-07T00:37:06.316291-07:00 TransitionType.VISIT_ENDED State.WAITING_FOR_TRIP_START
2022-02-07T00:37:08.928151-07:00 TransitionType.VISIT_STARTED State.ONGOING_TRIP
-- -- --
2022-02-07T00:50:28.613245-07:00 TransitionType.VISIT_ENDED State.WAITING_FOR_TRIP_START
2022-02-07T00:50:29.307538-07:00 TransitionType.VISIT_STARTED State.ONGOING_TRIP
-- -- --
2022-02-07T01:03:43.421707-07:00 TransitionType.VISIT_ENDED State.WAITING_FOR_TRIP_START
2022-02-07T01:04:02.212470-07:00 TransitionType.VISIT_STARTED State.ONGOING_TRIP
-- -- --
2022-02-07T01:28:44.901669-07:00 TransitionType.VISIT_ENDED State.WAITING_FOR_TRIP_START
2022-02-07T01:29:51.414304-07:00 TransitionType.VISIT_STARTED State.ONGOING_TRIP
-- -- --
shankari commented 2 years ago

Re-running the rest of the analysis, we now have 79 trips, so it looks like the issue resolved itself after the 8th?

start_ld = ecwld.LocalDate(year=2022, month=2, day=1)
end_ld = ecwld.LocalDate(year=2022, month=2, day=28)
all_jan_feb_confirmed_trip_df = ts.get_data_df("analysis/confirmed_trip", time_query=esttc.TimeComponentQuery("data.start_local_dt", start_ld, end_ld))
all_jan_feb_confirmed_trip_df["od_distance"] = all_jan_feb_confirmed_trip_df.apply(lambda r: ecc.calDistance(r.start_loc["coordinates"], r.end_loc["coordinates"], coordinates=False), axis=1)
all_feb_potential_bad_trips = all_jan_feb_confirmed_trip_df[all_jan_feb_confirmed_trip_df.od_distance < 100]
len(all_feb_potential_bad_trips), len(all_jan_feb_confirmed_trip_df)
Result: (58, 79)
start_local_dt_day end_local_dt_day
8 8
8 8
8 8
8 8
8 8
8 8
11 11
12 12
12 12
13 13

Looking at these last four trips, one has a clearly defined trajectory. The others are little groups of points, similar to some trips on the 8th.

But the number of locations seems like a potential discriminator.

pd.Series([len(ts.get_data_df("background/location", time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t))))
    for t in all_feb_potential_bad_trips.to_dict(orient="records")]).unique()
Result: array([2227,   32,   24,   21,   41,   22,   15,   16,   55,   19,   12,
         20,   14,   34,   23,   28,   44,   67,   39,   69,   25,   35,
         49,   68,   38,   46,   75, 4032])

pd.Series([len(ts.get_data_df("background/filtered_location", time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t))))
    for t in all_feb_potential_bad_trips.to_dict(orient="records")]).unique()
Result: array([2227,   15,    7,   25,   14,    8,   31,    5,   11,    9,   13,
         20,    6,   38,   39,   28,   27,   22,   16,   37,   34, 4032])

We still have the same potentially bad trips that are actually good. Plotting this, we get

pd.Series([len(ts.get_data_df("background/filtered_location", time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t))))
    for t in all_feb_potential_bad_trips_actually_good.to_dict(orient="records")]).plot(kind="bar")

image

So it looks like that will work!

shankari commented 2 years ago

Double checking by mapping some known bad trips from the morning of the 7th

bad_plot_map

The last few trips on the 8th and later include one trip that looks like that, and others that just look like a cluster of points at the destination.

Screen Shot 2022-02-14 at 4 05 38 PM

So the big gap / sparse points seem like a good check, at least for this user at this time. We need to think about whether we want to incorporate it into the regular pipeline.

Double checking...

potential_bad_trips["n_locations"] = potential_bad_trips.apply(lambda t: len(ts.get_data_df("background/filtered_location",
                time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t)))), axis=1)
potential_bad_trips.distance/potential_bad_trips.n_locations

1     1010.562504
2      490.593527
3     1014.293740
4      447.391163
5      890.398448
7     1415.947353
8      612.301309
9      788.545925
10     889.951896
11     532.434733
13     468.722233
14     890.146060
15     525.736338
16    1182.026649
17     507.363236
18    1153.445392
19     473.915800
20     996.316469
21     529.151662
24     879.441002
26    1013.337456
28     985.787600
30     550.428395
31    1011.751036
34     989.617100
36    1009.464400
37     460.970361
dtype: float64

And for the mixed dataset

all_feb_potential_bad_trips_actually_good["n_locations"] = all_feb_potential_bad_trips_actually_good.apply(lambda t: len(ts.get_data_df("background/filtered_location",
                time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t)))), axis=1)
all_feb_potential_bad_trips_actually_good.distance / all_feb_potential_bad_trips_actually_good.n_locations

3       6.369119
11    562.643949
35    993.975949
37    321.591598
44    760.823952
47    634.802192
51    878.961963
71      7.899541
dtype: float64

Note that our filter distance is supposed to be 1 meter. https://github.com/e-mission/e-mission-data-collection/blob/master/src/ios/Wrapper/LocationTrackingConfig.m#L26

{'is_duty_cycling': True, 'filter_distance': 1, 'simulate_user_interaction': False, 'accuracy_threshold': 200, 'filter_time': -1, 'geofence_radius': 100, 'ios_use_visit_notifications_for_detection': True, 'ios_use_remote_push_for_sync': True, 'accuracy': 100, 'trip_end_stationary_mins': 10, 'android_geofence_responsiveness': -1}

So a possible threshold could be 100x that: a density of > 100 m per point.
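The density check above (traveled meters per recorded point) can be sketched like this; the numbers echo the ratios seen in this thread, but the DataFrame is a toy:

```python
import pandas as pd

# Toy trips: traveled distance (m) and the number of filtered location points
# recorded during the trip.
trips = pd.DataFrame({
    "distance":    [14184.0, 7100.0, 6900.0],
    "n_locations": [2227,    7,      13],
})

# With a 1 m filter distance, a real trip should log many points; spurious
# straight-line trips have only a handful, so meters-per-point is huge.
trips["loc_density"] = trips.distance / trips.n_locations
print((trips.loc_density > 100).tolist())  # [False, True, True]
```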

shankari commented 2 years ago

To summarize, our check for an "invalid trip" is:

1. od_distance < 100 (the trip starts and ends at essentially the same point),
2. excluding trips with distance > 7500 and
3. mean_speed < 10 (which are likely to be genuine round trips),
4. location density (distance / n_locations) > 100 m per point.

Let's see how many of these show up for this user overall

all_confirmed_trip_df = ts.get_data_df("analysis/confirmed_trip")
all_confirmed_trip_df["od_distance"] = all_confirmed_trip_df.apply(lambda r: ecc.calDistance(r.start_loc["coordinates"], r.end_loc["coordinates"], coordinates=False), axis=1)
all_confirmed_trip_df["mean_speed"] = all_confirmed_trip_df.distance / all_confirmed_trip_df.duration
first_three_checks_overall = all_confirmed_trip_df.query("od_distance < 100 and not (distance > 7500 and mean_speed < 10)")
len(first_three_checks_overall), len(all_confirmed_trip_df), (len(first_three_checks_overall)/len(all_confirmed_trip_df))
first_three_checks_overall[["start_local_dt_month", "start_local_dt_day", "end_local_dt_month", "end_local_dt_day", "od_distance", "distance", "mean_speed"]]
start_local_dt_month start_local_dt_day end_local_dt_month end_local_dt_day od_distance distance mean_speed
1 1 1 1 6.986333e+01 156.678443 0.327779
1 1 1 1 1.740171e+01 1084.129455 0.798355
1 5 1 5 8.995684e+01 363.770437 0.478790
1 5 1 5 4.107239e-02 1990.706388 1.316896
1 5 1 5 5.988431e+01 1132.902350 3.496865
1 18 1 18 6.652086e+01 1597.639593 2.206089
2 6 2 6 1.850196e-01 7103.825879 1.944089
2 6 2 6 2.342991e+00 7039.310841 1.698978
2 6 2 6 0.000000e+00 7112.693053 1.702703
2 7 2 7 1.853653e+01 7073.937531 11.576851
2 7 2 7 1.127387e+01 6868.309384 27.608938
2 7 2 7 1.170578e+01 7100.056180 5.205115
2 7 2 7 1.207159e+01 6710.867451 24.672423
2 7 2 7 1.223378e+01 7123.187586 12.798619
2 7 2 7 3.101800e+01 20965.968319 15.490884
2 7 2 7 2.990119e+01 7079.736764 11.872148
2 7 2 7 3.331056e+00 6735.314398 17.452288
2 7 2 7 3.156922e+00 7096.913321 11.676411
2 7 2 7 5.456963e-01 7119.615165 4.834291
2 7 2 7 1.297290e+01 6921.651532 27.754705
2 7 2 7 1.340543e+01 13834.872888 12.779537
2 7 2 7 8.192925e+00 7030.833489 10.756582
2 7 2 7 9.255376e+00 7121.168479 5.306681
2 7 2 7 9.398279e+00 6834.572396 18.271643
2 7 2 7 1.310545e+00 7092.159891 5.376771
2 7 2 7 6.030290e+00 7103.085299 17.716728
2 7 2 7 1.227530e+01 6920.672352 12.350568
2 7 2 7 5.145222e-01 7108.737007 8.657235
2 7 2 7 1.862717e+00 6974.215285 17.912286
2 7 2 7 8.986348e+00 6878.971601 18.440336
2 7 2 7 2.661333e-09 7035.528018 11.337731
2 7 2 7 4.796685e-01 7093.362191 2.808319
2 7 2 7 1.556720e+00 21974.470798 15.933950
2 7 2 7 1.363708e+00 6900.513198 30.312781
2 7 2 7 6.431294e-01 13970.753169 11.931122
2 7 2 7 8.361405e-01 7155.569139 4.667499
2 7 2 7 9.374204e-07 7082.257252 11.380397
2 7 2 7 8.591278e-01 6312.254022 13.618016
2 7 2 7 1.332504e+00 6927.319698 6.867789
2 7 2 7 1.445264e-09 7066.250800 4.685370
2 7 2 7 2.455396e+01 6914.555419 39.392666
2 7 2 7 8.173925e+00 21090.861869 13.985404
2 8 2 8 2.209519e-01 7153.275185 15.427448
2 8 2 8 3.200963e+00 6740.749776 14.768179
2 8 2 8 8.386498e+00 6807.663936 16.221843
2 8 2 8 1.137082e+01 7067.945941 1.764125
2 8 2 8 1.990353e-09 7089.220503 1.893732
2 8 2 8 2.814614e+01 7107.139223 12.469921
2 8 2 8 2.996777e+01 7136.545569 4.744430
2 8 2 8 1.526732e+01 7088.501464 19.577850
2 8 2 8 1.440534e+01 7028.174815 4.781001
2 8 2 8 5.645250e-01 14152.376098 10.298519
2 8 2 8 6.583064e-08 20947.048458 11.481887
2 8 2 8 7.913714e-08 7017.704820 8.595545
2 8 2 8 1.583436e-04 19993.302120 17.437462
2 8 2 8 3.959867e-04 12421.697166 13.026074
shankari commented 2 years ago

Recomputing in a different way, we get the same result:

first_check_overall = all_confirmed_trip_df.query("od_distance < 100")
next_two_checks_good = first_check_overall.query("distance > 7500 and mean_speed < 10")
next_two_checks_good[["start_local_dt_month", "start_local_dt_day", "end_local_dt_month", "end_local_dt_day", "od_distance", "distance", "mean_speed"]]
start_local_dt_month start_local_dt_day end_local_dt_month end_local_dt_day od_distance distance mean_speed
1 4 1 4 5.124098e+01 11516.539891 8.191983
2 1 2 1 1.678697e+01 14184.026960 6.145148
2 6 2 6 5.955038e-03 14066.098733 9.495479
2 7 2 7 2.087384e-01 13915.663279 8.857429
2 7 2 7 4.796685e-01 12220.480719 7.869666
2 7 2 7 7.900875e-10 20542.246691 8.604319
2 7 2 7 4.733766e-01 13965.648219 7.731872
2 7 2 7 7.117096e+00 14063.391410 4.487928
2 8 2 8 2.713432e+00 31850.950049 7.167856
first_three_checks_overall = first_check_overall[np.logical_not(np.logical_and(first_check_overall.distance > 7500, first_check_overall.mean_speed < 10))]
len(first_check_overall), len(next_two_checks_good), len(first_three_checks_overall), len(all_confirmed_trip_df), (len(first_three_checks_overall)/len(all_confirmed_trip_df))

(65, 9, 56, 140, 0.4)

Visualizing the maps before 6th Feb, we get a bunch of valid trips. We need to add the density check as well.
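The mask-based recomputation above should match the `.query()` string exactly (it is just De Morgan's law written out with numpy). A tiny self-contained check on toy data:

```python
import numpy as np
import pandas as pd

# Toy data with one row that trips each branch of the condition.
df = pd.DataFrame({
    "od_distance": [10.0, 20.0, 500.0],
    "distance":    [8000.0, 7000.0, 9000.0],
    "mean_speed":  [5.0, 12.0, 8.0],
})

first_check = df.query("od_distance < 100")
# "not (A and B)" via .query() vs. an explicit numpy mask should agree.
via_query = first_check.query("not (distance > 7500 and mean_speed < 10)")
via_mask = first_check[np.logical_not(
    np.logical_and(first_check.distance > 7500, first_check.mean_speed < 10))]
print(via_query.equals(via_mask))  # True
```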

shankari commented 2 years ago

After adding the density check, it looks good.

first_three_checks_overall["n_locations"] = first_three_checks_overall.apply(lambda t: len(ts.get_data_df("background/filtered_location",
                time_query = esdat.get_time_query_for_trip_like_object(ecwct.Confirmedtrip(t)))), axis=1)
first_three_checks_overall["loc_density"] = first_three_checks_overall.distance / first_three_checks_overall.n_locations
all_four_checks = first_three_checks_overall[first_three_checks_overall.loc_density > 100]
len(first_check_overall), len(next_two_checks_good), len(first_three_checks_overall), len(all_four_checks), len(all_confirmed_trip_df), (len(first_three_checks_overall)/len(all_confirmed_trip_df)), (len(all_four_checks)/len(all_confirmed_trip_df))

Result: (65, 9, 56, 50, 140, 0.4, 0.35714285714285715)
start_local_dt_month start_local_dt_day end_local_dt_month end_local_dt_day od_distance distance mean_speed
2 6 2 6 1.850196e-01 7103.825879 1.944089
2 6 2 6 2.342991e+00 7039.310841 1.698978
2 6 2 6 0.000000e+00 7112.693053 1.702703
2 7 2 7 1.853653e+01 7073.937531 11.576851
2 7 2 7 1.127387e+01 6868.309384 27.608938
2 7 2 7 1.170578e+01 7100.056180 5.205115
2 7 2 7 1.207159e+01 6710.867451 24.672423
2 7 2 7 1.223378e+01 7123.187586 12.798619
2 7 2 7 3.101800e+01 20965.968319 15.490884
2 7 2 7 2.990119e+01 7079.736764 11.872148
2 7 2 7 3.331056e+00 6735.314398 17.452288
2 7 2 7 3.156922e+00 7096.913321 11.676411
2 7 2 7 5.456963e-01 7119.615165 4.834291
2 7 2 7 1.297290e+01 6921.651532 27.754705
2 7 2 7 1.340543e+01 13834.872888 12.779537
2 7 2 7 8.192925e+00 7030.833489 10.756582
2 7 2 7 9.255376e+00 7121.168479 5.306681
2 7 2 7 9.398279e+00 6834.572396 18.271643
2 7 2 7 1.310545e+00 7092.159891 5.376771
2 7 2 7 6.030290e+00 7103.085299 17.716728
2 7 2 7 1.227530e+01 6920.672352 12.350568
2 7 2 7 5.145222e-01 7108.737007 8.657235
2 7 2 7 1.862717e+00 6974.215285 17.912286
2 7 2 7 8.986348e+00 6878.971601 18.440336
2 7 2 7 2.661333e-09 7035.528018 11.337731
2 7 2 7 4.796685e-01 7093.362191 2.808319
2 7 2 7 1.556720e+00 21974.470798 15.933950
2 7 2 7 1.363708e+00 6900.513198 30.312781
2 7 2 7 6.431294e-01 13970.753169 11.931122
2 7 2 7 8.361405e-01 7155.569139 4.667499
2 7 2 7 9.374204e-07 7082.257252 11.380397
2 7 2 7 8.591278e-01 6312.254022 13.618016
2 7 2 7 1.332504e+00 6927.319698 6.867789
2 7 2 7 1.445264e-09 7066.250800 4.685370
2 7 2 7 2.455396e+01 6914.555419 39.392666
2 7 2 7 8.173925e+00 21090.861869 13.985404
2 8 2 8 2.209519e-01 7153.275185 15.427448
2 8 2 8 3.200963e+00 6740.749776 14.768179
2 8 2 8 8.386498e+00 6807.663936 16.221843
2 8 2 8 1.137082e+01 7067.945941 1.764125
2 8 2 8 1.990353e-09 7089.220503 1.893732
2 8 2 8 2.814614e+01 7107.139223 12.469921
2 8 2 8 2.996777e+01 7136.545569 4.744430
2 8 2 8 1.526732e+01 7088.501464 19.577850
2 8 2 8 1.440534e+01 7028.174815 4.781001
2 8 2 8 5.645250e-01 14152.376098 10.298519
2 8 2 8 6.583064e-08 20947.048458 11.481887
2 8 2 8 7.913714e-08 7017.704820 8.595545
2 8 2 8 1.583436e-04 19993.302120 17.437462
2 8 2 8 3.959867e-04 12421.697166 13.026074
shankari commented 2 years ago

Note also that the user does not have any motion activity data.

$ zgrep background /tmp/tmp/emission_ind_....gz | sort | uniq
            "key": "background/battery",
            "key": "background/filtered_location",
            "key": "background/location",

I wonder if that is the reason why our spurious trip detection code is not catching this automatically.

shankari commented 2 years ago

If we were incorporating this into the pipeline, we would reset the pipeline to before the 6th and then re-run. Since we are not planning to do that right now, we will instead insert the mode_confirm, purpose_confirm, ... objects through a script. Then when we run the pipeline again, the input matching will find the corresponding match.
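A purely illustrative sketch of what the inserted entries might look like. The `manual/mode_confirm` key names and the `start_ts`/`end_ts`/`label` fields are assumptions based on the labels quoted in this thread, not a verified schema; the real script must match the server's storage format:

```python
# Hypothetical shape only: key names and data fields are assumptions,
# not the confirmed e-mission user-input schema.
def make_user_inputs(trip):
    """Build the three label entries for one spurious trip (illustrative)."""
    common = {"start_ts": trip["start_ts"], "end_ts": trip["end_ts"]}
    return [
        {"key": "manual/mode_confirm",    "data": dict(common, label="not_a_trip")},
        {"key": "manual/purpose_confirm", "data": dict(common, label="not_a_trip")},
        {"key": "manual/replaced_mode",   "data": dict(common, label="not_a_trip")},
    ]

entries = make_user_inputs({"start_ts": 1644249600, "end_ts": 1644251400})
print([e["key"] for e in entries])
```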

Before inserting the entries, the user inputs are as below. It looks like the user confirmed several trips before stopping. Need to confirm with them if they are indeed inaccurate.

mode_confirm purpose_confirm replaced_mode
error not_accurate not_accurate
pilot_ebike work drove_alone
pilot_ebike work drove_alone
pilot_ebike work drove_alone
pilot_ebike work drove_alone
pilot_ebike work drove_alone
pilot_ebike work drove_alone
pilot_ebike work drove_alone
pilot_ebike personal_med drove_alone
pilot_ebike work drove_alone
pilot_ebike work drove_alone
pilot_ebike work drove_alone
pilot_ebike work drove_alone
pilot_ebike work drove_alone
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
not_a_trip error not_accurate
shankari commented 2 years ago

After manually inserting entries on a copy of the database and then re-running the pipeline, we get

mode_confirm purpose_confirm replaced_mode
not_a_trip not_a_trip not_accurate
not_a_trip not_a_trip drove_alone
not_a_trip not_a_trip drove_alone
not_a_trip not_a_trip drove_alone
not_a_trip not_a_trip drove_alone
not_a_trip not_a_trip drove_alone
not_a_trip not_a_trip drove_alone

After configuring the analysis pipeline to include the replaced mode, we get

mode_confirm purpose_confirm replaced_mode
not_a_trip not_a_trip not_a_trip
not_a_trip not_a_trip not_a_trip
not_a_trip not_a_trip not_a_trip
not_a_trip not_a_trip not_a_trip
not_a_trip not_a_trip not_a_trip
not_a_trip not_a_trip not_a_trip

We are now ready to change this on the production server once we get confirmation from the user that the trips are in fact spurious.