e-mission / e-mission-docs


AssertionError while processing _fix_squished_place_mismatch #251

Open shankari opened 8 years ago

shankari commented 8 years ago

We find raw -> filtered

2016-10-23 03:38:13,117:DEBUG:139657654888256:For raw trip 580bf83888f66356dad2d40a, found filtered trip 580c30a488f6635b29ea55be

This is a trip, not untracked

2016-10-23 03:38:13,121:DEBUG:139657654888256:_is_squished_untracked: 580bb32e88f663518b966b23 is a segmentation/raw_trip, not segmentation/raw_untracked, returning False

There's a minor difference between the two points (18 meters apart).

ecc.calDistance([-122.2588476074135, 37.87331113595317], [-122.2588238866276, 37.87347466688405]) 18.30262099474195

2016-10-23 03:38:13,124:DEBUG:139657654888256:squishing mismatch: resetting trip start_loc [-122.2588476074135, 37.87331113595317] to cleaned_start_place.location [-122.2588238866276, 37.87347466688405]

Note that we attempt to fix the mismatch if the coordinates are not identical

    if (cleaned_start_place_data.location.coordinates !=
                       raw_start_place.data.location.coordinates):
        _fix_squished_place_mismatch(cleaned_trip.user_id, cleaned_trip.get_id(),
                                     ts, cleaned_trip_data, cleaned_start_place_data)
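
For reference, a minimal haversine sketch (my own helper, not the actual ecc.calDistance implementation) reproduces the ~18 m gap and shows why the exact-equality check above fires for it:

    from math import radians, sin, cos, asin, sqrt

    # Toy great-circle distance; stands in for ecc.calDistance here
    def haversine(p1, p2, r=6371000):
        # p1, p2 are [lon, lat] pairs; returns metres
        lon1, lat1, lon2, lat2 = map(radians, [p1[0], p1[1], p2[0], p2[1]])
        a = sin((lat2 - lat1) / 2) ** 2 + \
            cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * r * asin(sqrt(a))

    trip_start = [-122.2588476074135, 37.87331113595317]
    place_loc = [-122.2588238866276, 37.87347466688405]

    print(haversine(trip_start, place_loc))   # ~18.3 m, consistent with calDistance above
    # The guard compares the coordinate lists for exact equality, so even this
    # small offset invokes _fix_squished_place_mismatch
    print(trip_start != place_loc)            # True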

We look for a location at ts = 1477155094.000022. Why? Do we find it?

2016-10-23 03:38:13,175:DEBUG:139657654888256:get_entry_at_ts query = {'metadata.key': 'background/filtered_location', 'user_id': UUID('8f1a6737-e8ee-4769-a1a4-da1948d886b6'), 'data.ts': 1477155094.000022}

Unsure, but we insert an entry at 1477154950.1033349

2016-10-23 03:38:13,178:DEBUG:139657654888256:Inserting entry Entry({'data': Recreatedlocation(
{u'loc': {u'type': u'Point', u'coordinates': [-122.2588238866276, 37.87347466688405]},
 u'fmt_time': '2016-10-22T09:49:10.103335-07:00',
 u'ts': 1477154950.1033349,
 'mode': 7,
 'section': ObjectId('580c30a388f6635b29ea54ff'),
 u'sensed_speed': 3.579999923706055,
 'distance': 0}),
 '_id': ObjectId('580c30a588f6635b29ea55dd'),
 'key': 'analysis/recreated_location'}) into timeseries

Then, we query for the entries for this section (1477154950.103335 -> 1477160591.999968). We find 1906 entries, which are not de-duped, and then the speeds mismatch.

2016-10-23 03:38:13,194:DEBUG:139657654888256:curr_query = {'$or': [{'metadata.key': 'analysis/recreated_location'}], 'user_id': UUID('8f1a6737-e8ee-4769-a1a4-da1948d886b6'), 'data.ts': {'$lte': 1477160591.999968, '$gte': 1477154950.103335}}, sort_key = data.ts
2016-10-23 03:38:13,194:DEBUG:139657654888256:orig_ts_db_keys = [], analysis_ts_db_keys = ['analysis/recreated_location']
2016-10-23 03:38:13,194:DEBUG:139657654888256:finished querying values for []
2016-10-23 03:38:13,194:DEBUG:139657654888256:finished querying values for ['analysis/recreated_location']
2016-10-23 03:38:14,055:DEBUG:139657654888256:Found 1906 results
2016-10-23 03:38:14,061:DEBUG:139657654888256:After de-duping, converted 1906 points to 1906 
2016-10-23 03:38:14,062:DEBUG:139657654888256:Returning entry with length 1906 result
2016-10-23 03:38:14,589:ERROR:139657654888256:[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] != [0.0, 0.6100873664913984, 0.05965627862873103, 0.059656275796728137, 0.05965627291921605, 0.05965627008539246, 0.059656267209699104, 0.05965626437769427, 0.059656261500180226, 0.05965625864542004]
2016-10-23 03:38:14,590:ERROR:139657654888256:Cleaning and resampling failed for user 8f1a6737-e8ee-4769-a1a4-da1948d886b6
Traceback (most recent call last):
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 72, in clean_and_resample
    last_raw_place = save_cleaned_segments_for_ts(user_id, time_query.startTs, time_query.endTs)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 90, in save_cleaned_segments_for_ts
    return save_cleaned_segments_for_timeline(user_id, tl)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 116, in save_cleaned_segments_for_timeline
    (last_cleaned_place, filtered_tl) = create_and_link_timeline(tl, user_id, trip_map)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 878, in create_and_link_timeline
    link_trip_start(ts, curr_cleaned_trip, curr_cleaned_start_place, raw_start_place)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 932, in link_trip_start
    ts, cleaned_trip_data, cleaned_start_place_data)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 1055, in _fix_squished_place_mismatch
    assert False
AssertionError
shankari commented 8 years ago

Relevant code is here. Basically, the speeds computed from the location points are all zero, BUT the points are apparently not duplicates. This seems pretty weird.

1048     # Validate the distance and speed calculations. Can remove this after validation
1049     loc_df = esda.get_data_df(esda.CLEANED_LOCATION_KEY, user_id,
1050                               esda.get_time_query_for_trip_like(esda.CLEANED_SECTION_KEY, first_section.get_id()))
1051     loc_df.rename(columns={"speed": "from_points_speed", "distance": "from_points_distance"}, inplace=True)
1052     with_speeds_df = eaicl.add_dist_heading_speed(loc_df)
1053     if not ecc.compare_rounded_arrays(with_speeds_df.speed.tolist(), first_section_data["speeds"], 10):
1054         logging.error("%s != %s" % (with_speeds_df.speed.tolist()[:10], first_section_data["speeds"][:10]))
1055         assert False
1056 
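
For context, here is a sketch of the kind of rounded comparison this validation performs. The helper name and the reading of the third argument as decimal places are my assumptions; the real ecc.compare_rounded_arrays may differ:

    def compare_rounded(arr1, arr2, digits):
        # Hypothetical stand-in: two float lists are "equal" if every pair
        # matches after rounding to `digits` decimal places
        if len(arr1) != len(arr2):
            return False
        return all(round(a, digits) == round(b, digits) for a, b in zip(arr1, arr2))

    # With the values from the error above, the arrays differ long before the
    # 10th decimal place, so the check fails and the assert fires
    from_points = [0.0, 0.0, 0.0]
    stored = [0.0, 0.6100873664913984, 0.05965627862873103]
    print(compare_rounded(from_points, stored, 10))   # False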
shankari commented 8 years ago

Pulling the data from the 22nd and re-running does not trigger the error.

2016-10-23T14:32:10.278959-07:00**********UUID f913b9bc-41bd-4754-bbf8-e6d08ce6f69b: moving to long term**********
2016-10-23T14:32:10.327370-07:00**********UUID f913b9bc-41bd-4754-bbf8-e6d08ce6f69b: filter accuracy if needed**********
2016-10-23T14:32:10.339289-07:00**********UUID f913b9bc-41bd-4754-bbf8-e6d08ce6f69b: segmenting into trips**********
2016-10-23T14:32:11.237535-07:00**********UUID f913b9bc-41bd-4754-bbf8-e6d08ce6f69b: segmenting into sections**********
2016-10-23T14:32:11.625732-07:00**********UUID f913b9bc-41bd-4754-bbf8-e6d08ce6f69b: smoothing sections**********
/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/location_smoothing.py:265: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  sorted_filled_df = filled_df.sort("ts").reset_index()
2016-10-23T14:32:12.123586-07:00**********UUID f913b9bc-41bd-4754-bbf8-e6d08ce6f69b: cleaning and resampling timeline**********
2016-10-23T14:32:14.925044-07:00**********UUID f913b9bc-41bd-4754-bbf8-e6d08ce6f69b: checking active mode trips to autocheck habits**********
2016-10-23T14:32:14.928033-07:00**********UUID f913b9bc-41bd-4754-bbf8-e6d08ce6f69b: storing views to cache**********
shankari commented 8 years ago

We don't have a squishing mismatch in the reproduction:

C02KT61MFFT0:tmp shankari$ zgrep "squishing" intake_single.log.gz 
C02KT61MFFT0:tmp shankari$ 

Also, the time range that returns all the zero speeds is:

In [3]: arrow.get(1477154950.103335).to("America/Los_Angeles")
Out[3]: <Arrow [2016-10-22T09:49:10.103335-07:00]>

In [4]: arrow.get(1477160591.999968).to("America/Los_Angeles")
Out[4]: <Arrow [2016-10-22T11:23:11.999968-07:00]>
shankari commented 8 years ago

The section that failed on production (09:49:10 -> 11:23:11) is a place here, because one trip ends at 09:51:34 and the next one starts at 11:10:47 and ends at 11:23:11.

2016-10-23 14:32:12,146:DEBUG:140735236476928:Considering trip 580d2c5bf6858f62a4e073a1: 2016-10-22T09:19:07.649622-07:00 -> 2016-10-22T09:51:34.000022-07:00 
2016-10-23 14:32:12,146:DEBUG:140735236476928:Considering trip 580d2c5bf6858f62a4e073a3: 2016-10-22T11:10:47.938500-07:00 -> 2016-10-22T11:23:11.999968-07:00 
2016-10-23 14:32:12,147:DEBUG:140735236476928:Considering trip 580d2c5bf6858f62a4e073a7: 2016-10-22T11:23:11.999968-07:00 -> 2016-10-22T15:22:32.442582-07:00 
2016-10-23 14:32:12,147:DEBUG:140735236476928:Considering trip 580d2c5bf6858f62a4e073a5: 2016-10-22T15:22:32.442582-07:00 -> 2016-10-22T15:35:44.999942-07:00 

Let us look at the cause of that mismatch. On the real server, we consider

2016-10-23 03:38:10,398:DEBUG:139657654888256:Considering trip 580bb32e88f663518b966b23: 2016-10-22T11:10:47.938500-07:00 -> 2016-10-22T11:23:11.999968-07:00
2016-10-23 03:38:10,398:DEBUG:139657654888256:Considering trip 580bf83888f66356dad2d40c: 2016-10-22T11:23:11.999968-07:00 -> 2016-10-22T15:22:32.442582-07:00
2016-10-23 03:38:10,399:DEBUG:139657654888256:Considering trip 580bf83888f66356dad2d40a: 2016-10-22T15:22:32.442582-07:00 -> 2016-10-22T15:35:44.999942-07:00

While filtering, we have only three points in this section, and the section is really slow. So when we try to extrapolate to the previous point, we go back several hours to cover a short distance. That's probably why we end up with a lot of zeros.

2016-10-23 03:38:10,521:DEBUG:139657654888256:keys = (segmentation/raw_stop, segmentation/raw_section), len(places) = 0, len(trips) = 1
2016-10-23 03:38:10,536:DEBUG:139657654888256:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'user_id': UUID('8f1a6737-e8ee-4769-a1a4-da1948d886b6'), 'data.ts': {'$lte': 1477160591.999968, '$gte': 1477159847.9385}}, sort_key = data.ts
2016-10-23 03:38:10,536:DEBUG:139657654888256:orig_ts_db_keys = ['background/filtered_location'], analysis_ts_db_keys = []
2016-10-23 03:38:10,562:DEBUG:139657654888256:deleting 0 points from section points
2016-10-23 03:38:10,589:DEBUG:139657654888256:Found 3 results
2016-10-23 03:38:10,625:DEBUG:139657654888256:After filtering, rows went from 3 -> 3, cols from 23 -> 22
2016-10-23 03:38:10,626:DEBUG:139657654888256:Found first section, may need to extrapolate start point
2016-10-23 03:38:10,649:DEBUG:139657654888256:Adding distance 290.395810466 to original 265.339521226 to extend section start from [-122.2605134268499, 37.87556754428695] to [-122.2588476074135, 37.87331113595317]
2016-10-23 03:38:10,653:DEBUG:139657654888256:After subtracting time 25375.1548401 from original 744.061468124 to cover additional distance 290.395810466 at speed 0.0114441000378, new_start_ts = 1477154980.1
2016-10-23 03:38:10,691:DEBUG:139657654888256:with_speeds_df =                            fmt_time            ts   latitude   longitude  \
0  2016-10-22T09:49:40.103335-07:00  1.477155e+09  37.873311 -122.258848   
1  2016-10-22T11:10:47.938500-07:00  1.477160e+09  37.875568 -122.260513   
2  2016-10-22T11:21:16.972802-07:00  1.477160e+09  37.873613 -122.258806   
3  2016-10-22T11:23:11.999968-07:00  1.477161e+09  37.873603 -122.258797   

ecc.calDistance([-122.260513, 37.875568], [-122.258848, 37.873311]) 290.416
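
To make the "several hours" concrete, here is the extrapolation arithmetic implied by the log above (plain arithmetic on the logged numbers; the variable names are mine, not the pipeline's):

    # Speed of the first real segment, and the extra distance back to the place
    speed = 0.0114441000378          # m/s
    extra_distance = 290.395810466   # m

    extra_time = extra_distance / speed   # ~25375.15 s, i.e. ~7 hours
    print(extra_time)

    # Extrapolating the section start back by that much lands hours before the
    # previous place was even entered, which is why the pipeline later reports
    # "Much extrapolation!" and clamps the start time instead
    first_point_ts = 1477159847.9385      # ts of the 11:10:47 point
    print(first_point_ts - extra_time)    # ~1477134472.78, matching the later log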

So why didn't this happen in the reconstruction? It did.

2016-10-23 14:32:12,667:DEBUG:140735236476928:keys = (segmentation/raw_stop, segmentation/raw_section), len(places) = 0, len(trips) = 1
2016-10-23 14:32:12,672:DEBUG:140735236476928:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'user_id': UUID('f913b9bc-41bd-4754-bbf8-e6d08ce6f69b'), 'data.ts': {'$lte': 1477160591.999968, '$gte': 1477159847.9385}}, sort_key = data.ts
2016-10-23 14:32:12,675:DEBUG:140735236476928:deleting 0 points from section points
2016-10-23 14:32:12,678:DEBUG:140735236476928:Found 3 results
2016-10-23 14:32:12,683:DEBUG:140735236476928:After filtering, rows went from 3 -> 3, cols from 23 -> 22
2016-10-23 14:32:12,683:DEBUG:140735236476928:Found first section, may need to extrapolate start point
2016-10-23 14:32:12,696:DEBUG:140735236476928:Adding distance 275.950771652 to original 265.339521226 to extend section start from [-122.2605134268499, 37.87556754428695] to [-122.2588238866276, 37.87347466688405]
2016-10-23 14:32:12,698:DEBUG:140735236476928:Much extrapolation! new_start_ts 1477135735.01 < prev_enter_ts 1477155094.0
2016-10-23 14:32:12,699:DEBUG:140735236476928:After subtracting time 24112.9289975 from original 744.061468124 to cover additional distance 275.950771652 at speed 0.0114441000378, new_start_ts = 1477155274.0
2016-10-23 14:32:12,728:DEBUG:140735236476928:with_speeds_df =                            fmt_time            ts   latitude   longitude  \
0  2016-10-22T09:54:34.000022-07:00  1.477155e+09  37.873475 -122.258824   
1  2016-10-22T11:10:47.938500-07:00  1.477160e+09  37.875568 -122.260513   
2  2016-10-22T11:21:16.972802-07:00  1.477160e+09  37.873613 -122.258806   
3  2016-10-22T11:23:11.999968-07:00  1.477161e+09  37.873603 -122.258797   

Resampling original

2016-10-23 03:38:10,806:DEBUG:139657654888256:Resampling data of size 4
2016-10-23 03:38:10,812:DEBUG:139657654888256:Resampling entry list                            fmt_time            ts   longitude   latitude
0  2016-10-22T09:49:40.103335-07:00  1.477155e+09 -122.258848  37.873311
1  2016-10-22T11:10:47.938500-07:00  1.477160e+09 -122.260513  37.875568
2  2016-10-22T11:21:16.972802-07:00  1.477160e+09 -122.258806  37.873613
3  2016-10-22T11:23:11.999968-07:00  1.477161e+09 -122.258797  37.873603 of size 4

2016-10-23 03:38:10,818:DEBUG:139657654888256:After resampling, using 189 points from 1477154980.1 -> 1477160592.0
2016-10-23 03:38:11,371:DEBUG:139657654888256:with_speeds_df =    idx            ts                          fmt_time   longitude   latitude
0    0  1.477155e+09  2016-10-22T09:49:40.103335-07:00 -122.258848  37.873311
1    1  1.477155e+09  2016-10-22T09:50:10.103335-07:00 -122.258858  37.873325
2    2  1.477155e+09  2016-10-22T09:50:40.103335-07:00 -122.258868  37.873339
3    3  1.477155e+09  2016-10-22T09:51:10.103335-07:00 -122.258878  37.873353
4    4  1.477155e+09  2016-10-22T09:51:40.103335-07:00 -122.258889  37.873367

Resampling re-run

2016-10-23 14:32:12,762:DEBUG:140735236476928:Resampling data of size 4
2016-10-23 14:32:12,766:DEBUG:140735236476928:Resampling entry list                            fmt_time            ts   longitude   latitude
0  2016-10-22T09:54:34.000022-07:00  1.477155e+09 -122.258824  37.873475
1  2016-10-22T11:10:47.938500-07:00  1.477160e+09 -122.260513  37.875568
2  2016-10-22T11:21:16.972802-07:00  1.477160e+09 -122.258806  37.873613
3  2016-10-22T11:23:11.999968-07:00  1.477161e+09 -122.258797  37.873603 of size 4

2016-10-23 14:32:12,770:DEBUG:140735236476928:After resampling, using 179 points from 1477155274.0 -> 1477160592.0
2016-10-23 14:32:13,146:DEBUG:140735236476928:with_speeds_df =    idx            ts                          fmt_time   longitude   latitude
0    0  1.477155e+09  2016-10-22T09:54:34.000022-07:00 -122.258824  37.873475
1    1  1.477155e+09  2016-10-22T09:55:04.000022-07:00 -122.258835  37.873488
2    2  1.477155e+09  2016-10-22T09:55:34.000022-07:00 -122.258846  37.873502
3    3  1.477155e+09  2016-10-22T09:56:04.000022-07:00 -122.258857  37.873516
4    4  1.477155e+09  2016-10-22T09:56:34.000022-07:00 -122.258868  37.873530

Looks pretty similar.
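
For reference, a minimal resampling sketch (assuming the 30 s interval and endpoint handling implied by the timestamps above, with numpy interpolation standing in for the pipeline's resampler) reproduces the 189- and 179-point counts:

    import numpy as np

    def resample_grid(start_ts, end_ts, interval=30):
        # Hypothetical grid: one point every `interval` seconds, plus the end point
        return np.append(np.arange(start_ts, end_ts, interval), [end_ts])

    print(len(resample_grid(1477154980.1, 1477160592.0)))   # 189, as in the original run
    print(len(resample_grid(1477155274.0, 1477160592.0)))   # 179, as in the re-run

    # Intermediate coordinates are then linearly interpolated between the four
    # raw points, which matches the ~0.000014 deg / 30 s latitude steps above
    raw_ts = np.array([1477154980.103335, 1477159847.9385,
                       1477160476.972802, 1477160591.999968])
    raw_lat = np.array([37.873311, 37.875568, 37.873613, 37.873603])
    new_ts = resample_grid(1477154980.1, 1477160592.0)
    new_lat = np.interp(new_ts, raw_ts, raw_lat)
    print(new_lat[:5])   # ~[37.873311, 37.873325, 37.873339, 37.873353, 37.873367]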

So one thing that I noticed from the server is that the number of reconstructed locations returned keeps increasing. This is because the sections are fine, so we save the locations, and then fail while processing the trip. The resulting duplicate points are probably why we consistently get 0 distances (and 0 speeds), since the distances in the resampled dataframe above are clearly not zero.

Yup, confirmed from the server.

 {u'_id': ObjectId('580ba71588f663507fb41f55'),
  u'data': {u'fmt_time': u'2016-10-22T09:51:00.311224-07:00'}},
 {u'_id': ObjectId('580d5a5588f6637659afd9e0'),
  u'data': {u'fmt_time': u'2016-10-22T09:51:10.103335-07:00'}},
 {u'_id': ObjectId('580d484b88f663753dac3545'),
  u'data': {u'fmt_time': u'2016-10-22T09:51:10.103335-07:00'}},
 {u'_id': ObjectId('580d409588f663731bb68350'),
  u'data': {u'fmt_time': u'2016-10-22T09:51:10.103335-07:00'}},
 {u'_id': ObjectId('580d34d488f663720d46d04d'),
  u'data': {u'fmt_time': u'2016-10-22T09:51:10.103335-07:00'}},
 {u'_id': ObjectId('580d1a1288f6636fe9989312'),
  u'data': {u'fmt_time': u'2016-10-22T09:51:10.103335-07:00'}},
 {u'_id': ObjectId('580d15e488f6636e72d28e7f'),
  u'data': {u'fmt_time': u'2016-10-22T09:51:10.103335-07:00'}},
 {u'_id': ObjectId('580d037488f6636d4d589ab9'),
  u'data': {u'fmt_time': u'2016-10-22T09:51:10.103335-07:00'}},
 {u'_id': ObjectId('580cf9ac88f6636c374672a5'),
  u'data': {u'fmt_time': u'2016-10-22T09:51:10.103335-07:00'}},
 {u'_id': ObjectId('580ce96788f6636afdd29f9a'),
  u'data': {u'fmt_time': u'2016-10-22T09:51:10.103335-07:00'}},
 {u'_id': ObjectId('580c4e3c88f6635e156de5f3'),
  u'data': {u'fmt_time': u'2016-10-22T09:51:10.103335-07:00'}},
 {u'_id': ObjectId('580c403288f6635c3340cd40'),
  u'data': {u'fmt_time': u'2016-10-22T09:51:10.103335-07:00'}},
 {u'_id': ObjectId('580c30a388f6635b29ea5503'),
  u'data': {u'fmt_time': u'2016-10-22T09:51:10.103335-07:00'}},
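
A minimal sketch (my own toy computation, not the pipeline's eaicl.add_dist_heading_speed) of why these repeated copies of the same point collapse the recomputed distances and speeds to zero:

    import numpy as np
    import pandas as pd

    # Each failed run re-inserts the same recreated locations, so the section
    # query returns many copies of each point: zero time delta, zero distance
    points = pd.DataFrame(
        [{"ts": 1477154980.103335, "latitude": 37.873311, "longitude": -122.258848}] * 5 +
        [{"ts": 1477155010.103335, "latitude": 37.873325, "longitude": -122.258858}] * 5)

    d_ts = points["ts"].diff().fillna(0)
    d_lat = points["latitude"].diff().fillna(0)
    d_lon = points["longitude"].diff().fillna(0)
    # Crude planar distance in metres; enough to show the pattern
    dist = np.hypot(d_lat * 111320, d_lon * 111320 * np.cos(np.radians(37.87)))
    speed = np.where(d_ts > 0, dist / d_ts.replace(0, np.nan), 0.0)
    print(np.round(dist, 3).tolist())    # zeros, except one ~1.79 m jump
    print(np.round(speed, 4).tolist())   # zeros, except one ~0.06 m/s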
shankari commented 8 years ago

So if that is the cause of the subsequent errors, what was the cause of the first error? Can this be fixed by just resetting the pipeline and starting over? (Please let the answer be yes.)

shankari commented 8 years ago

For a very long time, all we saw was "moving to long term". Then finally, we got a complete run with no errors.

2016-10-18T16:16:18.651929+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: moving to long term**********
Got error None while saving entry AttrDict({u'user_id': UUID('8f1a6737-e8ee-4769-a1a4-da1948d886b6'), u'_id': ObjectId('58067140cb17471ac0995def'), u'data': {u'transition': None, u'currState': u'STATE_ONGOING_TRIP'}, u'metadata': {u'plugin': u'none', u'write_ts': 1476816237.438274, u'time_zone': u'America/Los_Angeles', u'platform': u'ios', u'key': u'statemachine/transition', u'read_ts': 0, u'type': u'message'}}) -> None
Got error None while saving entry AttrDict({u'user_id': UUID('8f1a6737-e8ee-4769-a1a4-da1948d886b6'), u'_id': ObjectId('58067140cb17471ac0995df9'), u'data': {u'transition': None, u'currState': u'STATE_ONGOING_TRIP'}, u'metadata': {u'plugin': u'none', u'write_ts': 1476817205.697516, u'time_zone': u'America/Los_Angeles', u'platform': u'ios', u'key': u'statemachine/transition', u'read_ts': 0, u'type': u'message'}}) -> None
No user defined overrides for 8f1a6737-e8ee-4769-a1a4-da1948d886b6, early return
No user defined overrides for 8f1a6737-e8ee-4769-a1a4-da1948d886b6, early return
...
2016-10-18T19:36:57.918312+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: moving to long term**********
2016-10-18T19:37:00.034842+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: filter accuracy if needed**********
2016-10-18T19:37:00.039082+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: segmenting into trips**********
2016-10-18T19:37:00.897490+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: segmenting into sections**********
2016-10-18T19:37:01.016335+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: smoothing sections**********
2016-10-18T19:37:01.195283+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: cleaning and resampling timeline**********
2016-10-18T19:37:01.851833+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: checking active mode trips to autocheck habits**********

First failure was

2016-10-22T17:51:16.454611+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: cleaning and resampling timeline**********
[0.0, 2.3263192593682924, 6.3678907209270443, 3.2600227545176255, 6.6239183628185057, 3.0126220946529756, 6.1001840526895883, 1.9250885613266768, 3.8571169557450111, 0.83759419115236877] != [0.0, 0.6100873664913984, 0.05965627862873103, 0.059656275796728137, 0.05965627291921605, 0.05965627008539246, 0.059656267209699104, 0.05965626437769427, 0.059656261500180226, 0.05965625864542004]
Cleaning and resampling failed for user 8f1a6737-e8ee-4769-a1a4-da1948d886b6
Traceback (most recent call last):
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 72, in clean_and_resample
    last_raw_place = save_cleaned_segments_for_ts(user_id, time_query.startTs, time_query.endTs)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 90, in save_cleaned_segments_for_ts
    return save_cleaned_segments_for_timeline(user_id, tl)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 116, in save_cleaned_segments_for_timeline
    (last_cleaned_place, filtered_tl) = create_and_link_timeline(tl, user_id, trip_map)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 878, in create_and_link_timeline
    link_trip_start(ts, curr_cleaned_trip, curr_cleaned_start_place, raw_start_place)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 932, in link_trip_start
    ts, cleaned_trip_data, cleaned_start_place_data)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 1055, in _fix_squished_place_mismatch
    assert False
AssertionError
shankari commented 8 years ago

This is the assertion failure.

2016-10-22 18:42:55,772:DEBUG:139739453028160:Returning entry with length 196 result
2016-10-22 18:42:55,837:ERROR:139739453028160:[0.0, 2.3263192593682924, 6.3678907209270443, 3.2600227545176255, 6.6239183628185057, 3.0126220946529756, 6.1001840526895883, 1.9250885613266768, 3.8571169557450111, 0.83759419115236877] != [0.0, 0.6100873664913984, 0.05965627862873103, 0.059656275796728137, 0.05965627291921605, 0.05965627008539246, 0.059656267209699104, 0.05965626437769427, 0.059656261500180226, 0.05965625864542004]

This is the cleaned section.

2016-10-22 18:42:55,652:DEBUG:139739453028160:Saving entry Entry({u'data': Cleanedsection({u'distances': [0.0,
 18.30262099474195,
 1.789688358861931,
 1.789688273901844,
 1.7896881875764814,
 1.789688102561774,
 1.7896880162909732,
 1.7896879313308283,
 1.7896878450054068,
...
 8.837169242545398,
 12.591784376830297,
 12.591789512937877,
 12.591856281759517,
...
 12.59186141913691,
 12.59186655379083,
 12.591871690447377,
 12.591876825796776,
 12.591881961790241,
 3.1479848846183183,
 0.34332299842640407,
 0.3433230023982282,
 0.3433230063700522,
 0.021705260084170827],
 u'distance': 573.1045100900176,
 u'start_loc': {"coordinates": [-122.2588238866276, 37.87347466688405], "type": "Point"},
 u'end_ts': 1477160591.999968,
 u'start_ts': 1477154950.1033349,
 u'start_fmt_time': '2016-10-22T09:49:10.103335-07:00',
 u'end_loc': {u'type': u'Point', u'coordinates': [-122.2587968968994, 37.87360328718822]},
 u'sensed_mode': 7,
 u'source': u'SmoothedHighConfidenceMotion',
 u'end_fmt_time': u'2016-10-22T11:23:11.999968-07:00',
 u'duration': 603.1045100900176,
 u'trip_id': ObjectId('580bb32e88f663518b966b27'),
 u'speeds': [0.0,
 0.6100873664913984,
 0.05965627862873103,
 0.059656275796728137,
 0.05965627291921605,
 0.05965627008539246,
 0.059656267209699104,
 0.05965626437769427,
 0.059656261500180226,
 0.05965582759378062,
....
 0.05965582473712523,
 0.059655821861356734,
 0.05965581902927584,
 0.29457230808484663,
 0.41972614589434326,
 0.41972631709792924,
 0.41972648832298415,
....
 0.419728885126361,
 0.4197290563482459,
 0.4197292275265592,
 0.41972939872634135,
 0.10493282948727728,
 0.011444099947546802,
 0.01144410007994094,
 0.011444100212335074,
 0.011444100354803038]}),
 u'_id': ObjectId('580bb32f88f663518b966b29'),
 u'user_id': UUID('8f1a6737-e8ee-4769-a1a4-da1948d886b6'),
 u'key': u'analysis/cleaned_section'}}) into timeseries
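
As a sanity check on the numbers above, the stored speeds are simply the stored distances divided by the 30 s resample interval (my observation from the values, not a statement about the pipeline code):

    distances = [0.0, 18.30262099474195, 1.789688358861931, 1.789688273901844]
    interval = 30.0   # seconds between resampled points

    print([d / interval for d in distances])
    # ~[0.0, 0.61008736649..., 0.05965627862..., 0.05965627579...],
    # matching the u'speeds' list in the cleaned section above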

The recreated cleaned section is associated with cleaned trip

2016-10-23 14:32:14,942:DEBUG:140735236476928:Considering trip 580d2c5cf6858f62a4e073f6: 2016-10-22T09:54:34.000022-07:00 -> 2016-10-22T11:23:11.999968-07:00 

which is

>>> edb.get_analysis_timeseries_db().find_one({"data.trip_id": ObjectId('580d2c5cf6858f62a4e073f6')})

{u'_id': ObjectId('580d2c5df6858f62a4e073f8'),
 u'data': {u'distance': 539.6090569753566,
  u'distances': [0.0,
   1.8099405858887094,
   1.8099404884951635,
   1.8099403910935563,
   1.8099402930296526,
   1.8099401969686582,
   1.8099400982343972,
   1.8099400015030453,
   1.8099399034390733,
   1.8099398073780093,
   1.8099397086436808,
...
   12.591870644909056,
   12.591875781560553,
   12.591880916904904,
   5.639887497279795,
   0.34332299761836116,
   0.34332300159018536,
   0.3433230062889957,
   0.09155218510787694],
u'duration': 5317.999946117401,
  u'end_fmt_time': u'2016-10-22T11:23:11.999968-07:00',
  u'end_loc': {u'coordinates': [-122.2587968968994, 37.87360328718822],
   u'type': u'Point'},
  u'end_local_dt': {u'day': 22,
   u'hour': 11,
   u'minute': 23,
   u'month': 10,
   u'second': 11,
   u'timezone': u'America/Los_Angeles',
   u'weekday': 5,
   u'year': 2016},
  u'end_ts': 1477160591.999968,
  u'sensed_mode': 7,
  u'source': u'SmoothedHighConfidenceMotion',
  u'speeds': [0.0,
   0.06033135286295698,
   0.06033134961650545,
   0.06033134636978521,
   0.06033134310098842,
   0.06033133989895527,
   0.06033133660781324,
....
   0.1879962499093265,
   0.011444099920612039,
   0.011444100053006178,
   0.01144410020963319,
   0.011444100218217313],
  u'start_fmt_time': u'2016-10-22T09:54:34.000022-07:00',
  u'start_loc': {u'coordinates': [-122.2588238866276, 37.87347466688405],
   u'type': u'Point'},
  u'start_local_dt': {u'day': 22,
   u'hour': 9,
   u'minute': 54,
   u'month': 10,
   u'second': 34,
   u'timezone': u'America/Los_Angeles',
   u'weekday': 5,
   u'year': 2016},
  u'start_ts': 1477155274.000022,
  u'trip_id': ObjectId('580d2c5cf6858f62a4e073f6')},

The speeds in the reconstructed data are different from the original. The original values seem to be off by a factor of roughly 10, at least for the first entry. And the values in the assertion error are different from both of them...

shankari commented 8 years ago

Hm, the two of them start at slightly different times.

u'start_fmt_time': '2016-10-22T09:49:10.103335-07:00',
u'start_fmt_time': '2016-10-22T09:54:34.000022-07:00',
shankari commented 8 years ago

In the original, after extrapolation, start_time was <Arrow [2016-10-22T09:49:40.100000-07:00]>. After the squishing mismatch, we inserted a location at 'fmt_time': '2016-10-22T09:49:10.103335-07:00'. Update: 'fmt_time': u'2016-10-22T09:49:40.103335-07:00'.

Previous place was from

  u'enter_fmt_time': u'2016-10-22T09:46:40.103335-07:00',
  u'exit_fmt_time': u'2016-10-22T11:10:47.938500-07:00',

Given that we entered the place only at 9:46:40, retrieving locations before that could give us a point that was part of the last trip, and throw things off.

But why do we not use the fuzz factor? Because the prev enter_ts was actually 1477154800.1, and the reconstructed start_ts was 1477154980.1. Ah, 9:49 versus 9:46.

2016-10-22 18:42:54,640:DEBUG:139739453028160:Much extrapolation! new_start_ts 1477134472.78 < prev_enter_ts 1477154800.1

  u'enter_ts': 1477154800.103335,
  u'enter_fmt_time': u'2016-10-22T09:46:40.103335-07:00'
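
For what it is worth, the gap between the previous place's enter time and the reconstructed start works out to 180 seconds in both runs (plain arithmetic on the logged values; whether that is the pipeline's fuzz window is exactly the open question here):

    # Original run
    prev_enter_ts = 1477154800.1           # 2016-10-22T09:46:40-07:00, as logged
    reconstructed_start_ts = 1477154980.1  # 2016-10-22T09:49:40-07:00
    print(reconstructed_start_ts - prev_enter_ts)    # ~180.0

    # Re-run (from the "Much extrapolation!" message)
    prev_enter_ts_rerun = 1477155094.0
    new_start_ts_rerun = 1477155274.0
    print(new_start_ts_rerun - prev_enter_ts_rerun)  # 180.0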
shankari commented 8 years ago

28 reconstructed locations while we are in the place.

In [172]: edb.get_analysis_timeseries_db().find({"metadata.key": "analysis/recreated_location", 'user_id': UUID('8f1a6737-e8ee-4769-a1a4-da1948d886b6'), 'data.ts': {'$lte': 1477154980.1, '$gte': 1477154800.103335}}).count()
Out[172]: 28

How about in the 30 sec gap?

u'fmt_time': '2016-10-22T09:49:10.103335-07:00', u'ts': 1477154950.1033349,
u'fmt_time': u'2016-10-22T09:49:40.103335-07:00', u'ts': 1477154980.103335,

47 points. That makes no sense. How can there be more points from 9:49:10 -> 9:49:40 than from 9:46:40 -> 9:49:40?

In [176]: edb.get_analysis_timeseries_db().find({"metadata.key": "analysis/recreated_location", 'user_id': UUID('8f1a6737-e8ee-4769-a1a4-da1948d886b6'), 'data.ts': {'$lte': 1477154980.103335, '$gte': 1477154950.1033349}}).count()
Out[176]: 47
shankari commented 8 years ago

That makes no sense. How can there be more points from 9:49:10 -> 9:49:40 than from 9:46:40 -> 9:49:40?

Because of the rounding: in the first query, we used 1477154980.1 as the upper bound, and in the second, we used 1477154980.103335, so the first query excluded the entries stamped exactly at 1477154980.103335. Using 1477154980.103335 in the first query too, we get

In [177]: edb.get_analysis_timeseries_db().find({"metadata.key": "analysis/recreated_location", 'user_id': UUID('8f1a6737-e8ee-4769-a1a4-da1948d886b6'), 'data.ts': {'$lte': 1477154980.103335, '$gte': 1477154800.103335}}).count()
Out[177]: 52

This means that there are 52 - 28 = 24 entries between 1477154980.1 and 1477154980.103335?!

In [178]: edb.get_analysis_timeseries_db().find({"metadata.key": "analysis/recreated_location", 'user_id': UUID('8f1a6737-e8ee-4769-a1a4-da1948d886b6'), 'data.ts': {'$lte': 1477154980.103335, '$gte': 1477154980.1}}).count()
Out[178]: 23

This is just the re-creation over and over again.

[{u'data': {u'fmt_time': u'2016-10-22T09:49:40.103335-07:00',
   u'ts': 1477154980.103335}},
 {u'data': {u'fmt_time': u'2016-10-22T09:49:40.103335-07:00',
   u'ts': 1477154980.103335}},
 {u'data': {u'fmt_time': u'2016-10-22T09:49:40.103335-07:00',
   u'ts': 1477154980.103335}},
 {u'data': {u'fmt_time': u'2016-10-22T09:49:40.103335-07:00',
   u'ts': 1477154980.103335}},
 {u'data': {u'fmt_time': u'2016-10-22T09:49:40.103335-07:00',
   u'ts': 1477154980.103335}},
 {u'data': {u'fmt_time': u'2016-10-22T09:49:40.103335-07:00',
   u'ts': 1477154980.103335}},

At this point, I can't figure out what's wrong, I can't reproduce it, and there's nothing obviously wrong with the logging. I'm going to reset the pipeline and re-run to see what happens.

shankari commented 8 years ago

Deleting and re-creating worked!

ubuntu@ip-10-203-173-119:/mnt/e-mission/e-mission-server$ ./e-mission-py.bash bin/debug/intake_single_user.py -e yinmogeng@gmail.com
/home/ubuntu/anaconda/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
2016-10-24T04:06:45.169799+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: moving to long term**********
2016-10-24T04:06:47.531777+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: filter accuracy if needed**********
2016-10-24T04:06:47.539740+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: segmenting into trips**********
2016-10-24T04:06:56.635775+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: segmenting into sections**********
2016-10-24T04:07:00.596297+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: smoothing sections**********
/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/location_smoothing.py:265: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  sorted_filled_df = filled_df.sort("ts").reset_index()
2016-10-24T04:07:06.816403+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: cleaning and resampling timeline**********
/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/frame.py:839: UserWarning: DataFrame columns are not unique, some columns will be omitted.
  "columns will be omitted.", UserWarning)
2016-10-24T04:07:21.971110+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: checking active mode trips to autocheck habits**********
2016-10-24T04:07:21.973555+00:00**********UUID 8f1a6737-e8ee-4769-a1a4-da1948d886b6: storing views to cache**********
No user defined overrides for 8f1a6737-e8ee-4769-a1a4-da1948d886b6, early return
No user defined overrides for 8f1a6737-e8ee-4769-a1a4-da1948d886b6, early return
shankari commented 8 years ago

I have no idea what happened, and I've already spent a day on this instead of getting the stats to work. So I'm going to close this and move on, since it has fixed everything... :)

shankari commented 7 years ago

Ran into this again for two different users. This has been happening for a long time, so I am not sure whether the logs still exist. I am going to check whether they have mixed entries. Nope, they don't.

In [234]: edb.get_timeseries_db().find({"user_id": UUID(<user_id_1>)}).distinct("metadata.platform")
Out[234]: [u'server', u'ios']

In [236]: edb.get_timeseries_db().find({"user_id": UUID("fd7b4c2e-2c8b-3bfa-94f0-d1e3ecbd5fb7")}).distinct("metadata.platform")
Out[236]: [u'server', u'android']
shankari commented 7 years ago

This has been happening for so long that it is going to be hard to track down the root cause. Resetting the pipeline and re-running again. We also need to keep a careful eye on the errors and investigate more quickly next time.

PatGendre commented 5 years ago

Hi @shankari, I don't know if it is exactly the same issue, but it looks very similar.

The pipeline for @DO3B's data broke at the cleaning and resampling stage several days ago; the error is the following:

2019-07-09 07:47:36,605:DEBUG:140484490442496:Found 2 sections, need to remove the uncommon ones...
2019-07-09 07:47:36,606:DEBUG:140484490442496:section counts = [(ObjectId('5d242a694e276da46fe56218'), 168), (ObjectId('5d242a694e276da46fe562c2'), 164)]
2019-07-09 07:47:36,607:ERROR:140484490442496:Cleaning and resampling failed for user 2775ba42-1f3f-4b4c-b026-8ffe18e7c0e8
Traceback (most recent call last):
  File "/var/emission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 84, in clean_and_resample
    last_raw_place = save_cleaned_segments_for_ts(user_id, time_query.startTs, time_query.endTs)
  File "/var/emission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 110, in save_cleaned_segments_for_ts
    return save_cleaned_segments_for_timeline(user_id, tl)
  File "/var/emission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 136, in save_cleaned_segments_for_timeline
    (last_cleaned_place, filtered_tl) = create_and_link_timeline(tl, user_id, trip_map)
  File "/var/emission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 962, in create_and_link_timeline
    link_trip_start(ts, curr_cleaned_trip, curr_cleaned_start_place, raw_start_place)
  File "/var/emission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 1016, in link_trip_start
    ts, cleaned_trip_data, cleaned_start_place_data)
  File "/var/emission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 1157, in _fix_squished_place_mismatch
    assert(min(section_counts) == 1)
AssertionError

Looking at line 1157, the comment just above says "This code should work even if this assert is removed, but this is the expectation we have based on the use cases we have seen so far." So I commented out the line, and then the clean and resample stage passed, so @DO3B has his data available again in the analysis collection.

However, this is maybe not a long-term solution; I suppose the assert line was there for a good reason?

shankari commented 5 years ago

@PatGendre there are a bunch of asserts in the pipeline which deal primarily with how I expect it to work. I added them so that if they failed, I could go back, investigate further, and either adjust my expectation or adjust the code. Unfortunately, I don't have the time to investigate right now and I don't think you understand the pipeline well enough to investigate.

So in this case, the best option is indeed to comment out the assert. Maybe when I have time in the future, we can re-enable the assert and I can investigate it if the problem recurs.

PatGendre commented 5 years ago

Thanks, I will leave the assert commented out for the moment, as this solves the problem for Loic's data @DO3B

PatGendre commented 5 years ago

@shankari @DO3B Hi, there was another assert error in the clean and resample stage of the intake pipeline for Loic's data, at lines 1204 and 1210 in clean_and_resample.py. So I commented out those lines, and the clean and resample stage completed. Then I un-commented them again, and also line 1155, so that we can see in the next few days whether the error appears again or was transient...

PatGendre commented 5 years ago

@DO3B for your info, the clean and resample pipeline stage produced no errors last night even though I un-commented the asserts, so the assert errors were due to particular data values (and could be investigated another time if this error re-appears).

shankari commented 5 years ago

@PatGendre in general, unless you are actively working on the pipeline, it should be safe to comment out any asserts that you encounter. They are really there to help pipeline developers notice when something doesn't match assumptions so that they can go and investigate further.

PatGendre commented 5 years ago

@shankari Ok, I will leave the asserts in, but next time, if I comment out an assert, I will leave a comment.