e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Tracking issue for everything that we encounter during the great pipeline reset #243

Closed by shankari 1 year ago

shankari commented 8 years ago

Conflicts in a bunch of files.

Checked out

bin/intake_stage.py
emission/net/api/cfc_webapp.py
emission/net/ext_service/habitica/proxy.py

Copied over

conf/net/int_service/giles_conf.json
conf/net/keys.json

Created habitica config and copied values from keys.

Webserver starts up fine.

Time to restart the pipeline now.

shankari commented 8 years ago

Deleted all analysis data.

ubuntu@ip-10-203-173-119:/mnt/e-mission/e-mission-server$ ./e-mission-py.bash bin/reset_analysis_database.py
ERROR:root:nominatim not configured either, place decoding must happen on the client
Traceback (most recent call last):
  File "/mnt/e-mission/e-mission-server/emission/core/our_geocoder.py", line 15, in <module>
    nominatim_file = open("conf/net/ext_service/nominatim.json")
IOError: [Errno 2] No such file or directory: 'conf/net/ext_service/nominatim.json'
{u'ok': 1, u'n': 1420137}
{u'ok': 1, u'n': 561}
{u'ok': 1, u'n': 811}
{u'ok': 1, u'n': 539}

Reset all habitica users

ubuntu@ip-10-203-173-119:/mnt/e-mission/e-mission-server$ ./e-mission-py.bash bin/ext_service/reset_all_habitica_users.py
DEBUG:root:Processing emission user id 74f2a65f-ef5a-3648-8239-23d24f245f49
DEBUG:root:For user 74f2a65f-ef5a-3648-8239-23d24f245f49, about to proxy POST method /api/v3/user/reset with args {}
DEBUG:root:auth_headers = {'x-api-user': u'e26ef1c3-d45e-464a-998c-729d107ecc42', 'x-api-key': u'5669eb83-3e36-468d-95f4-6af15801d6cc'}
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): 54.159.38.241
DEBUG:requests.packages.urllib3.connectionpool:"POST /api/v3/user/reset HTTP/1.1" 200 None
DEBUG:root:result = <Response [200]>
DEBUG:root:reset result for 789a6ea5-7d20-3bf7-8127-b7756ffe46eb = <Response [200]>
DEBUG:root:Processing emission user id 0763de67-f61e-3f5d-90e7-518e69793954
...
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): 54.159.38.241
DEBUG:requests.packages.urllib3.connectionpool:"POST /api/v3/user/reset HTTP/1.1" 200 None
DEBUG:root:result = <Response [200]>
DEBUG:root:reset result for e471711e-bd14-3dbe-80b6-9c7d92ecc296 = <Response [200]>
shankari commented 8 years ago

Now, copy over all pending usercache stuff

shankari commented 8 years ago

First, reconfigure intake.conf from intake.conf.sample to send logs to our log partition instead of /var/tmp/. Delete all existing intake logs, and change our backup count to 6 since we plan to run two pipelines in parallel.
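For reference, a minimal sketch of the kind of handler change involved, assuming intake.conf is a standard Python logging dictConfig JSON (the actual field names in conf/log/intake.conf.sample may differ; paths here are illustrative):

import logging.config

LOG_CONFIG = {
    "version": 1,
    "handlers": {
        "file": {
            "class": "logging.handlers.RotatingFileHandler",
            # log partition instead of /var/tmp/
            "filename": "/mnt/logs/emission/intake.log",
            "maxBytes": 1024 * 1024 * 1024,
            # keep 6 backups since two pipelines run in parallel
            "backupCount": 6,
        }
    },
    "root": {"handlers": ["file"], "level": "DEBUG"},
}
logging.config.dictConfig(LOG_CONFIG)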

shankari commented 8 years ago

Error while running the formatters

Traceback (most recent call last):
  File "/mnt/e-mission/e-mission-server/emission/net/usercache/builtin_usercache_handler.py", line 70, in moveToLongTerm
    unified_entry = enuf.convert_to_common_format(entry)
  File "/mnt/e-mission/e-mission-server/emission/net/usercache/formatters/formatter.py", line 6, in convert_to_common_format
    return format_fn(entry)
  File "/mnt/e-mission/e-mission-server/emission/net/usercache/formatters/ios/transition.py", line 55, in format
    data.transition = transition_map[entry.data.transition].value
KeyError: None
Got error None while saving entry AttrDict({u'user_id': UUID('0237572d-9cb8-3c30-96bb-8005f6106161'), u'_id': ObjectId('57ded24e0a3eb8b13019ce81'), u'data': {u'transition': None, u'currState': u'STATE_ONGOING_TRIP'}, u'metadata': {u'plugin': u'none', u'write_ts': 1474185580.054358, u'time_zone': u'America/Los_Angeles', u'platform': u'ios', u'key': u'statemachine/transition', u'read_ts': 0, u'type': u'message'}}) -> None
shankari commented 8 years ago

Seems to happen consistently, even if not particularly frequently, on iOS.

Got error None while saving entry AttrDict({u'user_id': UUID('04915299-f90b-3d22-9c4e-349267af5130'), u'_id': ObjectId('5769ff494a9bba50e3fb6620'), u'data': {u'transition': None, u'currState': u'STATE_ONGOING_TRIP'}, u'metadata': {u'plugin': u'none', u'write_ts': 1466562923.042579, u'time_zone': u'America/Los_Angeles', u'platform': u'ios', u'key': u'statemachine/transition', u'read_ts': 0, u'type': u'message'}}) -> None
...
Got error None while saving entry AttrDict({u'user_id': UUID('04915299-f90b-3d22-9c4e-349267af5130'), u'_id': ObjectId('576c50204a9bba50e3fc0725'), u'data': {u'transition': None, u'currState': u'STATE_ONGOING_TRIP'}, u'metadata': {u'plugin': u'none', u'write_ts': 1466714035.526686, u'time_zone': u'America/Los_Angeles', u'platform': u'ios', u'key': u'statemachine/transition', u'read_ts': 0, u'type': u'message'}}) -> None
shankari commented 8 years ago

Hit the error 184 times.

 /mnt/e-mission/e-mission-server$ bunzip2 -c /mnt/logs/emission/intake-errors.log.gz | grep "KeyError" | wc -l
184
shankari commented 8 years ago

Now, moving on to running the two copies of the actual pipeline

shankari commented 8 years ago

Second pipeline (which had more recent users with little history) is done. First pipeline is still processing my data. This is why static scheduling is bad!

shankari commented 8 years ago

Also, the habitica reset didn't actually work. We did reset the users, but we didn't reset the timestamp in the database, so we only restored points from after the last time that the pipeline ran. That is so sad.

2016-09-19 14:17:49,146:DEBUG:139656908773184:curr_query = {'$or': [{'metadata.key': 'analysis/cleaned_section'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.start_ts': {'$lte': 1474294669, '$gte': 1474105338}}, sort_key = data.start_ts
2016-09-19 14:17:49,146:DEBUG:139656908773184:orig_ts_db_keys = [], analysis_ts_db_keys = ['analysis/cleaned_section']
2016-09-19 14:17:49,147:DEBUG:139656908773184:finished querying values for []
2016-09-19 14:17:49,147:DEBUG:139656908773184:finished querying values for ['analysis/cleaned_section']
2016-09-19 14:17:49,149:DEBUG:139656908773184:Found 0 results
2016-09-19 14:17:49,149:DEBUG:139656908773184:Returning entry with length 0 result
2016-09-19 14:17:49,149:INFO:139656908773184:Found no entries for user 789a6ea5-7d20-3bf7-8127-b7756ffe46eb, time_query <emission.storage.timeseries.timequery.TimeQuery object at 0x7f0468677990>
2016-09-19 14:17:49,150:DEBUG:139656908773184:Metrics response: {'last_ts_processed': None, 'result': []}

Sent out https://github.com/e-mission/e-mission-server/pull/409 for the fix - can re-run it to make sure the points are correct

shankari commented 8 years ago

Crashed with

Traceback (most recent call last):
  File "bin/historical/before_bic2cal/intake_stage_1.py", line 46, in <module>
    eaist.segment_current_trips(uuid)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/segmentation/trip_segmentation.py", line 74, in segment_current_trips
    filter_methods)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/segmentation/trip_segmentation.py", line 108, in get_combined_segmentation_points
    time_query.startTs = loc_df[loc_df["filter"] == curr_filter].head(1).iloc[0].ts
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1286, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1573, in _getitem_axis
    self._is_valid_integer(key, axis)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1487, in _is_valid_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

This was because

2016-09-20 05:58:52,278:DEBUG:140114953025344:Filters in the dataframe = [u'time' nan]

Added a dropna() hotfix.
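A minimal sketch of the hotfix, with a toy dataframe standing in for the real loc_df (the surrounding segmentation code is assumed): when the "filter" column contains NaN, selecting rows for a NaN filter value matches nothing, so .head(1).iloc[0] raised the out-of-bounds error; dropping those rows first avoids it.

import numpy as np
import pandas as pd

loc_df = pd.DataFrame({"filter": ["time", np.nan], "ts": [1.0, 2.0]})
loc_df = loc_df.dropna(subset=["filter"])  # the hotfix

for curr_filter in loc_df["filter"].unique():
    # previously raised IndexError when curr_filter was nan
    start_ts = loc_df[loc_df["filter"] == curr_filter].head(1).iloc[0].ts
    print(start_ts)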

Need to start the pipeline again, and this time, I'm going to set the UUID list manually to try to balance the running times.

shankari commented 8 years ago

Failed for user UUID('34b64c82-2fe8-36bd-8ac3-55496692e215'). First user in pipeline 2 = UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb')

Full intake_1 first run list =

[UUID('0763de67-f61e-3f5d-90e7-518e69793954'),
 UUID('f955cff1-8fb8-3b42-8776-6d8874e5b90a'),
 UUID('b0d937d0-70ef-305e-9563-440369012b39'),
 UUID('6a488797-5be1-38aa-9700-5532be978ff9'),
 UUID('34b64c82-2fe8-36bd-8ac3-55496692e215'),
 None,
 UUID('993f14f0-79f0-33fa-bdbc-049b88056fbf'),
 UUID('e5edfe4c-dc67-3ef4-8d3a-7042ce3c1a13'),
 UUID('8184b253-df6f-3f0b-aac4-54d65a368110'),
 UUID('788f46af-9e6d-300b-93e1-981ba9b3390b'),
 UUID('fa5c7f7c-3ab0-314b-989f-5042ab5cd946'),
 UUID('78cf23c1-04b4-370a-801f-a86d6d7701b3'),
 UUID('87b2a145-69e3-372b-8a66-801051fb11d1'),
 UUID('fe08f85c-a49b-3ed4-b2d2-fa7d8c055713'),
 UUID('5679434e-f352-31ea-a3cb-aa0cbda8304b'),
 UUID('273efe85-937e-3622-9b34-19cb64653a9f'),
 UUID('36901462-4987-3425-b263-e9bb6c0bd516'),
 UUID('04915299-f90b-3d22-9c4e-349267af5130'),
 UUID('f6ed12b8-ba13-3cfe-83af-d018dfd32f19'),
 UUID('f7711fa2-d2bd-374e-b377-ef2ecc9afa60'),
 UUID('4754956b-e415-318a-b586-a8c95751ed7b'),
 UUID('d6ee41de-5796-357c-a7d1-c226e35e104d'),
 UUID('9a9883f4-bde7-3ce5-beec-e1d6b8bf8f56'),
 UUID('e0509a4e-9842-3c73-8050-875b6383c097'),
 UUID('1a0bd056-283c-370e-8349-35aa07d936ca'),
 UUID('74f2a65f-ef5a-3648-8239-23d24f245f49'),
 UUID('ef015d3d-50de-3b9e-b9a5-a40554ecb18b'),
 UUID('06f0779c-7957-3631-9ee5-7d59487c7758'),
 UUID('d635a545-b44e-33f0-89ab-57d37359dc69'),
 UUID('c39c8281-c4c0-3504-b83c-b797dd0c4f13')]

Let's split it up like so:

intake1

[UUID('34b64c82-2fe8-36bd-8ac3-55496692e215'),
 UUID('e5edfe4c-dc67-3ef4-8d3a-7042ce3c1a13'),
 UUID('788f46af-9e6d-300b-93e1-981ba9b3390b'),
 UUID('78cf23c1-04b4-370a-801f-a86d6d7701b3'),
 UUID('fe08f85c-a49b-3ed4-b2d2-fa7d8c055713'),
 UUID('273efe85-937e-3622-9b34-19cb64653a9f'),
 UUID('04915299-f90b-3d22-9c4e-349267af5130'),
 UUID('f7711fa2-d2bd-374e-b377-ef2ecc9afa60'),
 UUID('9a9883f4-bde7-3ce5-beec-e1d6b8bf8f56'),
 UUID('e0509a4e-9842-3c73-8050-875b6383c097'),
 UUID('1a0bd056-283c-370e-8349-35aa07d936ca'),
 UUID('d635a545-b44e-33f0-89ab-57d37359dc69')]

intake2

[UUID('993f14f0-79f0-33fa-bdbc-049b88056fbf'),
 UUID('8184b253-df6f-3f0b-aac4-54d65a368110'),
 UUID('fa5c7f7c-3ab0-314b-989f-5042ab5cd946'),
 UUID('87b2a145-69e3-372b-8a66-801051fb11d1'),
 UUID('5679434e-f352-31ea-a3cb-aa0cbda8304b'),
 UUID('36901462-4987-3425-b263-e9bb6c0bd516'),
 UUID('f6ed12b8-ba13-3cfe-83af-d018dfd32f19'),
 UUID('4754956b-e415-318a-b586-a8c95751ed7b'),
 UUID('d6ee41de-5796-357c-a7d1-c226e35e104d'),
 UUID('74f2a65f-ef5a-3648-8239-23d24f245f49'),
 UUID('ef015d3d-50de-3b9e-b9a5-a40554ecb18b'),
 UUID('06f0779c-7957-3631-9ee5-7d59487c7758'),
 UUID('c39c8281-c4c0-3504-b83c-b797dd0c4f13')]
shankari commented 8 years ago

The old pipeline finished running at around noon today

-rw-rw-r-- 1 ubuntu ubuntu 780M Sep 20 19:12 intake_2.log.gz
ubuntu@ip-10-203-173-119:/mnt/e-mission/e-mission-server$ date
Tue Sep 20 23:41:10 UTC 2016

and I've merged a ton of scalability/robustness fixes in the past two days (#409, #411, #413, #414, #415, #416).

Once I pull all these to the server, I need to start doing some more fixup.

Babysit to see how it works!

shankari commented 8 years ago

So the steps are:

$ kill -9 (supervisord)
$ kill -9 (cfc_webapp.py)
$ git pull origin master
$ git checkout emission/analysis/intake/segmentation/trip_segmentation.py
$ git checkout bin/debug/load_timeline_for_day_and_user.py
$ git pull origin master
$ cp conf/log/webserver.conf.sample conf/log/webserver.conf
< edit to point to correct path >
< edit intake pipeline to store 5 backup files >

Now, while the webserver is still down and we cannot get any more incoming messages, reset the uuids

$ ./e-mission-py.bash bin/historical/before_bic2cal/reset_uuids.py 2>&1 | tee /path/to/uuid_reset.log

....
DEBUG:root:{'updatedExisting': False, u'nModified': 0, u'ok': 1, u'n': 0}
DEBUG:root:Resetting edb.get_timeseries_error_db...
DEBUG:root:{'updatedExisting': False, u'nModified': 0, u'ok': 1, u'n': 0}
DEBUG:root:Resetting edb.get_usercache_db...
DEBUG:root:{'updatedExisting': True, u'nModified': 6104, u'ok': 1, u'n': 6104}

Next, reset habitica users. This will provide some initial verification that the uuid reset worked. I currently have 58 points on habitica.

First, plain reset to ensure that it works.

$ ./e-mission-py.bash bin/ext_service/reset_all_habitica_users.py 2>&1 | tee /path/to/reset_habitica_1.log

Then reset with restore

$ ./e-mission-py.bash --restore bin/ext_service/reset_all_habitica_users.py 2>&1 | tee /path/to/reset_habitica_1.log

Worked! I am now at level 33!

(screenshot: Habitica profile at level 33)
shankari commented 8 years ago

Almost done, don't screw up now! Next, we want to turn on the webserver again.

$ /home/ubuntu/anaconda/bin/supervisord
$ ps -aef | grep python 
/home/ubuntu/anaconda/bin/python /home/ubuntu/anaconda/bin/supervisord
/home/ubuntu/anaconda/bin/python emission/net/api/cfc_webapp.py

Check that the logs are working. Oops, wrong path configured.

root     12247 12246  1 01:04 ?        00:00:01 /home/ubuntu/anaconda/bin/python emission/net/api/cfc_webapp.py
$ sudo kill -9 12247
root     12414 12413 94 01:07 ?        00:00:00 /home/ubuntu/anaconda/bin/python emission/net/api/cfc_webapp.py

Now logs are in the right location.

2016-09-21 01:07:19,353:DEBUG:139967128368960:This should go to the log file

Now, at long last, restart the intake pipeline to process data for the past two days. And then.... time to turn it back on forever...

shankari commented 8 years ago

Several issues while running the pipeline.

First, there are several people who have no entries in the timeseries database. Previously, we used to filter them out because we only looked at people who were in the long term database, but now that we look at everybody, that doesn't work any more.

In [279]: edb.get_timeseries_db().find({"user_id": UUID('eede2768-df3f-4f59-9555-445d9aee7d70')}).count()
Out[279]: 0

We should really delete such people, but I am currently really wary of deleting anything. Instead, I will add a check to the intake pipeline to weed them out.
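A sketch of the kind of guard this would be (the helper name is hypothetical; the query mirrors the ipython snippet above, with the usual edb import):

import logging

import emission.core.get_database as edb

def should_process(user_id):
    # skip users with no timeseries entries at all, instead of deleting them
    if edb.get_timeseries_db().find({"user_id": user_id}).count() == 0:
        logging.info("No timeseries entries for %s, skipping" % user_id)
        return False
    return True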

shankari commented 8 years ago

Second, all three processes have crashed in the USERCACHE stage. Unclear why - need to investigate.

[{u'_id': ObjectId('57dfd58588f663018ccbdc77'),
  u'curr_run_ts': 1474421642.952858,
  u'last_processed_ts': 1458230276.237236,
  u'last_ts_run': 1474286976.366866,
  u'pipeline_stage': 0,
  u'user_id': UUID('e9cd5530-719d-4ca3-992e-ff35636e06e8')},
 {u'_id': ObjectId('57dfd8ff88f663018ccbdcbe'),
  u'curr_run_ts': 1474420687.125368,
  u'last_processed_ts': 1462576429.402939,
  u'last_ts_run': 1474287866.051583,
  u'pipeline_stage': 0,
  u'user_id': UUID('7aab862f-fa38-4484-909d-1da0a95f594d')},
 {u'_id': ObjectId('57e1e21988f66330f265846d'),
  u'curr_run_ts': 1474421268.311108,
  u'last_processed_ts': None,
  u'last_ts_run': None,
  u'pipeline_stage': 0,
  u'user_id': UUID('99b29da2-989d-4e54-a211-284bde1d362d')}]

At least one of the crashes is

  File "/home/ubuntu/anaconda/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda/lib/python2.7/multiprocessing/process.py", line 114, in run  
    self._target(*self._args, **self._kwargs)
  File "/mnt/e-mission/e-mission-server/emission/pipeline/intake_stage.py", line 52, in run_intake_pipeline
    uh.moveToLongTerm()
  File "/mnt/e-mission/e-mission-server/emission/net/usercache/builtin_usercache_handler.py", line 85, in moveToLongTerm
    ts.bulk_insert(unified_entry_list, etsa.EntryType.DATA_TYPE)
  File "/mnt/e-mission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py", line 269, in bulk_insert
    ts_enum_map[data_type].insert(entries)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pymongo/collection.py", line 410, in insert
    _check_write_command_response(results)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pymongo/helpers.py", line 202, in _check_write_command_response
    raise DuplicateKeyError(error.get("errmsg"), 11000, error)
DuplicateKeyError: insertDocument :: caused by :: 11000 E11000 duplicate key error index: Stage_database.Stage_timeseries.$_id_  dup key: { : ObjectId('56fe93e91268798022708b00') }
shankari commented 8 years ago

We used to get DuplicateKeyErrors before as well, and handled them with try/except. Now the insert is outside the try/except. We need to use continue_on_error=True http://stackoverflow.com/questions/10388540/insert-or-ignore-multiple-documents-in-mongodb/12268247
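A sketch of what that looks like at the pymongo 2.x level (collection name from the error above; the wrapper is illustrative): with continue_on_error=True, the server keeps inserting past duplicate _ids and raises a single DuplicateKeyError at the end instead of aborting the batch at the first duplicate.

from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

coll = MongoClient().Stage_database.Stage_timeseries

def insert_ignoring_duplicates(entries):
    try:
        coll.insert(entries, continue_on_error=True)
    except DuplicateKeyError:
        # duplicates were skipped; all other entries were still inserted
        pass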

shankari commented 8 years ago

wrt https://github.com/e-mission/e-mission-server/issues/407#issuecomment-248486809, this is the actual error. We should really make this more robust as well.

2016-09-21 01:27:53,125:ERROR:140282532489024:Cleaning and resampling failed for user 1c8a8834-60d3-4dac-8e35-e97c73d434ca
Traceback (most recent call last):
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 72, in clean_and_resample
    last_raw_place = save_cleaned_segments_for_ts(user_id, time_query.startTs, time_query.endTs)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 90, in save_cleaned_segments_for_ts
    return save_cleaned_segments_for_timeline(user_id, tl)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 116, in save_cleaned_segments_for_timeline
    (last_cleaned_place, filtered_tl) = create_and_link_timeline(tl, user_id, trip_map)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 779, in create_and_link_timeline
    curr_cleaned_start_place = get_filtered_place(tl.first_place())
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 208, in get_filtered_place
    _copy_non_excluded(old_data=raw_place.data,
AttributeError: 'NoneType' object has no attribute 'data'
shankari commented 8 years ago

Found error

Found error  while processing trip Entry({u'data': {u'distance': 1161.6474633112616, u'end_place': ObjectId('57e01dd688f66307854b681e'), u'raw_trip': ObjectId('57e01dc888f66307854b669e'), u'start_loc': {u'type': u'Point', u'coordinates': <redacted>}, u'end_ts': 1473664937.272, u'start_ts': 1473664226.7945209, u'start_fmt_time': u'2016-09-12T09:10:26.794521', u'end_loc': {u'type': u'Point', u'coordinates': <redacted>}, u'source': u'DwellSegmentationTimeFilter', u'start_place': ObjectId('57e01dd688f66307854b681d'), u'end_fmt_time': u'2016-09-12T09:22:17.272000', u'end_local_dt': {u'hour': 9, u'month': 9, u'second': 17, u'weekday': 0, u'year': 2016, u'timezone': <redacted>, u'day': 12, u'minute': 22}, u'duration': 710.4774792194366, u'start_local_dt': {u'hour': 9, u'month': 9, u'second': 26, u'weekday': 0, u'year': 2016, u'timezone': u'', u'day': 12, u'minute': 10}}, u'_id': ObjectId('57e01dd588f66307854b67db'), u'user_id': UUID(<redacted>), u'metadata': {u'write_fmt_time': u'2016-09-19T10:18:13.132269-07:00', u'write_ts': 1474305493.132269, u'time_zone': u'America/Los_Angeles', u'platform': u'server', u'write_local_dt': {u'hour': 10, u'month': 9, u'second': 13, u'weekday': 0, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 19, u'minute': 18}, u'key': u'analysis/cleaned_trip'}})
Traceback (most recent call last):
  File "/mnt/e-mission/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 221, in get_geojson_for_timeline
    trip_geojson = trip_to_geojson(trip, tl)
  File "/mnt/e-mission/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 184, in trip_to_geojson
    section_gj = section_to_geojson(section, tl)
  File "/mnt/e-mission/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 117, in section_to_geojson
    assert(False)
AssertionError
shankari commented 8 years ago
Traceback (most recent call last):
  File "/home/ubuntu/anaconda/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda/lib/python2.7/multiprocessing/process.py", line 114, in run  
    self._target(*self._args, **self._kwargs)
  File "/mnt/e-mission/e-mission-server/emission/pipeline/intake_stage.py", line 58, in run_intake_pipeline
    uh.moveToLongTerm()
  File "/mnt/e-mission/e-mission-server/emission/net/usercache/builtin_usercache_handler.py", line 85, in moveToLongTerm
    ts.bulk_insert(unified_entry_list, etsa.EntryType.DATA_TYPE, continue_on_error=True)
TypeError: bulk_insert() got an unexpected keyword argument 'continue_on_error'
shankari commented 8 years ago
Found error single positional indexer is out-of-bounds while processing trip Entry({u'data': {u'distance': 4.959556067689948, u'end_place': ObjectId('57e00f3288f66307854b2542'), u'start_loc': {u'type': u'Point', u'coordinates': }, u'end_ts': 1471814458.878665, u'start_ts': 1471813523.073008, u'start_fmt_time': u'2016-08-21T18:05:23.073008', u'end_loc': {u'type': u'Point', u'coordinates': [-46.47651544614448, -23.42357263594097]}, u'source': u'DwellSegmentationDistFilter', u'start_place': ObjectId('57e00f3288f66307854b2540'), u'end_fmt_time': u'2016-08-21T18:20:58.878665', u'end_local_dt': {u'hour': 18, u'month': 8, u'second': 58, u'weekday': 6, u'year': 2016, u'timezone': u'', u'day': 21, u'minute': 20}, u'duration': 935.8056569099426, u'start_local_dt': {u'hour': 18, u'month': 8, u'second': 23, u'weekday': 6, u'year': 2016, u'timezone': u'', u'day': 21, u'minute': 5}}, u'_id': ObjectId('57e00f3288f66307854b2541'), u'user_id': UUID(''), u'metadata': {u'write_fmt_time': u'2016-09-19T09:15:46.486968-07:00', u'write_ts': 1474301746.486968, u'time_zone': u'America/Los_Angeles', u'platform': u'server', u'write_local_dt': {u'hour': 9, u'month': 9, u'second': 46, u'weekday': 0, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 19, u'minute': 15}, u'key': u'segmentation/raw_trip'}})

  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 101, in save_cleaned_segments_for_timeline
    filtered_trip = get_filtered_trip(ts, trip)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 166, in get_filtered_trip
    section_map[section.get_id()] = get_filtered_section(filtered_trip_entry, section)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 255, in get_filtered_section
    with_speeds_df = get_filtered_points(section, filtered_section_data)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 337, in get_filtered_points
    filtered_loc_df = _add_start_point(filtered_loc_df, raw_start_place, ts)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 439, in _add_start_point
    curr_first_point = filtered_loc_df.iloc[0]
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1286, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1573, in _getitem_axis
    self._is_valid_integer(key, axis)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1487, in _is_valid_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
shankari commented 8 years ago
Found error single positional indexer is out-of-bounds while processing trip Entry({u'data': {u'distance': 605.5213502554404, u'end_place': ObjectId('57e16c3a88f6631d412f5d78'), u'start_loc': {u'type': u'Point', u'coordinates':}, u'end_ts': 1466656863, u'start_ts': 1466651357, u'start_fmt_time': u'2016-06-22T20:09:17', u'end_loc': {u'type': u'Point', u'coordinates':}, u'source': u'DwellSegmentationTimeFilter', u'start_place': ObjectId('57e16c3a88f6631d412f5d76'), u'end_fmt_time': u'2016-06-22T21:41:03-07:00', u'end_local_dt': {u'hour': 21, u'month': 6, u'second': 3, u'weekday': 2, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 22, u'minute': 41}, u'duration': 5506, u'start_local_dt': {u'hour': 20, u'month': 6, u'second': 17, u'weekday': 2, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 22, u'minute': 9}}, u'_id': ObjectId('57e16c3a88f6631d412f5d77'), u'user_id': UUID(''), u'metadata': {u'write_fmt_time': u'2016-09-20T10:04:58.877571-07:00', u'write_ts': 1474391098.877571, u'time_zone': u'America/Los_Angeles', u'platform': u'server', u'write_local_dt': {u'hour': 10, u'month': 9, u'second': 58, u'weekday': 1, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 20, u'minute': 4}, u'key': u'segmentation/raw_trip'}})
Traceback (most recent call last):
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 101, in save_cleaned_segments_for_timeline
    filtered_trip = get_filtered_trip(ts, trip)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 166, in get_filtered_trip
    section_map[section.get_id()] = get_filtered_section(filtered_trip_entry, section)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 255, in get_filtered_section
    with_speeds_df = get_filtered_points(section, filtered_section_data)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 365, in get_filtered_points
    resampled_loc_df = resample(filtered_loc_df, interval=30)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 680, in resample
    tz_new = [_get_timezone(ts, tz_ranges_df) for ts in ts_new]
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 698, in _get_timezone
    return sel_entry.timezone.iloc[0]
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1286, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1573, in _getitem_axis
    self._is_valid_integer(key, axis)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1487, in _is_valid_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
shankari commented 8 years ago

wrt https://github.com/e-mission/e-mission-server/issues/407#issuecomment-248493235, moved it to the timeseries.

shankari commented 8 years ago

wrt https://github.com/e-mission/e-mission-server/issues/407#issuecomment-248494724, we have a check for loc_df at the beginning of resample

def resample(filtered_loc_df, interval):
    loc_df = filtered_loc_df
    if len(loc_df) == 0 or len(loc_df) == 1:
        logging.debug("len(loc_df) = %s, early return" % (len(loc_df)))
        return loc_df

but there is no such check at the beginning of _add_start_point.

We should be able to handle missing points in _add_start_point etc., and we can extrapolate to the end of the last trip. The only question is: what should we use as the time if we have no points to determine speed from?
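A sketch of the missing guard (signature from the traceback above; the real extrapolation logic is elided):

import logging

def _add_start_point(filtered_loc_df, raw_start_place, ts):
    # without this check, filtered_loc_df.iloc[0] raises
    # "single positional indexer is out-of-bounds" on an empty frame
    if len(filtered_loc_df) == 0:
        logging.debug("no filtered points, early return")
        return filtered_loc_df
    curr_first_point = filtered_loc_df.iloc[0]
    # ... extrapolate a start point from raw_start_place as before ...
    return filtered_loc_df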

shankari commented 8 years ago
2016-09-21 01:18:46,479:DEBUG:140282532489024:keys = (segmentation/raw_stop, segmentation/raw_section), len(places) = 0, len(trips) = 1
...
2016-09-21 01:18:46,494:DEBUG:140282532489024:deleting 5 points from section points
2016-09-21 01:18:46,498:DEBUG:140282532489024:Found 5 results
...
2016-09-21 01:18:46,503:DEBUG:140282532489024:After filtering, rows went from 5 -> 0, cols from 23 -> 22
...
2016-09-21 01:18:46,507:DEBUG:140282532489024:Raw place is 57e00f3288f66307854b2540 and corresponding location is 57bb92410a3eb8b1300477c3
2016-09-21 01:18:46,507:ERROR:140282532489024:Found error single positional indexer is out-of-bounds

What does it even mean to delete every single point? That is effectively a zero-length section. But we don't have the concept of squishing sections, only trips, and the squished trips gave us enough headaches.

shankari commented 8 years ago

In general, if we try to do something very fancy, we will fail. We need to find something simple. One thing to note is that the associated section is very mysterious. Although it shows up in intake log 0, and there is only one instance of that log, the first time that it shows up is in the cleaning stage.

Shouldn't it have shown up in the section segmentation stage?

shankari commented 8 years ago
$ bunzip2 -c intake_0.log.gz | grep 57e00f4288f66307854b25f6
2016-09-21 01:18:46,480:DEBUG:140282532489024:trips_or_sections = [ObjectId('57e00f4288f66307854b25f6')]

ubuntu@ip-10-203-173-119:/mnt/logs/emission$ ls *intake_0*
intake_0-errors.log.gz  intake_0.log.gz

Yup, this time, SECTION_SEGMENTATION didn't do anything because it had already run. Basically, we got a bunch of errors in CLEAN_RESAMPLE earlier and now we are stuck because we are trying to reuse bad values. The good news is that we probably don't need to resegment, since it is either the usercache or the CLEAN_RESAMPLE that is messed up.

2016-09-21 01:18:14,125:INFO:140282532489024:For stage PipelineStages.SECTION_SEGMENTATION, start_ts = 2016-09-16T20:28:45.362935
2016-09-21 01:18:14,126:DEBUG:140282532489024:for user 202c2eca-b079-43e3-aeec-4a1562ebefa9, returning timeseries <emission.storage.timeseries.builtin_timeseries.BuiltinTimeSeries object at 0x7f95f1e0f710>
2016-09-21 01:18:14,126:DEBUG:140282532489024:curr_query = {'$or': [{'metadata.key': 'segmentation/raw_trip'}], 'data.end_ts': {'$lte': 1474420689.125559, '$gte': 1474057725.362935}, 'user_id': UUID('202c2eca-b079-43e3-aeec-4a1562ebefa9')}, sort_key = data.end_ts
2016-09-21 01:18:14,126:DEBUG:140282532489024:orig_ts_db_keys = [], analysis_ts_db_keys = ['segmentation/raw_trip']
2016-09-21 01:18:14,126:DEBUG:140282532489024:finished querying values for []
2016-09-21 01:18:14,127:DEBUG:140282532489024:finished querying values for ['segmentation/raw_trip']
2016-09-21 01:18:14,392:DEBUG:140282532489024:Returning entry with length 0 result
2016-09-21 01:18:14,393:INFO:140282532489024:For stage PipelineStages.SECTION_SEGMENTATION, last_ts_processed is unchanged

Let's see what the outcome of the investigation into the big walk sections is.

shankari commented 8 years ago

There might have been a similar crash earlier, let's grep the prior logs.

shankari commented 8 years ago

wrt https://github.com/e-mission/e-mission-server/issues/407#issuecomment-248480746, that seems pretty high. Looking back at the metrics, I found several days with suspicious trips.

ModeStatTimeSummary({'nUsers': 1, 'ON_FOOT': 51625.38093002352, 'local_dt': LocalDate({'hour': 21, 'month': 9, 'second': 16, 'weekday': 2, 'year': 2015, 'timezone': 'UTC', 'day': 30, 'minute': 33}), 'ts': 1443648796, 'fmt_time': '2015-09-30T21:33:16.211080+00:00'}),
ModeStatTimeSummary({'nUsers': 1, 'ON_FOOT': 67935.68624558138, 'local_dt': LocalDate({'hour': 1, 'month': 12, 'second': 19, 'weekday': 6, 'year': 2015, 'timezone': 'UTC', 'day': 20, 'minute': 1}), 'ts': 1450573279, 'fmt_time': '2015-12-20T01:01:19.222216+00:00'}),
ModeStatTimeSummary({'nUsers': 1, 'ON_FOOT': 94123.80714983154, 'local_dt': LocalDate({'hour': 3, 'month': 12, 'second': 41, 'weekday': 5, 'year': 2015, 'timezone': 'UTC', 'day': 26, 'minute': 25}), 'ts': 1451100341, 'fmt_time': '2015-12-26T03:25:41.761620+00:00'}),
ModeStatTimeSummary({'nUsers': 1, 'ON_FOOT': 10611.549820243947, 'local_dt': LocalDate({'hour': 23, 'month': 4, 'second': 52, 'weekday': 4, 'year': 2016, 'timezone': 'UTC', 'day': 8, 'minute': 21}), 'ts': 1460157712, 'fmt_time': '2016-04-08T23:21:52.997999+00:00'}),
ModeStatTimeSummary({'nUsers': 1, 'ON_FOOT': 23612.511889582733, 'local_dt': LocalDate({'hour': 16, 'month': 4, 'second': 44, 'weekday': 3, 'year': 2016, 'timezone': 'UTC', 'day': 14, 'minute': 9}), 'ts': 1460650184, 'fmt_time': '2016-04-14T16:09:44.752386+00:00'}),
ModeStatTimeSummary({'nUsers': 1, 'ON_FOOT': 57683.09450387482, 'local_dt': LocalDate({'hour': 0, 'month': 7, 'second': 53, 'weekday': 5, 'year': 2016, 'timezone': 'UTC', 'day': 23, 'minute': 11}), 'ts': 1469232713, 'fmt_time': '2016-07-23T00:11:53.653433+00:00'}),

Looking at the points locally, we see suspicious looking sections for several of these days (screenshots below).

Why are we getting metrics for the other sections? What's wrong with these sections? May add a simple set of rules for walk and bike to avoid such outliers at this point.

(screenshots: Phone 1, Phone 2, Phone 3)
shankari commented 8 years ago

So the metrics are legit - the dates are confusing.

Here are the bad trips for the other days. The 23rd is missing on my computer.

(screenshots of the bad trips: Dec 20, Dec 26, Apr 8)
shankari commented 8 years ago

These are too varied to try and fix now. For now, I will just say that walk sections that are > 10 km with speed > 10 km/h should be marked as UNKNOWN, and bike sections that are > 100 km with speed > 40 km/h should be marked as UNKNOWN.

From http://www.dailymile.com/forums/beginner-cycling/questions/4962-what-is-considered-a-good-cycling-speed

Beginners: 13-15
Decent riders: 15-17
Better: 18-19
Pretty good: 19-20
Lower cat 3-4 racers: 20-22
Upper cat 1-2 racers: 22-25
Winners: 25+

That should work for everybody except race winners. And UNKNOWN isn't the end of the world as long as we allow people to edit it.
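A sketch of that rule (thresholds from above; the mode codes are placeholders for the real ecwm.MotionTypes enum values):

ON_FOOT, BICYCLING, UNKNOWN = 2, 1, 4  # placeholder mode codes

def override_mode(distance_m, duration_s, sensed_mode):
    speed_kmph = (distance_m / duration_s) * 3.6
    if sensed_mode == ON_FOOT and distance_m > 10 * 1000 and speed_kmph > 10:
        return UNKNOWN
    if sensed_mode == BICYCLING and distance_m > 100 * 1000 and speed_kmph > 40:
        return UNKNOWN
    return sensed_mode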

shankari commented 8 years ago

So technically, we should be able to nuke only the CLEAN_RESAMPLING stage and rerun.

shankari commented 8 years ago

wrt https://github.com/e-mission/e-mission-server/issues/407#issuecomment-248514593, after grepping the prior logs, we see

2016-09-19 16:16:02,784:DEBUG:139656908773184:Inserting entry Entry({'data': Section({'sensed_mode': 4,
 'start_loc': {"coordinates": [-46.47646789515715, -23.42356338461582], "type": "Point"},
 'end_ts': 1471814458.878665,
 'start_ts': 1471813523.073008,
 'start_fmt_time': u'2016-08-21T18:05:23.073008-03:00',
 'end_loc': {"coordinates": [-46.47651544614448, -23.42357263594097], "type": "Point"},
 'source': 'SmoothedHighConfidenceMotion',
 'end_fmt_time': u'2016-08-21T18:20:58.878665-03:00',
 'duration': 935.8056569099426,
 'trip_id': ObjectId('57e00f3288f66307854b2541'),
 '_id': ObjectId('57e00f4288f66307854b25f6'),
 'user_id': UUID('d9667aad-8afc-3619-9fce-7f70ffe1695c'),
 'key': 'segmentation/raw_section'})}) into timeseries
2016-09-19 16:16:27,447:INFO:139656908773184:^^^^^^^^^^^^^^^^^^^^Smoothing section 57e00f4288f66307854b25f6 for user d9667aad-8afc-3619-9fce-7f70ffe1695c^^^^^^^^^^^^^^^^^^^^

2016-09-19 16:16:27,530:DEBUG:139656908773184:Found 5 results

2016-09-19 16:16:27,534:DEBUG:139656908773184:Found iOS section, filling in gaps with fake data
2016-09-19 16:16:27,535:DEBUG:139656908773184:Found large gap ending at 1, filling it
2016-09-19 16:16:27,536:DEBUG:139656908773184:start = 0, end = 1, generating entries between 1471813523.07 and 1471813857.45
2016-09-19 16:16:27,536:DEBUG:139656908773184:Fill lengths are: dist 5, angle 5, lat 5, lng 5

                  fmt_time            ts   latitude  longitude  \
0  2016-08-21T18:05:23.073008-03:00  1.471814e+09 -23.423563 -46.476468   
1                               NaN  1.471814e+09 -23.424014 -46.474633   
2                               NaN  1.471814e+09 -23.423764 -46.474931   
3                               NaN  1.471814e+09 -23.423649 -46.474971   
4                               NaN  1.471814e+09 -23.423604 -46.475048   

fmt_time            ts   latitude  longitude  \
5                               NaN  1.471814e+09 -23.423635 -46.475114   
6  2016-08-21T18:10:57.452023-03:00  1.471814e+09 -23.423583 -46.476499   
7  2016-08-21T18:15:58.169679-03:00  1.471814e+09 -23.423394 -46.476613   
8  2016-08-21T18:16:07.169679-03:00  1.471814e+09 -23.423394 -46.476613   
9  2016-08-21T18:20:58.878665-03:00  1.471814e+09 -23.423573 -46.476515 

2016-09-19 16:16:27,613:DEBUG:139656908773184:For cluster 0 - 1, distance = 0.0, is_cluster = True
2016-09-19 16:16:27,615:DEBUG:139656908773184:For cluster 1 - 6, distance = 64.6947694402, is_cluster = True
2016-09-19 16:16:27,616:DEBUG:139656908773184:For cluster 6 - 10, distance = 2.04404326234, is_cluster = True

2016-09-19 16:16:27,619:DEBUG:139656908773184:bad_segments = [Segment(0, 1, 0.0), Segment(6, 10, 2.04404326234)]
2016-09-19 16:16:27,620:DEBUG:139656908773184:after setting values, outlier_mask = [0 6 7 8 9]

So it is related to the iOS fix. We add a bunch of fake points, we strip out all the real points, and then we drop the fake points with dropna() so we have no real points left. Returning the start point at the enter time seems like a good compromise.
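A sketch of that compromise (names hypothetical): if outlier removal plus dropna() leaves nothing, synthesize a single point at the section start location with the enter time.

import pandas as pd

def with_fallback_point(filtered_loc_df, start_loc, enter_ts):
    if len(filtered_loc_df) > 0:
        return filtered_loc_df
    # all real points were stripped; fall back to the start point at enter time
    return pd.DataFrame([{
        "ts": enter_ts,
        "longitude": start_loc["coordinates"][0],
        "latitude": start_loc["coordinates"][1],
    }])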

shankari commented 8 years ago
2016-09-21 00:36:33,230:DEBUG:140735159468032:Considering trip 57e216e1f6858f4ba0fb8fa3: 2015-12-19T16:08:30.017000-08:00 -> 2015-12-19T16:08:59.991000-08:00
2016-09-21 00:36:33,231:DEBUG:140735159468032:Considering trip 57e216e1f6858f4ba0fb8fa5: 2015-12-19T17:30:50.123000-08:00 -> 2015-12-19T17:32:19.534000-08:00
2016-09-21 00:36:33,231:DEBUG:140735159468032:Considering trip 57e216e1f6858f4ba0fb8fa9: 2015-12-19T17:32:19.534000-08:00 -> 2015-12-20T18:14:09.001000-08:00

Looks like the section was actually fairly small and we extrapolated.

This is a problem with the squishing. We basically didn't do anything special to the sections because they were initially small, and then we squished them together and changed the distance and time.

small section

2016-09-20 22:25:25,864:DEBUG:140735159468032:Found first section, need to extrapolate start point
2016-09-20 22:25:25,865:DEBUG:140735159468032:get_entry_at_ts query = {'metadata.key': 'background/filtered_location', 'user_id': UUID('c1562232-a371-4272-bb4a-6f0f7b1fee18'), 'data.ts': 1450570139.991}
2016-09-20 22:25:25,866:DEBUG:140735159468032:Raw place is 57e216e1f6858f4ba0fb8fa4 and corresponding location is 56776eb95771abda98a06ce1
2016-09-20 22:25:25,878:DEBUG:140735159468032:Adding distance 198.578722673 to original 13.7514191908 to extend section start from [-119.5721166, 37.7375974] to [-119.574155, 37.738366]
2016-09-20 22:25:25,880:DEBUG:140735159468032:After subtracting time 1740.90078339 from original 89.4110000134 to cover additional distance 198.578722673 at speed 0.114066651338, new_start_ts = 1450573309.22

Look at the difference in the squishing mismatch.

2016-09-20 22:36:46,776:DEBUG:140735159468032:squishing mismatch: resetting trip start_loc [-119.574155, 37.738366] to cleaned_start_place.location [-120.2336796, 38.0547286]

For now, in the absence of time to do a better job, I'm just going to abandon the squishing if the gap is too big. We will end up with a big visual gap, but that is significantly better than having giant fake trips.

shankari commented 8 years ago

If there is a huge gap, we should have an untracked section. Apparently we did, but we also had a bunch of squished trips. Need to investigate further later.

shankari commented 8 years ago

That fixed the 19th/20th, but not the 26th, because in that case it was actually detected as a trip in the first place.

2016-09-21 01:57:30,624:DEBUG:140735159468032:Considering place 57e216e2f6858f4ba0fb8fe4: 2015-12-25T18:45:43.458000-08:00 -> 2015-12-25T19:26:17.173000-08:00

2016-09-21 01:57:30,624:DEBUG:140735159468032:Considering place 57e216e2f6858f4ba0fb8fe6: 2015-12-26T20:35:31.522000-08:00 -> 2015-12-26T20:42:31.351000-08:00

Already a trip

2016-09-21 01:57:31,888:DEBUG:140735159468032:Considering trip 57e216e2f6858f4ba0fb8fe5: 2015-12-25T19:26:17.173000-08:00 -> 2015-12-26T20:35:31.522000-08:00
2016-09-20 20:36:46,143:DEBUG:140735159468032:------------------------------2015-12-25T19:27:11.072000-08:00------------------------------
2016-09-20 20:36:46,146:DEBUG:140735159468032:last5MinsDistances = [ 62.62879746  20.62203581] with length 2
2016-09-20 20:36:46,148:DEBUG:140735159468032:last10PointsDistances = [ 62.62879746  20.62203581   0.        ] with length 3, shape (3,)
2016-09-20 20:36:46,150:DEBUG:140735159468032:len(last10PointsDistances) = 3, len(last5MinsDistances) = 2
2016-09-20 20:36:46,150:DEBUG:140735159468032:last5MinsTimes.max() = 53.8989999294, time_threshold = 300
2016-09-20 20:36:46,151:DEBUG:140735159468032:prev_point.ts = 1451100407.07, curr_point.ts = 1451100431.07, time gap = 23.9990000725 (vs 300), distance_gap = 20.6220358141 (vs 100), speed_gap = 0.859287293297 (vs 0.333333333333) continuing trip
2016-09-20 20:36:46,151:DEBUG:140735159468032:Too few points to make a decision, continuing
2016-09-20 20:36:46,152:DEBUG:140735159468032:------------------------------2015-12-26T20:27:14.622000-08:00------------------------------
2016-09-20 20:36:46,154:DEBUG:140735159468032:last5MinsDistances = [] with length 0
2016-09-20 20:36:46,156:DEBUG:140735159468032:last10PointsDistances = [ 93131.80518862  93173.47398066  93192.32726122      0.        ] with length 4, shape (4,)
2016-09-20 20:36:46,157:DEBUG:140735159468032:len(last10PointsDistances) = 4, len(last5MinsDistances) = 0
2016-09-20 20:36:46,158:DEBUG:140735159468032:last5MinsTimes.max() = nan, time_threshold = 300
2016-09-20 20:36:46,158:DEBUG:140735159468032:prev_point.ts = 1451100431.07, curr_point.ts = 1451190434.62, time gap = 90003.55 (vs 300), distance_gap = 93192.3272612 (vs 100), speed_gap = 1.03542946096 (vs 0.333333333333) continuing trip
2016-09-20 20:36:46,158:DEBUG:140735159468032:Too few points to make a decision, continuing
shankari commented 8 years ago

Can either fix this by splitting at the TRIP_SEGMENTATION level (if the gap is more than an hour, split), or at the CLEANING level.

For now, going to resolve this at the cleaning level. What if you did have a really long trip with a large gap in the middle - e.g. if you were underground or with bad reception for a long time? Although what really happened here was that we really were untracked for a long time, so maybe we should reflect that reality?

shankari commented 8 years ago

Going with reflecting that reality by splitting when the gap is more than 12 hours.
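A sketch of the rule at the cleaning level (names illustrative):

TWELVE_HOURS = 12 * 60 * 60

def should_mark_untracked(prev_point_ts, curr_point_ts):
    # a gap longer than 12 hours becomes an untracked period instead of
    # being squished into a giant fake trip
    return (curr_point_ts - prev_point_ts) > TWELVE_HOURS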

shankari commented 8 years ago

Everything seems fixed now.

>>> list(edb.get_analysis_timeseries_db().find({"metadata.key": "analysis/cleaned_section", "data.distance": {"$gt": 10*1000}, "data.sensed_mode": ecwm.MotionTypes.ON_FOOT.value}))
[]
>>> list(edb.get_analysis_timeseries_db().find({"metadata.key": "analysis/cleaned_section", "data.distance": {"$gt": 50*1000}, "data.sensed_mode": ecwm.MotionTypes.BICYCLING.value}))
[]
>>> for s in list(edb.get_analysis_timeseries_db().find({"metadata.key": "analysis/cleaned_section", "data.distance": {"$gt": 3*1000}, "data.sensed_mode": ecwm.MotionTypes.ON_FOOT.value})):
    print s["data"]["distance"], s["data"]["duration"], s["data"]["start_fmt_time"], pd.Series(s["data"]["speeds"]).median()
9499.65904538 9529.65904538 2015-07-15T08:53:17.792799-07:00 1.32702602016
4133.75881166 2136.11605334 2015-08-07T17:47:06.215947-07:00 1.68215469347
3663.24108658 3693.24108658 2015-12-08T12:18:26.893563-08:00 1.3832152966
5110.4193003 5140.4193003 2016-02-02T08:22:41.047555-08:00 2.63330721524
6068.1867019 6098.1867019 2016-02-15T10:46:20.997166-08:00 1.77030962961
3535.00596661 2107.50368905 2016-02-27T15:34:12.235311-08:00 1.2763648927
6872.03554323 4401.55729842 2016-02-06T14:48:22.262702-08:00 0.962844162818
3347.75565082 3377.75565082 2016-04-13T12:06:58.509962-07:00 1.42158653283
3307.74112384 3337.74112384 2016-02-09T10:24:14.288667-08:00 2.08387453137
3272.37948726 1246.07399988 2016-08-12T12:27:08.926000-07:00 2.06690369761
3130.47158556 3160.47158556 2015-10-20T15:10:49.006391-07:00 3.05647895952
4053.63350334 746 2015-09-24T19:48:23-07:00 5.1067850809   (MISCLASSIFIED FROM BIKE)
4430.30044196 1110.0508492 2015-08-06T15:41:29.344151-07:00 1.52645984651
3073.42221492 3103.42221492 2016-05-29T17:43:28.958045-07:00 1.02381084003
3513.83611733 1017.12506318 2015-08-08T09:11:03.109937-07:00 3.6925780055
3100.01001716 600.63499999 2015-07-13T15:30:31.576000-07:00 3.40898536236
>>> for s in list(edb.get_analysis_timeseries_db().find({"metadata.key": "analysis/cleaned_section", "data.distance": {"$gt": 10*1000}, "data.sensed_mode": ecwm.MotionTypes.BICYCLING.value})):
    print s["data"]["distance"], s["data"]["duration"], s["data"]["start_fmt_time"], pd.Series(s["data"]["speeds"]).median()
<nothing>
shankari commented 8 years ago

Re-deploying. Overriding direct changes:

$ git checkout emission/pipeline/intake_stage.py
$ git diff emission/storage/^C
$ git checkout emission/storage/timeseries/builtin_timeseries.py
$ ./e-mission-py.bash bin/reset_analysis_database.py
{u'ok': 1, u'n': 973561}
{u'ok': 1, u'n': 0}
{u'ok': 1, u'n': 0}
{u'ok': 1, u'n': 1347}

Relaunched collection. Did not monitor it. Was done by around 7pm when I checked. So I can re-run the entire pipeline for all current users in ~ 6 hours? Still need to check for errors.

shankari commented 8 years ago

Three fairly innocuous errors show up consistently.

None transition on iOS

2016-09-21 22:04:46,720:ERROR:140290633410368:Backtrace time
Traceback (most recent call last):
  File "/mnt/e-mission/e-mission-server/emission/net/usercache/builtin_usercache_handler.py", line 71, in moveToLongTerm
    unified_entry = enuf.convert_to_common_format(entry)
  File "/mnt/e-mission/e-mission-server/emission/net/usercache/formatters/formatter.py", line 6, in convert_to_common_format
    return format_fn(entry)
  File "/mnt/e-mission/e-mission-server/emission/net/usercache/formatters/ios/transition.py", line 55, in format
    data.transition = transition_map[entry.data.transition].value
KeyError: None

Non-zero distance between points although the time delta is zero

2016-09-21 21:58:47,974:ERROR:140290633410368:Distance between points AttrDict({u'local_dt_hour': 22, u'fmt_time': u'2016-04-26T22:37:46.447609-07:00', u'local_dt_timezone': u'America/Los_Angeles', u'local_dt_second': 46, u'local_dt_year': 2016, u'ts': 1461735466.447609, u'local_dt_minute': 37, u'local_dt_weekday': 1, u'loc': {u'type': u'Point', u'coordinates': [-122.253448856831, 37.86768408032536]}, 'user_id': UUID('43f9361e-1cb6-4026-99ba-458be357d245'), u'floor': 0, 'metadata_write_ts': 1461735468.235691, u'altitude': 93.71350860595703, u'vaccuracy': 10.0, u'local_dt_month': 4, u'longitude': -122.253448856831, u'filter': u'distance', u'local_dt_day': 26, u'latitude': 37.86768408032536, '_id': ObjectId('5720522ed282b8f30deba376'), u'heading': -1.0, u'sensed_speed': -1.0, u'accuracy': 121.0280224618877}), AttrDict({u'local_dt_hour': 22, u'fmt_time': u'2016-04-26T22:37:46.447609-07:00', u'local_dt_timezone': u'America/Los_Angeles', u'local_dt_second': 46, u'local_dt_year': 2016, u'ts': 1461735466.447609, u'local_dt_minute': 37, u'local_dt_weekday': 1, u'loc': {u'type': u'Point', u'coordinates': [-122.2531523366209, 37.86722646225704]}, 'user_id': UUID('43f9361e-1cb6-4026-99ba-458be357d245'), u'floor': 0, 'metadata_write_ts': 1461735468.175485, u'altitude': 93.75988006591797, u'vaccuracy': 10.0, u'local_dt_month': 4, u'longitude': -122.2531523366209, u'filter': u'distance', u'local_dt_day': 26, u'latitude': 37.86722646225704, '_id': ObjectId('5720522ed282b8f30deba374'), u'heading': -1.0, u'sensed_speed': -1.0, u'accuracy': 121.0280224618877}) is 57.1556044127, although the time delta = 0

No geocoded information available

2016-09-21 21:09:24,806:ERROR:140290633410368:Unable to pre-fill reverse geocoded information, client has to do it
Traceback (most recent call last):
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 220, in get_filtered_place
    filtered_place_data.display_name = format_result(reverse_geocoded_json)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 1068, in format_result
    name = "%s, %s" % (get_fine(rev_geo_result), get_coarse(rev_geo_result))
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 1066, in <lambda>
    get_fine = lambda rgr: get_with_fallbacks(rgr["address"], ["road", "neighbourhood"])
KeyError: 'address'

All these three errors are already handled, so maybe they should be downgraded to warnings instead of errors.

process   None transition   Distance between points   No geocoded info
0         5                 5                         134
1         1                 8                         42
2         0                 12                        91

These cover most of the messages in the error logs. Except for process 1, which had some real errors left.

$ bzcat intake_0-errors.log.gz | grep ERROR | egrep -v "Distance between points|Unable to pre-fill|Backtrace time" | wc -l
0
$ bzcat intake_1-errors.log.gz | grep ERROR | egrep -v "Distance between points|Unable to pre-fill|Backtrace time" | wc -l
6
$ bzcat intake_2-errors.log.gz | grep ERROR | egrep -v "Distance between points|Unable to pre-fill|Backtrace time" | wc -l
0
shankari commented 8 years ago

Let's take a look at the real errors on process 1. We need to pull the data for these users and re-run locally to experiment.

Traceback (most recent call last):
  File "/mnt/e-mission/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 221, in get_geojson_for_timeline
    trip_geojson = trip_to_geojson(trip, tl)
  File "/mnt/e-mission/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 184, in trip_to_geojson
    section_gj = section_to_geojson(section, tl)
  File "/mnt/e-mission/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 117, in section_to_geojson
    assert(False)
AssertionError

2016-09-21 20:45:27,127:ERROR:140290633410368:Storing views to cache failed for user 60fb5b80-e6c7-4a2b-ba69-ad3147712df3
Traceback (most recent call last):
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 101, in save_cleaned_segments_for_timeline
 filtered_trip = get_filtered_trip(ts, trip)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 166, in get_filtered_trip
 section_map[section.get_id()] = get_filtered_section(filtered_trip_entry, section) 
 File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 256, in get_filtered_section
 with_speeds_df = get_filtered_points(section, filtered_section_data)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 356, in get_filtered_points
 (filtered_section_data, filtered_loc_df.iloc[0], filtered_loc_df.iloc[-1]))
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1286, in __getitem__
 return self._getitem_axis(key, axis=0)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1573, in _getitem_axis
 self._is_valid_integer(key, axis)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1487, in _is_valid_integer
 raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

2016-09-21 21:26:47,967:ERROR:140290633410368:Cleaning and resampling failed for user 95cf2b3e-f3c0-4da5-b8d0-0e460fad424d
Traceback (most recent call last):
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 101, in save_cleaned_segments_for_timeline
    filtered_trip = get_filtered_trip(ts, trip)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 166, in get_filtered_trip
    section_map[section.get_id()] = get_filtered_section(filtered_trip_entry, section)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 256, in get_filtered_section
    with_speeds_df = get_filtered_points(section, filtered_section_data)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 368, in get_filtered_points
    resampled_loc_df = resample(filtered_loc_df, interval=30)
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 713, in resample
    tz_new = [_get_timezone(ts, tz_ranges_df) for ts in ts_new]
  File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 731, in _get_timezone
    return sel_entry.timezone.iloc[0]
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1286, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1573, in _getitem_axis
    self._is_valid_integer(key, axis)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1487, in _is_valid_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
2016-09-21 22:16:58,767:ERROR:140290633410368:Cleaning and resampling failed for user 48123fc6-a631-41fb-a68c-ed3d518bcc0d
shankari commented 8 years ago

Assertion Error: Rounding

2016-09-22 12:48:40,708:INFO:140735159468032:section_location_array[-1].data.loc {"coordinates": [12.5923698, 55.6602518], "type": "Point"} != section.data.end_loc {"coordinates": [12.59236978, 55.66025177], "type": "Point"} even after df.ts fix, filling gap

Easy fix!

Failed in Resampling

This was a doozy caused by out-of-order points.

2016-09-22 14:40:09,193:DEBUG:140735159468032:with_speeds_df =                            fmt_time            ts   latitude   longitude  \
0         2016-06-22T21:43:57-07:00  1.466657e+09  37.867698 -122.259759   
1         2016-06-22T20:09:17-07:00  1.466651e+09  37.868355 -122.252893   
2  2016-06-22T21:31:42.502000-07:00  1.466656e+09  37.868315 -122.254355   
3         2016-06-22T21:31:49-07:00  1.466656e+09  37.868279 -122.253637   
4  2016-06-22T21:32:12.061000-07:00  1.466656e+09  37.868558 -122.254532   

      speed    distance  
0  0.000000    0.000000  
1 -0.106885  607.108403  
2  0.025971  128.441850  
3  9.725902   63.198908  
4  3.666813   84.560384  

Ooh! The place has a negative duration. Enter is 21:40 but exit is 20:09. WTF!

2016-09-22 14:37:45,812:DEBUG:140735159468032:Saving entry Entry({'data': Rawplace({'enter_fmt_time': u'2016-06-22T21:40:57-07:00',
 'ending_trip': ObjectId('57e44f29f6858f741c41e1c8'),
 'starting_trip': ObjectId('57e44f29f6858f741c41e1ca'),
 'duration': -5500,
 'exit_fmt_time': u'2016-06-22T20:09:17-07:00',
 'source': 'DwellSegmentationTimeFilter',
 'location': {"coordinates": [-122.2597589, 37.8676984], "type": "Point"},
 'enter_ts': 1466656857,
 'exit_ts': 1466651357}),
 '_id': ObjectId('57e44f29f6858f741c41e1c9'),
 'user_id': UUID('96993ce8-e550-4535-aa28-836804596d92'),
 'key': 'segmentation/raw_place'})}) into timeseries

Aha! Stupid phone!!! Sent us points out of order. We sort by metadata.write_ts, but it looks like the data timestamps were out of order.

2016-09-22 14:33:47,790:DEBUG:140735159468032:------------------------------2016-06-22T21:43:10-07:00------------------------------
2016-09-22 14:33:47,794:DEBUG:140735159468032:------------------------------2016-06-22T20:09:17-07:00------------------------------
2016-09-22 14:33:47,801:DEBUG:140735159468032:------------------------------2016-06-22T21:43:37.678000-07:00------------------------------

Note that the metadata write ts is in order

 'metadata_write_ts': 1466656989.717,
 'metadata_write_ts': 1466657016.201,

While the timestamp is not

 u'ts': 1466656990.0,
 u'ts': 1466651357.0,

and the elapsedRealTimeNanos is also in order.

 u'elapsedRealtimeNanos': 610100100257967,
 u'elapsedRealtimeNanos': 610126618689207,

I wonder if this is caused by a rebooted phone? https://developer.android.com/reference/android/location/Location.html#getTime()

How should we fix this? Should we automatically try to fix the location? Probably safest is to just drop the point. Too hard to figure out how to fix it right now. For the record, the points are

[-122.2598261,        37.8676886]
[-122.25289285182953, 37.86835491657257]

So they don't look very close to each other. It is probably not worth it to try and fix. This is incredibly rare - once in multiple days of data over multiple users. So let's just drop and move on.


Error getting filtered trip

This seems to be intermittent - did not reproduce when I tried it locally. Will retry and run again.

shankari commented 8 years ago

So let's just drop and move on.

But how do I drop? Just ignoring it means that I will need to ignore it all the time, which is a bad, fragile solution. In this rare case, I will just remove it from filtered_location instead of deleting it from the timeseries. It is still in location, so no real harm is done.
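A sketch of the manual drop (user id and timestamp taken from the logs above; edb as in the earlier snippets): remove the entry from background/filtered_location only, so the raw background/location copy survives.

from uuid import UUID

import emission.core.get_database as edb

edb.get_timeseries_db().remove({
    "metadata.key": "background/filtered_location",
    "user_id": UUID("96993ce8-e550-4535-aa28-836804596d92"),
    "data.ts": 1466651357.0,  # the out-of-order point
})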

shankari commented 8 years ago

This seems to be intermittent - did not reproduce when I tried it locally. Will retry and run again.

Tried two more times - did not recur.

Error is from here:

    # Can move this up to get_filtered_section if we want to ensure that we
    # don't touch filtered_section_data in here
    logging.debug("Setting section start and end values of %s to first point %s"
                  "and last point %s" %
                  (filtered_section_data, filtered_loc_df.iloc[0], filtered_loc_df.iloc[-1]))
    _set_extrapolated_vals_for_section(filtered_section_data,
                                        filtered_loc_df.iloc[0],
                                        filtered_loc_df.iloc[-1])

I guess if there are no filtered points, and this is not the first or last section, we can run into this. In that case, let's insert one entry for the first unfiltered location. We should ensure that remove_outliers returns a loc_df with at least one entry.
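A sketch of that invariant (remove_outliers is named in the text; this wrapper is illustrative):

def remove_outliers_keep_one(loc_df, outlier_mask):
    retained = loc_df[~loc_df.index.isin(outlier_mask)]
    if len(retained) == 0:
        # never return an empty frame: keep the first unfiltered location
        # so that .iloc[0] / .iloc[-1] downstream always succeed
        return loc_df.head(1)
    return retained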

shankari commented 8 years ago

This last one took a bit of time because it is not reproducible, so I had to actually craft a test case for it. But I think that it works reasonably well now.

shankari commented 8 years ago

https://github.com/shankari/e-mission-server/commit/ea80978cdef7a4f84e92fccbb51ba5581713571f has introduced a regression - Sunil's trip now looks like the screenshots below, in spite of the fact that the points show a clear zigzag.

(screenshots: Trip vs. Not cleaned)
shankari commented 8 years ago

All pending fixes merged. Time for another... deployment! We will have to reset everything anyway because of the trip segmentation and smoothing fixes.

shankari commented 8 years ago
$ git pull 
remote: Counting objects: 95, done.
remote: Compressing objects: 100% (95/95), done.
remote: Total 95 (delta 45), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (95/95), done.
From https://github.com/amplab/e-mission-server
   52a81d3..4808ebf  master     -> origin/master
Updating 52a81d3..4808ebf
Fast-forward

Delete existing analysis

$ ./e-mission-py.bash bin/reset_analysis_database.py

{u'ok': 1, u'n': 189490}
{u'ok': 1, u'n': 0}
{u'ok': 1, u'n': 0}
{u'ok': 1, u'n': 147}

Relaunch! Should be done by tomorrow morning.

$ ./e-mission-py.bash bin/intake_multiprocess.py 3 2>&1 | tee /mnt/logs/emission/intake.stdinout
shankari commented 8 years ago

Woo! No errors!

ubuntu@ip-10-203-173-119:/mnt/logs/emission$ ls -alh intake*error*
-rw-rw-r-- 1 ubuntu ubuntu 0 Sep 23 09:46 intake_0-errors.log.gz
-rw-rw-r-- 1 ubuntu ubuntu 0 Sep 23 09:46 intake_1-errors.log.gz
-rw-rw-r-- 1 ubuntu ubuntu 0 Sep 23 09:46 intake_2-errors.log.gz
-rw-rw-r-- 1 ubuntu ubuntu 0 Sep 23 09:46 intake_launcher-errors.log.gz