Open shankari opened 7 years ago
First, determine the format of the new motion activity objects, and change the server code to account for them.
Here's an example
In [515]: edb.get_timeseries_error_db().find_one({"metadata.key": "background/motion_activity"})
Out[515]:
{u'_id': ObjectId('594c9334cb17471ac033ee7e'),
u'data': {u'zzbjA': 4, u'zzbjB': 75},
u'metadata': {u'key': u'background/motion_activity',
u'platform': u'android',
u'read_ts': 0,
u'time_zone': u'America/Los_Angeles',
u'type': u'sensor-data',
u'write_ts': 1498187291.604},
u'user_id': UUID('08b31565-f990-4d15-a4a7-89b3ba6b1340')}
The write_ts
is
In [120]: arrow.get(1498187291)
Out[120]: <Arrow [2017-06-23T03:08:11+00:00]>
Last release was on Jun 9, 2017. So this is definitely the new format.
Let's do some more sorting...
In [517]: list(edb.get_timeseries_error_db().find({"metadata.key": "background/motion_activity"}).sort("metadata.write_ts", 1).limit(3))
Out[517]:
[{u'_id': ObjectId('5923ce3acb17471ac015a27b'),
u'data': {u'zzbjA': 3, u'zzbjB': 77},
u'metadata': {u'key': u'background/motion_activity',
u'platform': u'android',
u'read_ts': 0,
u'time_zone': u'America/Los_Angeles',
u'type': u'sensor-data',
u'write_ts': 1495343353.589},
u'user_id': UUID('fd7b4c2e-2c8b-3bfa-94f0-d1e3ecbd5fb7')},
{u'_id': ObjectId('5923ce3acb17471ac015a27c'),
u'data': {u'zzbjA': 5, u'zzbjB': 100},
u'metadata': {u'key': u'background/motion_activity',
u'platform': u'android',
u'read_ts': 0,
u'time_zone': u'America/Los_Angeles',
u'type': u'sensor-data',
u'write_ts': 1495343353.614},
u'user_id': UUID('fd7b4c2e-2c8b-3bfa-94f0-d1e3ecbd5fb7')},
{u'_id': ObjectId('5923ce3acb17471ac015a28d'),
u'data': {u'zzbjA': 3, u'zzbjB': 69},
u'metadata': {u'key': u'background/motion_activity',
u'platform': u'android',
u'read_ts': 0,
u'time_zone': u'America/Los_Angeles',
u'type': u'sensor-data',
u'write_ts': 1495343913.022},
u'user_id': UUID('fd7b4c2e-2c8b-3bfa-94f0-d1e3ecbd5fb7')}]
And
[{u'_id': ObjectId('595b2bf0cb17471ac03ee644'),
u'data': {u'zzbjA': 4, u'zzbjB': 35},
u'metadata': {u'key': u'background/motion_activity',
u'platform': u'android',
u'read_ts': 0,
u'time_zone': u'America/Phoenix',
u'type': u'sensor-data',
u'write_ts': 1499145456.502},
u'user_id': UUID('7c8bfcff-2a95-4da3-b0bb-027de2ad06eb')},
{u'_id': ObjectId('595b2bf0cb17471ac03ee641'),
u'data': {u'zzbjA': 3, u'zzbjB': 46},
u'metadata': {u'key': u'background/motion_activity',
u'platform': u'android',
u'read_ts': 0,
u'time_zone': u'America/Phoenix',
u'type': u'sensor-data',
u'write_ts': 1499145426.221},
u'user_id': UUID('7c8bfcff-2a95-4da3-b0bb-027de2ad06eb')},
{u'_id': ObjectId('595b2bf0cb17471ac03ee63e'),
u'data': {u'zzbjA': 3, u'zzbjB': 92},
u'metadata': {u'key': u'background/motion_activity',
u'platform': u'android',
u'read_ts': 0,
u'time_zone': u'America/Phoenix',
u'type': u'sensor-data',
u'write_ts': 1499145396.142},
u'user_id': UUID('7c8bfcff-2a95-4da3-b0bb-027de2ad06eb')}]
Seems pretty consistent.
Now, update the code on the server.
$ git checkout webapp/www/templates/home.html
$ git pull
Updating 3a97d04..b810170
Fast-forward
bin/reset_pipeline.py | 110 ++++-----
emission/analysis/classification/inference/mode.py | 3 +-
.../net/usercache/formatters/android/motion_activity.py | 4 +
emission/pipeline/reset.py | 208 +++++++++++++++++
emission/storage/decorations/place_queries.py | 66 ++++++
.../analysisTests/intakeTests/TestPipelineRealData.py | 2 +-
emission/tests/pipelineTests/TestPipelineReset.py | 284 ++++++++++++++++++++++++
webapp/www/templates/home.html | 4 +-
8 files changed, 627 insertions(+), 54 deletions(-)
create mode 100644 emission/pipeline/reset.py
create mode 100644 emission/tests/pipelineTests/TestPipelineReset.py
Next, run the script that copies the error elements to the usercache and then to long-term.
$ ./e-mission-py.bash bin/debug/fix_usercache_processing.py 2>&1 | tee /tmp/fix_usercache.log
Errors all appear to be due to the None
transition
$ grep error /tmp/fix_usercache.log
...
DEBUG:root:Inserting entry {u'data': {u'transition': None, u'currState': u'STATE_START'}, u'_id': ObjectId('57ae4b90f311e474bfdaffdf'), u'user_id': UUID('fbd32183-889c-4ba1-92a1-4c2ef2af6d2b'), u'metadata': {u'plugin': u'none', u'write_ts': 1471012495.27217, u'time_zone': u'America/New_York', u'platform': u'ios', u'key': u'statemachine/transition', u'read_ts': 1499199721.016257, u'type': u'message'}} into error timeseries
$ grep error /tmp/fix_usercache.log | egrep -v "'transition': None" | grep -v insert_error
INFO:root:Found 64898 errors in this round
DEBUG:root:module_name = emission.net.usercache.formatters.android.client_error
DEBUG:root:module_name = emission.net.usercache.formatters.android.client_error
DEBUG:root:module_name = emission.net.usercache.formatters.android.client_error
DEBUG:root:module_name = emission.net.usercache.formatters.android.client_error
DEBUG:root:module_name = emission.net.usercache.formatters.android.client_error
DEBUG:root:module_name = emission.net.usercache.formatters.android.client_error
DEBUG:root:module_name = emission.net.usercache.formatters.android.client_error
DEBUG:root:module_name = emission.net.usercache.formatters.android.client_error
DEBUG:root:module_name = emission.net.usercache.formatters.android.client_error
DEBUG:root:module_name = emission.net.usercache.formatters.android.client_error
Now, let's reset all the android pipelines to Jun 9th. Actually, I don't want to reset entries that are before the 9th to the 9th. Fix that with https://github.com/e-mission/e-mission-server/pull/499 Then pull again.
Now, reset to Jun 7th, to make it a bit safer. There were some issues with the launch script, fix those. https://github.com/e-mission/e-mission-server/pull/500
Now, reset actually reset to Jun 7th. Let's do a dry-run before doing something irreversible.
$ ./e-mission-py.bash bin/reset_pipeline.py -p android -d 2017-06-07 -n
Namespace(all=False, date='2017-06-07', dry_run=True, email_list=None, platform='android', user_list=None)
DEBUG:root:day_dt is 2017-06-07T00:00:00+00:00
DEBUG:root:day_ts is 1496793600
Waited for a long time. Looks like distinct
is slow.
Added some more logging, but probably just have to wait....
https://github.com/e-mission/e-mission-server/pull/501
Failed because the person uninstalled before the reset date, so we couldn't find any places or trips straddling it.
{u'_id': ObjectId('58461a0888f6637521fe5a67'),
u'data': {u'enter_fmt_time': u'2016-12-05T12:43:30.403000-08:00',
u'exit_fmt_time': u'2016-12-05T17:38:08.072380-08:00'}},
{u'_id': ObjectId('58461a0888f6637521fe5a66'),
u'data': {u'enter_fmt_time': u'2016-12-05T10:54:36.502000-08:00',
u'exit_fmt_time': u'2016-12-05T12:30:15.657499-08:00'}},
{u'_id': ObjectId('58461a0888f6637521fe5a65'),
u'data': {u'enter_fmt_time': u'2016-12-05T10:44:57.670000-08:00',
u'exit_fmt_time': u'2016-12-05T10:44:57.670000-08:00'}},
{u'_id': ObjectId('58424d6388f66329354efd2c'),
u'data': {u'enter_fmt_time': u'2016-12-02T17:08:36.578000-08:00',
u'exit_fmt_time': u'2016-12-02T17:08:36.578000-08:00'}}]
DEBUG:root:last place doc = None
Traceback (most recent call last):
File "bin/reset_pipeline.py", line 83, in <module>
epr.reset_user_to_ts(user_id, day_ts, args.dry_run)
File "/mnt/e-mission/e-mission-server/emission/pipeline/reset.py", line 42, in reset_user_to_ts
last_cleaned_place = esdp.get_last_place_before(esda.CLEANED_PLACE_KEY, ts, user_id)
File "/mnt/e-mission/e-mission-server/emission/storage/decorations/place_queries.py", line 75, in get_last_place_before
raise ValueError("No trip or place straddling time %s" % reset_ts)
ValueError: No trip or place straddling time 1496793600
Failed in spite of the fix, adding additional logging and redirecting output to a file.
DEBUG:root:last place doc = None
DEBUG:root:last place doc = {u'user_id': UUID('5a6a2711-c574-42f0-9940-ea1fd0cc2f09'),
u'_id': ObjectId('5951256188f66373218bcd06'),
u'data': {u'enter_fmt_time': u'2017-06-22T15:35:03.034000-07:00',
u'display_name': u', ',
u'ending_trip': ObjectId('5951256088f66373218bccf0'),
u'source': u'DwellSegmentationTimeFilter',
u'location': {u'type': u'Point', u'coordinates': [-121.3834927, 40.4237075]},
u'enter_ts': 1498170903.034,
u'raw_places': [ObjectId('5951254a88f66373218bca86'),
ObjectId('5951254a88f66373218bca86'),
ObjectId('5951254a88f66373218bca8c'),
ObjectId('5951254a88f66373218bca86'),
ObjectId('5951254a88f66373218bca8c'),
ObjectId('5951254a88f66373218bca86'),
ObjectId('5951254a88f66373218bca8c'),
ObjectId('5951254a88f66373218bca86'),
ObjectId('5951254a88f66373218bca8c'),
ObjectId('5951254a88f66373218bca86'),
ObjectId('5951254a88f66373218bca8c'),
ObjectId('5951254a88f66373218bca86'),
ObjectId('5951254a88f66373218bca8c'),
ObjectId('5951254a88f66373218bca86'),
ObjectId('5951254a88f66373218bca8c'),
ObjectId('5951254a88f66373218bca86'),
ObjectId('5951254a88f66373218bca8c'),
ObjectId('5951254a88f66373218bca86'),
ObjectId('5951254a88f66373218bca8c')]},
u'metadata': {u'write_fmt_time': u'2017-06-26T08:16:49.685577-07:00',
u'write_ts': 1498490209.685577,
u'time_zone': u'America/Los_Angeles',
u'platform': u'server',
u'key': u'analysis/cleaned_place'}}
Traceback (most recent call last):
File "bin/reset_pipeline.py", line 83, in <module>
epr.reset_user_to_ts(user_id, day_ts, args.dry_run)
File "/mnt/e-mission/e-mission-server/emission/pipeline/reset.py", line 42, in reset_user_to_ts
last_cleaned_place = esdp.get_last_place_before(esda.CLEANED_PLACE_KEY, ts, user_id)
File "/mnt/e-mission/e-mission-server/emission/storage/decorations/place_queries.py", line 82, in get_last_place_before
raise ValueError("No trip or place straddling time %s" % reset_ts)
ValueError: No trip or place straddling time 1496793600
With the additional logging, ran into issues when user_id = None
Fixed by adding a special check for user_id is None
and doing an early return
Then ran into an issue where we have entries for a user id, but no analysis results, and no entry in the uuid_db.
INFO:root:resetting user 95725e34-2e58-3ebd-9f1a-41f9e0131079 to ts 1496793600
DEBUG:root:Looking for last place before 1496793600
DEBUG:root:all places for this user = []
DEBUG:root:last place doc for user 95725e34-2e58-3ebd-9f1a-41f9e0131079 = None
DEBUG:root:last trip doc for user 95725e34-2e58-3ebd-9f1a-41f9e0131079 = None
DEBUG:root:last place doc = None
Namespace(all=False, date='2017-06-07', dry_run=True, email_list=None, platform='android', user_list=None)
Traceback (most recent call last):
File "bin/reset_pipeline.py", line 83, in <module>
epr.reset_user_to_ts(user_id, day_ts, args.dry_run)
File "/mnt/e-mission/e-mission-server/emission/pipeline/reset.py", line 46, in reset_user_to_ts
last_cleaned_place = esdp.get_last_place_before(esda.CLEANED_PLACE_KEY, ts, user_id)
File "/mnt/e-mission/e-mission-server/emission/storage/decorations/place_queries.py", line 79, in get_last_place_before
(last_place.data.enter_ts, reset_ts))
AttributeError: 'NoneType' object has no attribute 'data'
In [529]: edb.get_uuid_db().find_one({"uuid": UUID('95725e34-2e58-3ebd-9f1a-41f9e0131079')})
In [530]: edb.get_timeseries_db().find({"user_id": UUID("95725e34-2e58-3ebd-9f1a-41f9e0131079")}).count()
Out[530]: 4319
In [531]: edb.get_analysis_timeseries_db().find({"user_id": UUID("95725e34-2e58-3ebd-9f1a-41f9e0131079")}).count()
Out[531]: 0
In [532]: edb.get_timeseries_db().find({"user_id": UUID("95725e34-2e58-3ebd-9f1a-41f9e0131079")}).distinct("metadata.key")
Out[532]:
[u'background/battery',
u'statemachine/transition',
u'background/location',
u'background/motion_activity',
u'background/filtered_location']
Ran into the straddling issue again
Namespace(all=False, date='2017-06-07', dry_run=True, email_list=None, platform='android', user_list=None)
Traceback (most recent call last):
File "bin/reset_pipeline.py", line 83, in <module>
epr.reset_user_to_ts(user_id, day_ts, args.dry_run)
File "/mnt/e-mission/e-mission-server/emission/pipeline/reset.py", line 46, in reset_user_to_ts
last_cleaned_place = esdp.get_last_place_before(esda.CLEANED_PLACE_KEY, ts, user_id)
File "/mnt/e-mission/e-mission-server/emission/storage/decorations/place_queries.py", line 86, in get_last_place_before
(reset_ts, user_id))
ValueError: No trip or place straddling time 1496793600 for user 5a6a2711-c574-42f0-9940-ea1fd0cc2f09
And it has a valid UUID entry
In [547]: edb.get_uuid_db().find_one({"uuid": UUID("5a6a2711-c574-42f0-9940-ea1fd0cc2f09")})
Out[547]:
{u'_id': ObjectId('5939a0a5cb17471ac0273514'),
u'update_ts': datetime.datetime(2017, 6, 8, 19, 8, 21, 584000),
u'user_email': <redacted>,
u'uuid': UUID('5a6a2711-c574-42f0-9940-ea1fd0cc2f09')}
Analysis entries
In [540]: edb.get_analysis_timeseries_db().find({"user_id": UUID("5a6a2711-c574-42f0-9940-ea1fd0cc2f09")}).count()
Out[540]: 2083
Including a bunch of cleaned places
In [541]: edb.get_analysis_timeseries_db().find({"user_id": UUID("5a6a2711-c574-42f0-9940-ea1fd0cc2f09")}).distinct("metadata.key")
Out[541]:
[u'segmentation/raw_trip',
u'segmentation/raw_place',
u'segmentation/raw_section',
u'segmentation/raw_stop',
u'analysis/smoothing',
u'analysis/recreated_location',
u'analysis/cleaned_section',
u'analysis/cleaned_stop',
u'analysis/cleaned_place',
u'analysis/cleaned_trip',
u'segmentation/raw_untracked',
u'analysis/cleaned_untracked']
Hm. The first one is around the time of the reset. Let's convert our reset time to the correct timezone and see what we find.
In [545]: list(edb.get_analysis_timeseries_db().find({"user_id": UUID("5a6a2711-c574-42f0-9940-ea1fd0cc2f09"), "metadata.key": "analysis/cleaned_place"}, {"data.enter_fmt_time": 1, "data.exit_fmt_time": 1}).sort("data.enter_ts", 1))
Out[545]:
[{u'_id': ObjectId('593a322788f66336cce330cf'),
u'data': {u'exit_fmt_time': u'2017-06-08T18:07:32.986396-07:00'}},
{u'_id': ObjectId('593a322788f66336cce330d0'),
u'data': {u'enter_fmt_time': u'2017-06-08T18:27:27-07:00',
u'exit_fmt_time': u'2017-06-08T19:51:47.571543-07:00'}},
{u'_id': ObjectId('593a322888f66336cce330d2'),
u'data': {u'enter_fmt_time': u'2017-06-08T20:03:47-07:00',
u'exit_fmt_time': u'2017-06-08T20:05:52-07:00'}},
{u'_id': ObjectId('593c4e7b88f66321358e9477'),
u'data': {u'enter_fmt_time': u'2017-06-08T20:10:42-07:00',
u'exit_fmt_time': u'2017-06-08T20:10:42-07:00'}},
...
The first full place entry is 2017-06-08T18:27:27-07:00 - 2017-06-08T19:51:47.571543-07:00. We are resetting to 2017-06-07 UTC = 2017-06-06T17:00:00-07:00 Since all places are after the reset_ts, none of them will straddle it, and the last place will also be after the reset_ts.
DEBUG:root:last place doc for user 5a6a2711-c574-42f0-9940-ea1fd0cc2f09 = None
DEBUG:root:last trip doc for user 5a6a2711-c574-42f0-9940-ea1fd0cc2f09 = None
DEBUG:root:last_place = Entry({u'data':
{u'enter_fmt_time': u'2017-06-22T15:35:03.034000-07:00',
u'display_name': u',
',
u'enter_ts': 1498170903.034,
u'_id': ObjectId('5951256188f66373218bcd06'),
u'user_id': UUID('5a6a2711-c574-42f0-9940-ea1fd0cc2f09'),
u'metadata': {u'write_fmt_time': u'2017-06-26T08:16:49.685577-07:00',
u'key': u'analysis/cleaned_place'}}),
reset_ts = 1496793600
In this case, we want to fall back to resetting to the start.
Not sure that this is actually resolved!
Made a minor change for performance.
Looking at all the entries and getting the distinct user_ids from them takes a long time.
For all recent entries, we should have curr_platform
set.
Since the reset_ts is fairly recent, I can make this change.
def _find_platform_users(platform):
- return edb.get_timeseries_db().find({'metadata.platform': platform}).distinct(
- 'user_id')
+# return edb.get_timeseries_db().find({'metadata.platform': platform}).distinct(
+# 'user_id')
+ return edb.get_profile_db().find({"curr_platform": platform}).distinct("user_id")
With this, the initial determination of the uuids is fast, and the dry-run completed without any failures. Woo!
Looking through the dry-run logs, I note several uuids with no place entries. Who are all these people, why are they signing up for e-mission, but not actually using it in any way?
INFO:root:resetting user 662d93fa-e150-43ff-b11d-e1d2facaec1b to ts 1496793600
INFO:root:resetting user d98ee40f-9c39-4388-ae66-757251d14eba to ts 1496793600
INFO:root:resetting user 3f1960fd-bfaa-4032-9562-19f5ca244f3f to ts 1496793600
INFO:root:resetting user d6717ba4-352d-4901-a503-f73c51811091 to ts 1496793600
INFO:root:resetting user cd710286-ff58-4a7d-93bd-31e305d14c73 to ts 1496793600
INFO:root:resetting user 71991ef1-fc9d-4fbe-83fb-1ed0434f324a to ts 1496793600
INFO:root:resetting user 8bd673ab-318b-4cfa-add8-fc71390e3500 to ts 1496793600
INFO:root:resetting user b566100c-b519-4267-be98-8f4bb045261c to ts 1496793600
INFO:root:resetting user f928898f-0450-40c1-b98c-ed6901135b55 to ts 1496793600
INFO:root:resetting user 176f0a89-d927-42ae-9a14-8079818eb45d to ts 1496793600
INFO:root:resetting user 4d663395-ddc6-4ca7-8d69-aa039d7d886c to ts 1496793600
INFO:root:resetting user 9e75449b-d105-40d5-9b12-9cff475252f1 to ts 1496793600
INFO:root:resetting user 6d265af9-6185-43f0-80d1-0d5ce2173cda to ts 1496793600
INFO:root:resetting user 4a66843c-985e-4a9e-a690-87830bcb007c to ts 1496793600
There doesn't appear to be a big rush of installs, but they are trickling in, a few a month.
[datetime.datetime(2017, 3, 21, 10, 51, 9, 971000),
datetime.datetime(2017, 4, 6, 5, 34, 28, 165000),
datetime.datetime(2017, 4, 12, 22, 7, 24, 955000),
datetime.datetime(2017, 4, 14, 20, 54, 39, 612000),
datetime.datetime(2017, 4, 18, 5, 20, 20, 262000),
datetime.datetime(2017, 4, 18, 5, 30, 30, 44000),
datetime.datetime(2017, 4, 24, 6, 30, 27, 691000),
datetime.datetime(2017, 5, 1, 11, 3, 6, 781000),
datetime.datetime(2017, 5, 1, 22, 34, 25, 997000),
datetime.datetime(2017, 5, 13, 2, 19, 28, 333000),
datetime.datetime(2017, 5, 20, 13, 2, 19, 154000),
datetime.datetime(2017, 5, 22, 23, 48, 14, 516000),
datetime.datetime(2017, 6, 6, 2, 35, 40, 555000),
datetime.datetime(2017, 6, 18, 5, 7, 11, 212000)]
I should probably follow up with these people and see if they are legit and the app didn't work for them or whether they are not legit.
Ok, running without dry-run
$ ./e-mission-py.bash bin/reset_pipeline.py -p android -d 2017-06-07 2>&1 | tee /tmp/reset_pipeline_real.log
....
INFO:root:About to delete 7 pipeline states
INFO:root:this is not a dry run, result of removing analysis objects = {u'ok': 1, u'n': 4048}
INFO:root:this is not a dry run, result of removing pipeline states = {u'ok': 1, u'n': 7}
Namespace(all=False, date='2017-06-07', dry_run=False, email_list=None, platform='android', user_list=None)
Finished with no errors.
Next, I need to reset the pipeline for myself to December 10th to apply the fix to https://github.com/e-mission/e-mission-server/issues/457
First run with dry_run
$ ./e-mission-py.bash bin/reset_pipeline.py -u <redacted> -d 2016-12-10 -n
DEBUG:root:trip_seg_update_pipeline_query = {'$set': {'last_processed_ts': 1481313996.457}}
INFO:root:resetting 1 trip_seg_pipeline states for user <redacted> to 1481313996.46
DEBUG:root:reset_pipeline_query = {'last_processed_ts': {'$gt': 1481313696.457}, 'pipeline_stage': {'$ne': 1}, 'user_id': UUID('<redacted>')}
DEBUG:root:update_pipeline_query = {'$set': {'last_processed_ts': 1481313696.457}}
INFO:root:out of 7 total, resetting 4 pipeline states for user <redacted>
INFO:root:this is a dry run, returning from reset_pipeline_state without modifying anything
Next run without dry_run
$ ./e-mission-py.bash bin/reset_pipeline.py -u <redacted> -d 2016-12-10
INFO:root:out of 7 total, resetting 4 pipeline states for user ea59084e-11d4-4076-9252-3b9a29ce35e0 to 1481313696.46
DEBUG:root:this is not a dry run, result of updating trip_segmentation stage in reset_pipeline_state = {'updatedExisting': True, u'nModified': 1, u'ok': 1, u'n': 1}
DEBUG:root:this is not a dry run, result of updating all other stages in reset_pipeline_state = {'updatedExisting': True, u'nModified': 4, u'ok': 1, u'n': 4}
Now, restart the pipeline manually
$ PYTHONPATH=. /home/ubuntu/anaconda/bin/python bin/intake_multiprocess.py 3 >> /mnt/logs/emission/intake.stdinout 2>&1
No errors so far, will check log and phones tomorrow morning. Maybe this is finally done!
This is not too serious because it is only in the stage to generate outputs, but I get the following error for several
Traceback (most recent call last):
File "/mnt/e-mission/e-mission-server/emission/net/usercache/builtin_usercache_handler.py", line 101, in storeViewsToCache
self.storeTimelineToCache(time_query)
File "/mnt/e-mission/e-mission-server/emission/net/usercache/builtin_usercache_handler.py", line 168, in storeTimelineToCache
(start_ts, pydt.datetime.utcfromtimestamp(start_ts).isoformat()))
TypeError: a float is required
The relevant code is
start_ts = esp.get_complete_ts(self.user_id)
logging.debug("start ts from pipeline = %s, %s" %
(start_ts, pydt.datetime.utcfromtimestamp(start_ts).isoformat()))
trip_gj_list = self.get_trip_list_for_seven_days(start_ts)
from which we can infer that the start_ts
is None
.
We get the complete_ts
from
def get_complete_ts(user_id):
return get_current_state(user_id, ps.PipelineStages.CLEAN_RESAMPLING).last_processed_ts
So this implies that the CLEAN_RESAMPLING
state has a last_processed_ts = None
.
Let us see if this is true. Yup, it is!
In [577]: list(edb.get_pipeline_state_db().find({"user_id": UUID("7c6d50fd-efbc-4438-9f36-f73411ac3a3c")}))
Out[577]:
[{u'_id': ObjectId('589a848d88f66351eb985cd7'),
u'curr_run_ts': None,
u'last_processed_ts': None,
u'last_ts_run': 1499268837.42303,
u'pipeline_stage': 6, (ACCURACY_FILTERING)
u'user_id': UUID('7c6d50fd-efbc-4438-9f36-f73411ac3a3c')},
{u'_id': ObjectId('589a848d88f66351eb985cd9'),
u'curr_run_ts': None,
u'last_processed_ts': None,
u'last_ts_run': 1499268837.450148,
u'pipeline_stage': 1, (TRIP_SEGMENTATION)
u'user_id': UUID('7c6d50fd-efbc-4438-9f36-f73411ac3a3c')},
{u'_id': ObjectId('589a848d88f66351eb985cdb'),
u'curr_run_ts': None,
u'last_processed_ts': None,
u'last_ts_run': 1499268837.648338,
u'pipeline_stage': 2, (SECTION_SEGMENTATION)
u'user_id': UUID('7c6d50fd-efbc-4438-9f36-f73411ac3a3c')},
{u'_id': ObjectId('589a848e88f66351eb985cdd'),
u'curr_run_ts': None,
u'last_processed_ts': None,
u'last_ts_run': 1499268837.697028,
u'pipeline_stage': 3, (JUMP_SMOOTHING)
u'user_id': UUID('7c6d50fd-efbc-4438-9f36-f73411ac3a3c')},
{u'_id': ObjectId('589a848e88f66351eb985cdf'),
u'curr_run_ts': None,
u'last_processed_ts': None,
u'last_ts_run': 1499268837.727734,
u'pipeline_stage': 11, (CLEAN_RESAMPLING)
u'user_id': UUID('7c6d50fd-efbc-4438-9f36-f73411ac3a3c')},
{u'_id': ObjectId('589a848e88f66351eb985ce2'),
u'curr_run_ts': None,
u'last_processed_ts': None,
u'last_ts_run': 1489931695.867638,
u'pipeline_stage': 9, (OUTPUT_GEN)
u'user_id': UUID('7c6d50fd-efbc-4438-9f36-f73411ac3a3c')}]
So basically, none of the stages have run successfully.
Let's see what data we have. Answer: None. All entries are stats
, so we skip this user.
In [578]: edb.get_timeseries_db().find({"user_id": UUID('7c6d50fd-efbc-4438-9f36-f73411ac3a3c')}).count()
Out[578]: 1524
In [579]: edb.get_timeseries_db().find({"user_id": UUID('7c6d50fd-efbc-4438-9f36-f73411ac3a3c')}).distinct("metadata.key")
Out[579]: [u'stats/server_api_time', u'stats/pipeline_time']
And there's nothing in the usercache either. Basically, we have no data for this user.
In [580]: edb.get_usercache_db().find({"user_id": UUID('7c6d50fd-efbc-4438-9f36-f73411ac3a3c')}).count()
Out[580]: 1
In [581]: edb.get_usercache_db().find({"user_id": UUID('7c6d50fd-efbc-4438-9f36-f73411ac3a3c')}).distinct("metadata.key")
Out[581]: [u'common-trips']
In this case, there's no need to generate output either. Let's just check the start_ts
and skip instead of generating an error.
Got another error. Before this, we have encountered this case when a user logged into two phones with the same ID.
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] != [0.0, 0.195279647629855, 5.091991301009006, 6.376307295933423, 1.8666625417892948, 9.356544761901484, 0.8267522773714795, 0.8267531290795987, 4.74761677183518, 12.414261490574326]
Cleaning and resampling failed for user 68d16b1a-1521-4eaf-922a-20ba037cdc79
Traceback (most recent call last):
File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 72, in clean_and_resample
last_raw_place = save_cleaned_segments_for_ts(user_id, time_query.startTs, time_query.endTs)
File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 98, in save_cleaned_segments_for_ts
return save_cleaned_segments_for_timeline(user_id, tl)
File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 124, in save_cleaned_segments_for_timeline
(last_cleaned_place, filtered_tl) = create_and_link_timeline(tl, user_id, trip_map)
File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 886, in create_and_link_timeline
link_trip_start(ts, curr_cleaned_trip, curr_cleaned_start_place, raw_start_place)
File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 940, in link_trip_start
ts, cleaned_trip_data, cleaned_start_place_data)
File "/mnt/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 1063, in _fix_squished_place_mismatch
assert False
AssertionError
No further issues. Once I investigate the AssertionError
, this issue can be closed
Also, note that this didn't really give new habitica points. In particular, the autocheck task would be marked as completed upto whatever date. We would need to reset that as well. Should I add that to the pipeline reset code? Or have a separate script?
Since the habitica points are not idempotent, we don't want to always reset the pipeline - only when we haven't been giving points all along. So let's make it a separate script for now.
Also, I checked the test phone, and Manonmani's and Tom's phones, and none of them seem to have data from the past week properly recorded. But there were no errors!
Debugging Tom's phone now...
Here's the run.
Couple of things to note:
2017-06-16T20:18:24.227000
.
Why isn't it June 7th, like we set?And there is no data after 16th, so there is no processing. Why is there no data after the 16th? Was tracking turned off? Need phone logs.
2017-07-04 05:04:27,458:INFO:140503730042688:**********UUID 96af3842-d5fb-4f13-aea0-726efaeba6ea: moving to long term**********
2017-07-04 05:04:27,482:DEBUG:140503730042688:result = {'updatedExisting': True, u'nModified': 1, u'ok': 1, u'n': 1} after updating read timestamp
2017-07-04 05:04:27,699:DEBUG:140503730042688:Found 234 messages in response to query {'$or': [{'metadata.type': 'message'}, {'metadata.type': 'sensor-data'}, {'metadata.type': 'rw-document'}], 'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}
2017-07-04 05:04:27,702:INFO:140503730042688:For stage PipelineStages.USERCACHE, start_ts = 2017-06-16T20:18:24.227000
2017-07-04 05:04:27,705:DEBUG:140503730042688:result = {'updatedExisting': False, u'nModified': 0, u'ok': 1, u'n': 0} after updating read timestamp
2017-07-04 05:04:27,732:DEBUG:140503730042688:Found 0 messages in response to query {'$or': [{'metadata.type': 'message'}, {'metadata.type': 'sensor-data'}, {'metadata.type': 'rw-document'}], 'metadata.write_ts': {'$lte': 1499144662.702402, '$gte': 1497644304.227}, 'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}
2017-07-04 05:04:27,733:INFO:140503730042688:In moveToLongTerm, no entries to save
2017-07-04 05:04:27,733:DEBUG:140503730042688:Deleting all entries for query <emission.storage.timeseries.timequery.TimeQuery object at 0x7fc973fe2950>
2017-07-04 05:04:27,733:DEBUG:140503730042688:About to delete messages matching query {'$or': [{'metadata.type': 'message'}, {'metadata.type': 'sensor-data'}, {'metadata.type': 'rw-document'}], 'metadata.write_ts': {'$lte': 1499144662.702402, '$gte': 1497644304.227}, 'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}
2017-07-04 05:04:27,735:DEBUG:140503730042688:Delete result = {u'ok': 1, u'n': 0}
2017-07-04 05:04:27,737:INFO:140503730042688:For stage PipelineStages.USERCACHE, last_ts_processed is unchanged
2017-07-04 05:04:27,741:DEBUG:140503730042688:insert called
2017-07-04 05:04:27,741:DEBUG:140503730042688:entry was fine, no need to fix it
2017-07-04 05:04:27,742:DEBUG:140503730042688:Inserting entry Entry({'data': {'reading': 0.2814149856567383, 'name': 'USERCACHE', 'ts': 1499144667.73971}, '_id': ObjectId('595b21db88f66372747f37aa'), 'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'), 'metadata': Metadata({'write_fmt_time': '2017-07-03T22:04:27.739878-07:00', 'write_ts': 1499144667.739878, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': LocalDate({'hour': 22, 'month': 7, 'second': 27, 'weekday': 0, 'year': 2017, 'timezone': 'America/Los_Angeles', 'day': 3, 'minute': 4}), 'key': 'stats/pipeline_time'})}) into timeseries
2017-07-04 05:05:10,583:INFO:140503730042688:**********UUID 96af3842-d5fb-4f13-aea0-726efaeba6ea: filter accuracy if needed**********
2017-07-04 05:05:10,586:INFO:140503730042688:For stage PipelineStages.ACCURACY_FILTERING, start_ts is None
2017-07-04 05:05:10,589:DEBUG:140503730042688:Not a public phone, must already have filtered data, early return
2017-07-04 05:05:10,591:INFO:140503730042688:For stage PipelineStages.ACCURACY_FILTERING, last_ts_processed is unchanged
2017-07-04 05:05:10,594:DEBUG:140503730042688:insert called
2017-07-04 05:05:10,594:DEBUG:140503730042688:entry was fine, no need to fix it
2017-07-04 05:05:10,595:DEBUG:140503730042688:Inserting entry Entry({'data': {'reading': 0.009174108505249023, 'name': 'ACCURACY_FILTERING', 'ts': 1499144710.592516}, '_id': ObjectId('595b220688f66372747f37ab'), 'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'), 'metadata': Metadata({'write_fmt_time': '2017-07-03T22:05:10.592669-07:00', 'write_ts': 1499144710.592669, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': LocalDate({'hour': 22, 'month': 7, 'second': 10, 'weekday': 0, 'year': 2017, 'timezone': 'America/Los_Angeles', 'day': 3, 'minute': 5}), 'key': 'stats/pipeline_time'})}) into timeseries
2017-07-04 05:05:10,597:INFO:140503730042688:**********UUID 96af3842-d5fb-4f13-aea0-726efaeba6ea: segmenting into trips**********
2017-07-04 05:05:10,599:INFO:140503730042688:For stage PipelineStages.TRIP_SEGMENTATION, start_ts = 2017-06-16T20:18:23.917000
2017-07-04 05:05:10,601:DEBUG:140503730042688:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'metadata.write_ts': {'$lte': 1499144705.599898, '$gte': 1497644303.917}, 'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}, sort_key = metadata.write_ts
2017-07-04 05:05:10,601:DEBUG:140503730042688:orig_ts_db_keys = ['background/filtered_location'], analysis_ts_db_keys = []
2017-07-04 05:05:10,601:DEBUG:140503730042688:finished querying values for ['background/filtered_location']
2017-07-04 05:05:10,602:DEBUG:140503730042688:finished querying values for []
2017-07-04 05:05:14,378:DEBUG:140503730042688:Found 0 results
2017-07-04 05:05:14,378:DEBUG:140503730042688:len(loc_df) == 0, early return
2017-07-04 05:05:14,381:INFO:140503730042688:For stage PipelineStages.TRIP_SEGMENTATION, last_ts_processed is unchanged
2017-07-04 05:05:14,384:DEBUG:140503730042688:insert called
2017-07-04 05:05:14,384:DEBUG:140503730042688:entry was fine, no need to fix it
2017-07-04 05:05:14,385:DEBUG:140503730042688:Inserting entry Entry({'data': {'reading': 3.785299062728882, 'name': 'TRIP_SEGMENTATION', 'ts': 1499144714.3827}, '_id': ObjectId('595b220a88f66372747f37ac'), 'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'), 'metadata': Metadata({'write_fmt_time': '2017-07-03T22:05:14.382881-07:00', 'write_ts': 1499144714.382881, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': LocalDate({'hour': 22, 'month': 7, 'second': 14, 'weekday': 0, 'year': 2017, 'timezone': 'America/Los_Angeles', 'day': 3, 'minute': 5}), 'key': 'stats/pipeline_time'})}) into timeseries
There are other files that have this UUID - let's see if they have the answers.
Did moveToLongTerm
from the fix_usercache_processing.py
break something?
There are other files that have this UUID - let's see if they have the answers.
Other files are from older runs. The run tracked by this reset is the one above.
Let's look at the reset code. It should have been to 6th June.
INFO:root:resetting user 96af3842-d5fb-4f13-aea0-726efaeba6ea to ts 1496793600
DEBUG:root:Looking for last place before 1496793600
DEBUG:root:last place doc for user 96af3842-d5fb-4f13-aea0-726efaeba6ea = {u'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'),
u'_id': ObjectId('5937b1be88f6633693cb7464'),
u'data': {u'enter_fmt_time': u'2017-06-06T12:57:56.402000-07:00',
u'display_name': u'South Shoreline Boulevard, Mountain View',
u'exit_fmt_time': u'2017-06-06T18:27:21.074421-07:00',
u'ending_trip': ObjectId('5937b19388f6633693cb7394'),
u'starting_trip': ObjectId('5937b19488f6633693cb73a4'),
u'enter_ts': 1496779076.402,
u'duration': 19764.672421216965,
u'raw_places': [ObjectId('5937b18d88f6633693cb733c'),
ObjectId('5937b18d88f6633693cb733c')],
u'exit_ts': 1496798841.0744212},
u'metadata': {u'write_fmt_time': u'2017-06-07T00:56:46.733161-07:00',
u'key': u'analysis/cleaned_place'}}
DEBUG:root:last trip doc for user 96af3842-d5fb-4f13-aea0-726efaeba6ea = None
DEBUG:root:first_raw_place = Entry({u'data': {u'enter_fmt_time': u'2017-06-06T12:57:56.402000-07:00',
u'exit_fmt_time': u'2017-06-06T18:38:36.388000-07:00',
u'ending_trip': ObjectId('5937b18d88f6633693cb733b'),
u'starting_trip': ObjectId('5937b18d88f6633693cb733d'),
u'enter_ts': 1496779076.402,
u'duration': 20439.986000061035,
u'exit_ts': 1496799516.388},
u'_id': ObjectId('5937b18d88f6633693cb733c'),
u'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'),
u'metadata': {u'write_fmt_time': u'2017-06-07T00:55:57.613423-07:00',
u'key': u'segmentation/raw_place'}})
DEBUG:root:last_place_enter_ts = 1496779076.4 (2017-06-06T19:57:56.400000+00:00)
DEBUG:root:reset_ts = 1496779076.4 (2017-06-06T19:57:56.400000+00:00)
OK so this is the real reset. The other one seems to be the run on the 4th. Started in June as expected.
2017-07-05 16:33:22,372:INFO:140522484148032:**********UUID 96af3842-d5fb-4f13-aea0-726efaeba6ea: moving to long term**********
2017-07-05 16:33:22,399:DEBUG:140522484148032:result = {'updatedExisting': True, u'nModified': 1, u'ok': 1, u'n': 1} after updating read timestamp
2017-07-05 16:33:22,943:DEBUG:140522484148032:Found 2108 messages in response to query {'$or': [{'metadata.type': 'message'}, {'metadata.type': 'sensor-data'}, {'metadata.type': 'rw-document'}], 'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}
2017-07-05 16:33:22,945:INFO:140522484148032:For stage PipelineStages.USERCACHE, start_ts = 2017-06-06T19:57:56.402000
Copied a bunch of motion_activity
entries (as expected).
2017-07-05 16:33:22,943:DEBUG:140522484148032:Found 2108 messages in response to query {'$or': [{'metadata.type': 'message'}, {'metadata.type': 'sensor-data'}, {'metadata.type': 'rw-document'}], 'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}
2017-07-05 16:33:23,216:DEBUG:140522484148032:Found 1881 messages in response to query {'$or': [{'metadata.type': 'message'}, {'metadata.type': 'sensor-data'}, {'metadata.type': 'rw-document'}], 'metadata.write_ts': {'$lte': 1499272397.945277, '$gte': 1496779076.402}, 'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}
2017-07-05 16:33:23,216:DEBUG:140522484148032:module_name = emission.net.usercache.formatters.android.motion_activity
2017-07-05 16:33:23,218:DEBUG:140522484148032:write_ts = 1497115303.38
2017-07-05 16:33:23,219:DEBUG:140522484148032:module_name = emission.net.usercache.formatters.android.motion_activity
2017-07-05 16:33:23,220:DEBUG:140522484148032:write_ts = 1497115303.46
2017-07-05 16:33:23,221:DEBUG:140522484148032:module_name = emission.net.usercache.formatters.android.motion_activity
2017-07-05 16:33:23,222:DEBUG:140522484148032:write_ts = 1497115333.28
Ran through the rest of the pipeline, and ended up with trips upto the 16th again. It looks like we didn't get any data uploaded after the 16th? I'm going to look at the test phone as well, and then start looking at phone logs.
2017-07-05 16:34:04,743:INFO:140522484148032:**********UUID 96af3842-d5fb-4f13-aea0-726efaeba6ea: segmenting into trips**********
2017-07-05 16:34:04,745:INFO:140522484148032:For stage PipelineStages.TRIP_SEGMENTATION, start_ts = 2017-06-06T20:02:56.402000
...
2017-07-05 16:43:26,963:DEBUG:140522484148032:Considering trip 595d16d788f6634e616b6edc: 2017-06-16T07:56:31-07:00 -> 2017-06-16T07:59:27-07:00
2017-07-05 16:43:26,964:DEBUG:140522484148032:Considering trip 595d16d788f6634e616b6ee5: 2017-06-16T08:13:35.705292-07:00 -> 2017-06-16T10:35:00-07:00
2017-07-05 16:43:26,964:DEBUG:140522484148032:Considering trip 595d16db88f6634e616b700b: 2017-06-16T10:44:10.215876-07:00 -> 2017-06-16T13:14:03-07:00
For Tom and Manonmani, this is because tracking was turned off, although the tracking slider showed that it was on. I have documented the bug with the tracking slider in https://github.com/e-mission/e-mission-phone/issues/238
The only one left now is the second test phone. I'm going to fix the habitica timestamp before doing any further investigation.
wrt https://github.com/e-mission/e-mission-server/issues/497#issuecomment-313427524, another reason for making a separate script is that not all tasks will use timestamps as part of their state.
Habitica timestamps fixed in #507
Next, run the new habitica timestamp reset pipeline
./e-mission-py.bash bin/ext_service/reset_habitica_timestamps.py -p android -- 2017-06-07
and separately for me
./e-mission-py.bash bin/ext_service/reset_habitica_timestamps.py -u ea59084e-11d4-4076-9252-3b9a29ce35e0 -- 2016-12-10
Doesn't appear to have any errors!
It's just useful to document the steps for the record