Figure out whether or not the habitica game calculation should be a pipeline stage

shankari commented 8 years ago

The habitica game calculation currently implements a homegrown pipeline stage in which the last timestamp is stored as part of the habitica user's entry. This means that the timestamp won't get reset as part of the reset_pipeline stage, so the calculation won't be recomputed the next time the pipeline is run. On the other hand, maybe this is A Good Thing, because we don't roll back points granted when we reset the pipeline. And it is not clear that we should, because the user has already observed the results.
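To make the tradeoff concrete, here is a minimal sketch of the situation described above. The dictionaries and the `reset_pipeline` function are hypothetical stand-ins invented for illustration, not the actual e-mission data structures or reset code; the only detail taken from the real system is that the habitica entry keeps its own `last_timestamp` under `metrics_data`.

```python
# Hypothetical stand-in for the state that the standard pipeline tracks per stage.
pipeline_state = {"example_stage": {"last_processed_ts": 1471273743}}

# Hypothetical stand-in for the habitica user's entry, which keeps its own checkpoint.
habitica_user = {
    "metrics_data": {
        "walk_count": 315.67,
        "bike_count": 164.02,
        "last_timestamp": 1471273743,
    }
}


def reset_pipeline(state):
    """Clear the checkpoint of every tracked stage (illustrative only)."""
    for stage in state.values():
        stage["last_processed_ts"] = None


reset_pipeline(pipeline_state)

# The standard stage will reprocess from the beginning on the next run...
print(pipeline_state["example_stage"]["last_processed_ts"])
# ...but the habitica checkpoint is untouched, so points are neither
# recomputed nor rolled back.
print(habitica_user["metrics_data"]["last_timestamp"])
```

Whether the second behavior is a bug or a feature is exactly the open question here.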

Need to discuss this with Juliana and decide for the long term.

shankari commented 8 years ago

Yes, it should be. Or at least, the calculation should be much closer to the current pipeline stages than it currently is. I noticed that I didn't get any points for the trip to San Francisco yesterday, so I looked into this some more. And what I found was that the current implementation has the same problem as the one we fixed in ec004153d40a61e10097e01debabb76bf1cdf444. Re-using the standard pattern will help us avoid re-discovering issues like this.

Concretely, during this run, we checked for metrics for all trips in the past hour, and we didn't find any.

2016-08-15 16:10:17,361:INFO:**********UUID 0763de67-f61e-3f5d-90e7-518e69793954: checking active mode trips to autocheck habits**********
2016-08-15 16:10:17,361:DEBUG:Entering habitica autocheck for user 0763de67-f61e-3f5d-90e7-518e69793954
2016-08-15 16:10:17,363:DEBUG:Habitica user: [{.., u'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'),..., u'habitica_username': u'Shankari', u'metrics_data': {u'walk_count':  315.67272985456407, u'bike_count': 164.02210004814106, u'last_timestamp': 1471273743}, u'_id': ObjectId('5791188188f663356338fed2'),...}]
2016-08-15 16:10:17,363:DEBUG:For user 0763de67-f61e-3f5d-90e7-518e69793954, about to proxy GET method /api/v3/tasks/user?type=habits with args None
2016-08-15 16:10:17,364:DEBUG:auth_headers = {'x-api-user': u'e5d31351-a18c-4898-9b56-21c3dd58c834', 'x-api-key': u'0793307f-dc24-40d2-8abe-a91fc5b685d0'}
2016-08-15 16:10:17,370:INFO:Starting new HTTP connection (1): 54.159.38.241
2016-08-15 16:10:17,412:DEBUG:"GET /api/v3/tasks/user?type=habits HTTP/1.1" 200 None
2016-08-15 16:10:17,414:DEBUG:result = <Response [200]>
2016-08-15 16:10:17,417:DEBUG:For user 0763de67-f61e-3f5d-90e7-518e69793954, about to proxy GET method /api/v3/tasks/user?type=habits with args None
2016-08-15 16:10:17,418:DEBUG:auth_headers = {'x-api-user': u'e5d31351-a18c-4898-9b56-21c3dd58c834', 'x-api-key': u'0793307f-dc24-40d2-8abe-a91fc5b685d0'}
2016-08-15 16:10:17,419:INFO:Starting new HTTP connection (1): 54.159.38.241
2016-08-15 16:10:17,459:DEBUG:"GET /api/v3/tasks/user?type=habits HTTP/1.1" 200 None
2016-08-15 16:10:17,460:DEBUG:result = <Response [200]>
2016-08-15 16:10:17,488:DEBUG:for user 0763de67-f61e-3f5d-90e7-518e69793954, returning timeseries <emission.storage.timeseries.builtin_timeseries.BuiltinTimeSeries object at 0x7fe123567910>
2016-08-15 16:10:17,488:DEBUG:curr_query = {'$or': [{'metadata.key': 'analysis/cleaned_section'}], 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'data.start_ts': {'$lte': 1471277417, '$gte': 1471273743}}, sort_key = data.start_ts
2016-08-15 16:10:17,488:DEBUG:orig_ts_db_keys = [], analysis_ts_db_keys = ['analysis/cleaned_section']
2016-08-15 16:10:17,488:DEBUG:finished querying values for []
2016-08-15 16:10:17,488:DEBUG:finished querying values for ['analysis/cleaned_section']
2016-08-15 16:10:17,490:DEBUG:Found 0 results
2016-08-15 16:10:17,490:DEBUG:Returning entry with length 0 result
2016-08-15 16:10:17,490:INFO:Found no entries for user 0763de67-f61e-3f5d-90e7-518e69793954, time_query <emission.storage.timeseries.timequery.TimeQuery object at 0x7fe12526fd90>
2016-08-15 16:10:17,490:DEBUG:Metrics response: []

But if I re-run the query against the database right now, I get

In [15]: edb.get_analysis_timeseries_db().find({'$or': [{'metadata.key': 'analysis/cleaned_section'}], 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'data.start_ts': {'$lte': 1471277417, '$gte': 1471273743}}).count()
Out[15]: 4

So clearly, there are sections now, but they weren't there when we made the query. This is more serious because the metrics use the section start time for their calculation, so even if we could make the architecture more real-time, this would still fail for long, multi-hour trips (e.g. from Mountain View to Berkeley).
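The failure mode can be reproduced with a small simulation. This is a sketch under stated assumptions: the list of dicts stands in for the analysis timeseries, the helper functions are hypothetical and not part of the e-mission API, and the `write_ts` field and the timestamps for the late-arriving section are invented for illustration. Only the two window bounds (1471273743 and 1471277417) come from the logged query. The `write_ts` query shows one possible way to avoid the gap; it is not a claim about how the referenced commit fixed the earlier problem.

```python
def query_by_start_ts(sections, start_ts, end_ts):
    """Mimic the logged query: select on the section start time."""
    return [s for s in sections if start_ts <= s["start_ts"] <= end_ts]


def query_by_write_ts(sections, start_ts, end_ts):
    """Hypothetical alternative: select on when the entry was written."""
    return [s for s in sections if start_ts <= s["write_ts"] <= end_ts]


sections = []                  # nothing has been segmented yet
last_timestamp = 1471273743    # homegrown checkpoint from the habitica entry
run_ts = 1471277417            # time of the first habitica run

# Run 1: the trip started inside the window, but its sections are not written yet.
first_run = query_by_start_ts(sections, last_timestamp, run_ts)
last_timestamp = run_ts        # the checkpoint advances even though nothing was found

# Later, the pipeline writes a section whose start time is in the OLD window.
sections.append({"start_ts": 1471275000, "write_ts": 1471278000})

# Run 2, one hour later: the start_ts window has already moved past the section.
second_run_ts = run_ts + 3600
second_run = query_by_start_ts(sections, last_timestamp, second_run_ts)
by_write = query_by_write_ts(sections, last_timestamp, second_run_ts)

print(len(first_run), len(second_run), len(by_write))
```

With the start-time window, the section is missed by both runs and never picked up afterwards, which matches the missing points for the San Francisco trip. A longer trip only widens the gap between `start_ts` and the time the section becomes visible.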

shankari commented 8 years ago

@juemura this is all yours. Let's talk about the fix once GRE is done.

shankari commented 1 year ago

We are not integrating with habitica anymore.

e-mission / e-mission-docs

Figure out whether or not the habitica game calculation should be a pipeline stage #183