Closed PatGendre closed 5 years ago
Hi, yesterday we hacked around (but did not properly fix) the "é" encoding issue identified in issue #333 so as to transfer the data from the phone to the server. This worked: we could run the pipeline and produce analysis data (such as cleaned and inferred sections), visible in the diary and in mongo. On the app, however, no metrics are shown for the newly imported days of location data.
@PatGendre did you do this by writing a script to parse the data from the usercacheDB and save it to the database? If so, would you consider contributing it to the server code? It is a nice workaround/first step for issue #326; people could collect data for a few days, email their usercacheDB to themselves, load it on a server running on their own laptop, and experiment with it.
I know that at least one project (the cci project) wanted to download user data onto researcher laptops (they were recruiting in person). It turns out that you can't actually export the database to an externally accessible file on iOS, so we dropped it. But a secondary reason was that we also didn't have the ability to import directly from the usercacheDB. If they had had that ability, they might have come up with an approach in which users emailed their data to them.
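A minimal sketch of such an import script, assuming the phone's usercacheDB is a SQLite file; the table and column names (`userCache(write_ts, key, data)`) are hypothetical placeholders here, and the real schema and the server-side insert would need to be checked against the e-mission code:

```python
import json
import sqlite3

def read_usercache(sqlite_path):
    """Yield (metadata, data) pairs from a phone usercacheDB export.

    The table and column names below are assumptions for illustration,
    not the verified e-mission schema.
    """
    conn = sqlite3.connect(sqlite_path)
    try:
        rows = conn.execute("SELECT write_ts, key, data FROM userCache")
        for write_ts, key, data in rows:
            yield {"write_ts": write_ts, "key": key}, json.loads(data)
    finally:
        conn.close()

# The entries could then be inserted into the server-side usercache with
# pymongo (collection name assumed) before running the regular pipeline.
```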
wrt your actual issue, note that there are two calls to /result/metrics/timestamp. The first call, serviced in thread 140572423100160, was for your aggregate metrics. It returned no data.
2019-03-19 11:05:18,880:DEBUG:140572423100160:START POST /result/metrics/timestamp
2019-03-19 11:05:18,881:DEBUG:140572423100160:methodName = skip, returning <class 'emission.net.auth.skip.SkipMethod'>
2019-03-19 11:05:18,881:DEBUG:140572423100160:Using the skip method to verify id token patgendre of length 9
2019-03-19 11:05:18,882:DEBUG:140572423100160:retUUID = 91bc554a-4228-4afd-b2ca-be8eb7e33760
2019-03-19 11:05:18,883:DEBUG:140572423100160:metric_list = ['duration', 'median_speed', 'count', 'distance']
2019-03-19 11:05:18,883:DEBUG:140572423100160:['duration -> <function get_duration at 0x7fd991e040d0>', 'median_speed -> <function get_median_speed at 0x7fd991e04158>', 'count -> <function get_count at 0x7fd991e7cf28>', 'distance -> <function get_distance at 0x7fd991e04048>']
2019-03-19 11:05:18,883:DEBUG:140572423100160:for user 91bc554a-4228-4afd-b2ca-be8eb7e33760, returning timeseries <emission.storage.timeseries.builtin_timeseries.BuiltinTimeSeries object at 0x7fd99050c470>
2019-03-19 11:05:18,886:DEBUG:140572423100160:curr_query = {'invalid': {'$exists': False}, 'user_id': UUID('91bc554a-4228-4afd-b2ca-be8eb7e33760'), '$or': [{'metadata.key': 'analysis/inferred_section'}], 'data.start_ts': {'$lte': 1551999600, '$gte': 1551654000}}, sort_key = data.start_ts
2019-03-19 11:05:18,888:DEBUG:140572423100160:orig_ts_db_keys = [], analysis_ts_db_keys = ['analysis/inferred_section']
2019-03-19 11:05:19,119:DEBUG:140572423100160:finished querying values for [], count = 0
2019-03-19 11:05:19,125:DEBUG:140572423100160:finished querying values for ['analysis/inferred_section'], count = 0
2019-03-19 11:05:19,306:DEBUG:140572423100160:orig_ts_db_matches = 0, analysis_ts_db_matches = 0
2019-03-19 11:05:19,306:DEBUG:140572423100160:orig_ts_db_matches = 0, analysis_ts_db_matches = 0
2019-03-19 11:05:19,469:DEBUG:140572423100160:Found 0 results
2019-03-19 11:05:19,469:DEBUG:140572423100160:Returning entry with length 0 result
2019-03-19 11:05:19,470:INFO:140572423100160:Found no entries for user 91bc554a-4228-4afd-b2ca-be8eb7e33760, time_query <emission.storage.timeseries.timequery.TimeQuery object at 0x7fd9a163fe10>
2019-03-19 11:05:19,470:DEBUG:140572423100160:END POST /result/metrics/timestamp 91bc554a-4228-4afd-b2ca-be8eb7e33760 0.5899531841278076
Are you sure the timestamps are correct? I believe they are in UTC, although since you are in France, they should be pretty close to your actual time.
According to the query, the range that you were searching for was
In [2]: arrow.get(1551999600).to("Europe/Paris")
Out[2]: <Arrow [2019-03-08T00:00:00+01:00]>
In [3]: arrow.get(1551654000).to("Europe/Paris")
Out[3]: <Arrow [2019-03-04T00:00:00+01:00]>
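The same conversion can be reproduced with only the standard library (Python 3.9+ for zoneinfo), without arrow:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

paris = ZoneInfo("Europe/Paris")

# The two epoch bounds from the query above, rendered in Paris local time
for ts in (1551999600, 1551654000):
    print(datetime.fromtimestamp(ts, tz=paris).isoformat())
# 2019-03-08T00:00:00+01:00
# 2019-03-04T00:00:00+01:00
```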
Can you look through your data and confirm that you have analysis/inferred_section
objects in that range? The log has the concrete query that was run.
2019-03-19 11:05:18,886:DEBUG:140572423100160:curr_query = {'invalid': {'$exists': False}, 'user_id': UUID('91bc554a-4228-4afd-b2ca-be8eb7e33760'), '$or': [{'metadata.key': 'analysis/inferred_section'}], 'data.start_ts': {'$lte': 1551999600, '$gte': 1551654000}}, sort_key = data.start_ts
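To check in mongo directly, you can rebuild the same filter and run it against the analysis collection with pymongo. The helper below just reconstructs the dict that the endpoint logged; the collection and database names in the comment (`Stage_analysis_timeseries`, `Stage_database`) are my understanding of the e-mission layout and should be verified against your deployment:

```python
import uuid

def build_range_query(user_id, keys, start_ts, end_ts):
    """Reconstruct the filter that the metrics endpoint logged."""
    return {
        "invalid": {"$exists": False},
        "user_id": uuid.UUID(user_id),
        "$or": [{"metadata.key": k} for k in keys],
        "data.start_ts": {"$lte": end_ts, "$gte": start_ts},
    }

query = build_range_query(
    "91bc554a-4228-4afd-b2ca-be8eb7e33760",
    ["analysis/inferred_section"],
    1551654000, 1551999600,
)
# then, e.g. (deployment details assumed):
#   from pymongo import MongoClient
#   db = MongoClient().Stage_database
#   print(db.Stage_analysis_timeseries.count_documents(query))
```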
Sorry, it was really a dirty hack: adding b = b.replace(b'\xe9', b'e') in bottle.py (just before return json_loads(b), line 1294) so as to avoid the JSON exception...
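For reference, a less destructive variant of the same workaround (a sketch, not tested inside bottle) would be to fall back to latin-1 decoding instead of replacing the byte, so the 'é' survives:

```python
import json

def loads_with_fallback(b):
    """Parse JSON bytes, falling back to latin-1 when they are not valid UTF-8.

    This keeps 'é' intact instead of mangling it to 'e'.
    """
    try:
        return json.loads(b)
    except UnicodeDecodeError:
        return json.loads(b.decode("latin-1"))

# b'"caf\xe9"' is latin-1 for "café" and is not valid UTF-8
assert loads_with_fallback(b'"caf\xe9"') == "café"
```

Of course, the proper fix is to make the phone send UTF-8 in the first place (issue #333).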
Hi, I have no access to the app right now because I use the UNSW version, but I checked in mongo: there are indeed no inferred sections from March 4 to March 8, and more generally no data in the analysis DB until March 18. But there is data in the timeseries DB in early March, including locations, so the pipeline apparently did not transform that data into the analysis DB.
So maybe when we re-run the pipeline I'll check again in the logs why the location data does not translate into analysis data for the first half of March...
@PatGendre yes.
As you saw from the pipeline documentation, the pipeline state keeps track of how far we have processed the input data. If, for example, your last_processed_ts was mid-March, the earlier data would not be processed; you have to reset the pipeline for that to happen.
And of course, if that is indeed the case, figure out why your last_processed_ts was mid-March...
Ok, in the database the last_ts_run is March 26 for all 8 stages, and the last_processed_ts is March 21, except for stages 6 (ACCURACY_FILTERING) and 9 (OUTPUT_GEN), for which it is null.
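A small helper along these lines can make the pipeline state easier to read when poking at mongo. It only assumes documents carrying the pipeline_stage, last_ts_run, and last_processed_ts fields mentioned above; fetching them (from what I believe is the Stage_pipeline_state collection) is left to pymongo:

```python
from datetime import datetime, timezone

def summarize_pipeline_state(docs):
    """Render pipeline-state documents as 'stage: last_run / last_processed' lines."""
    def fmt(ts):
        if ts is None:
            return "null"
        return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    lines = []
    for doc in sorted(docs, key=lambda d: d["pipeline_stage"]):
        lines.append("stage %s: last_ts_run=%s last_processed_ts=%s" % (
            doc["pipeline_stage"], fmt(doc["last_ts_run"]), fmt(doc["last_processed_ts"])))
    return lines

# Example mirroring the report above: stage 6 has a null last_processed_ts
print("\n".join(summarize_pipeline_state([
    {"pipeline_stage": 1, "last_ts_run": 1553558400, "last_processed_ts": 1553126400},
    {"pipeline_stage": 6, "last_ts_run": 1553558400, "last_processed_ts": None},
])))
```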
@PatGendre were you able to re-run the pipeline? Or if you can't reproduce, maybe close this issue?
@shankari I am closing the issue; it is difficult to re-run the pipeline so as to reproduce the problem, as the data were accidentally erased from mongodb on the 27th (Russian attack) :-( I'll try to check in the coming days whether the metrics seem OK.