Closed PatGendre closed 5 years ago
I followed the FAQ https://github.com/e-mission/e-mission-docs/blob/master/docs/e-mission-server/troubleshooting_tips_FAQ.md#confirm-that-the-entries-are-really-in-the-database and find 0 for len(data_list), which confirms what I've seen in mongo.
I've run the intake pipeline, there are no apparent errors except at the beginning:
(emission) root@grfmapfabmob:/var/emission/e-mission-server# ./e-mission-py.bash bin/debug/intake_single_user.py -e patgendre
storage not configured, falling back to sample, default configuration
Connecting to database URL localhost
analysis.debug.conf.json not configured, falling back to sample, default configuration
google maps key not configured, falling back to nominatim
transit stops query not configured, falling back to default
ERROR:root:habitica not configured, game functions not supported
(for habitica I guess it's normal, since we don't use it)
the first messages saying "falling back to default conf" also seem normal? At least I remember that similar messages were displayed when the pipeline actually produced data, back at the end of January.
another clue perhaps: there is only one intake_single.log file and an empty intake_single_error.log file, no other intake.log files in /var/tmp (we use the default sample conf for intake logs), and there is no data after February 7. I've run the intake pipeline several times manually since Feb 7, but I can't find where the corresponding logs are...
couple of quick checks:
do you see /usercache/put for the user in the logs? If the data is not getting to the server, then we really need to look at the phone logs.
"another clue perhaps, there is only one intake_single.log file and an empty intake_single_error.log file" - is the logging level in the intake log config set to debug? If you are running the single user pipeline, then there should be only one log file, all runs get appended to that log file. If you are running the multi-user pipeline, do you see anything on the command line?
yes I see /usercache/put in the logs, example:
2019-03-04 15:06:21,988[DEBUG-140479756744448]{emission/net/api/cfc_webapp.py@427}:START POST /usercache/put
2019-03-04 15:06:21,989[DEBUG-140479756744448]{emission/net/api/cfc_webapp.py@252}:Called userCache.put
2019-03-04 15:06:21,990[ERROR-140479756744448]{emission/net/api/cfc_webapp.py@436}:old style duration 0.4203517436981201 != timer based duration 0.0014113839715719223
2019-03-04 15:06:21,990[ERROR-140479756744448]{emission/net/api/cfc_webapp.py@436}:old style duration 0.4203517436981201 != timer based duration 0.0014113839715719223
2019-03-04 15:06:21,991[DEBUG-140479756744448]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@337}:insert called with entry of type <class 'emission.core.wrapper.entry.Entry'>
2019-03-04 15:06:21,991[DEBUG-140479756744448]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@346}:entry was fine, no need to fix it
2019-03-04 15:06:21,992[DEBUG-140479756744448]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@348}:Inserting entry Entry({'_id': ObjectId('5c7d30dd4e276dc733886f93'), 'user_id': '', 'metadata': Metadata({'key': 'stats/server_api_error', 'platform': 'server', 'write_ts': 1551708381.9908416, 'time_zone': 'America/Los_Angeles', 'write_local_dt': LocalDate({'year': 2019, 'month': 3, 'day': 4, 'hour': 6, 'minute': 6, 'second': 21, 'weekday': 0, 'timezone': 'America/Los_Angeles'}), 'write_fmt_time': '2019-03-04T06:06:21.990842-08:00'}), 'data': {'name': 'MISMATCH_POST_/usercache/put', 'ts': 1551708381.9902549, 'reading': 0.4189403597265482}}) into timeseries
I've found where the logs of intake-multiprocess.py go; for whatever reason they have been logged in the same file as webserver.log !! The problem is that the only data uploaded seems to be stats/server_api_time (example below); there seems to be no data to process
2019-03-05 08:56:18,283[DEBUG-140619732752128]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@185}:curr_query = {'invalid': {'$exists': False}, 'user_id': UUID('91bc554a-4228-4afd-b2ca-be8eb7e33760'), '$or': [{'metadata.key': 'analysis/recreated_location'}], 'data.ts': {'$lte': 1548920763.074, '$gte': 1548919380.077}}, sort_key = data.ts
2019-03-05 08:56:18,283[DEBUG-140619732752128]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@188}:orig_ts_db_keys = [], analysis_ts_db_keys = ['analysis/recreated_location']
2019-03-05 08:56:18,298[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/analysis/intake/segmentation/trip_segmentation_methods/dwell_segmentation_time_filter.py@129}:len(last10PointsDistances) = 10, len(last5MinsDistances) = 54
2019-03-05 08:56:18,298[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/analysis/intake/segmentation/trip_segmentation_methods/dwell_segmentation_time_filter.py@131}:last5MinsTimes.max() = 298.0, time_threshold = 300
2019-03-05 08:56:18,298[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@185}:curr_query = {'invalid': {'$exists': False}, 'user_id': UUID('a224fa8a-3a86-4140-ae50-beab3f5cf70a'), '$or': [{'metadata.key': 'statemachine/transition'}], 'data.ts': {'$lte': 1551459107.0, '$gte': 1551459102.0}}, sort_key = data.ts
2019-03-05 08:56:18,298[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@188}:orig_ts_db_keys = ['statemachine/transition'], analysis_ts_db_keys = []
2019-03-05 08:56:18,298[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@232}:finished querying values for ['statemachine/transition'], count = 0
2019-03-05 08:56:18,299[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@232}:finished querying values for [], count = 0
2019-03-05 08:56:18,300[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@204}:orig_ts_db_matches = 0, analysis_ts_db_matches = 0
2019-03-05 08:56:18,303[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@275}:Found 0 results
2019-03-05 08:56:18,303[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/analysis/intake/segmentation/restart_checking.py@27}:In range 1551459102.0 -> 1551459107.0 found no transitions
"I've found where the logs of intake-multiprocess.py go; for whatever reason they have been logged in the same file as webserver.log !!"
Are you sure you didn't change the settings in conf/logs/intake*? I can't think of any other reason why the intake logs should be in the webserver logs.
"yes I see /usercache/put in the logs"
Does the /usercache/put list actual entries, similar to the example in the troubleshooting guide? I don't see that in your example:
2017-10-08 17:45:59,146:DEBUG:139922601957120:Updated result for user = ea59084e-11d4-4076-9252-3b9a29ce35e0, key = background/location, write_ts = 1507481639.3 = {'updatedExisting': False, u'nModified': 0, u'ok': 1, u'upserted': ObjectId('59da6457cb17471ac0bc8f48'), u'n': 1}
Are you sure you didn't change the settings in conf/logs/intake*. I can't think of any other reason why the intake logs should be in the webserver logs.
yes I am sure; the reason is not a good one, the logging setup was modified by hand in intake-multiprocess.py so as to redirect the logs to another file. We will correct this.
Does the /usercache/put list actual entries, similar to the example in the troubleshooting guide? I don't see that in your example
Yes, /usercache/put lists entries, but for other users, for instance:
2019-03-01 17:50:10,782[DEBUG-140479748351744]{/var/emission/e-mission-server/emission/net/api/usercache.py@58}:Updated result for user = a224fa8a-3a86-4140-ae50-beab3f5cf70a, key = stats/client_nav_event, write_ts = 1551428317.537 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('5c7962c232df604a3f5ca0e6'), 'ok': 1.0, 'updatedExisting': False}
2019-03-01 17:50:10,783[DEBUG-140479748351744]{/var/emission/e-mission-server/emission/net/api/usercache.py@58}:Updated result for user = a224fa8a-3a86-4140-ae50-beab3f5cf70a, key = stats/client_nav_event, write_ts = 1551428318.819 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('5c7962c232df604a3f5ca0e8'), 'ok': 1.0, 'updatedExisting': False}
...
but for my uuid, I cannot find the string "Updated result for user" anywhere in the log file...
I looked at the scripts in the bin directory, there is a debug/fix_usercache_processing.py script, do you think it is worthwhile to execute it?
Also I wanted to execute the delete_user.py script, but there is No module named 'get_database'; I suspect the line from get_database import get_uuid_db, get_profile_db, get_moves_db should be corrected to something like from emission.core.get_database import get_uuid_db, get_profile_db, get_moves_db? Does this script still work correctly?
I think you should first see if your data is getting to the server correctly because, as I said, I saw connection errors in the phone logs that you sent earlier. Was the earlier usercache/put that you found for your UUID? What does it look like if it doesn't have any Updated result for user entries?
The usercache/put was not for my UUID. For my UUID I've found no occurrence of Updated result for user in the log file. I've found occurrences of usercache/put though, such as cited above https://github.com/e-mission/e-mission-docs/issues/333#issuecomment-470822627 .
But these lines are web server logs I guess, not pipeline logs (though they are in the same file, as I told you, due to a conf mistake)
I checked again and confirm: there is no other occurrence of usercache/put for my UUID, except when the server receives stats/server_api_time data; no timeseries data is transmitted, as far as I can see in the log file.
so then your data is not making it from the phone to the server. You can mail your phone logs to yourself and look at them, or browse the log directly on the phone (Profile -> Check log). If you "Force sync" and then look at the logs on the phone, you should get a quick sense of whether the sync is succeeding or not.
Ok, thanks! I don't think the "force sync" works, I will try again. I have around 50000 records in the sql db, only INFO and DEBUG (no ERROR, if that level exists). Among them, only ca. 2000 background/location and filtered_location entries.
Quick check: have you pulled the geofence changes and switched back to your own apk? If you are still on emTripLog with the interscity channel, then your data is coming to my server. Check the host that you are connected to (Profile -> Developer Zone -> Sync -> host).
Also, just to clarify, you can browse logs also in Profile -> Developer Zone -> Check Logs. I forgot to put "Developer Zone" in https://github.com/e-mission/e-mission-docs/issues/333#issuecomment-471001141
the host is http://145.239.101.212:8080 , so I suppose it is the right one. I confirm that the Force Sync doesn't seem to work, because I don't see it in the webserver log.
in the phone log, there is an ERROR serverSyncAdapter IO Error ... connection refused while posting converted trips to JSON
so you are having some issues with connectivity?! are you able to access that webserver from (say) the browser on the phone?
yes, I can see the web pages when entering the URL in Firefox, and as I said, at least 2 other users could send their trip data to mongodb
I would just follow standard debugging steps:
There are a lot of potential causes for the server being inaccessible on stackoverflow (e.g. no INTERNET permission in the app, no support for proxy...).
Ok, thank you! I will try to understand what is wrong. Note that although I have this serverSyncAdapter IO Error in the phone log, I also have a DEBUG message saying "js : local and native values match, already sinced", some 200 HTTP INFO messages, as well as an INFO message "CommunicationHelper : Got response ... with status HTTP 500", so the server is not completely inaccessible; it may depend on the request and on the payload. I will try to have a deeper look and give you additional feedback by Monday.
"js : local and native values match, already sinced"
That is just for the javascript local storage and the native database plugin
CommunicationHelper : Got response ... with status HTTP 500
Yeah this seems to indicate that you are connecting to the server but getting an error. Since you are building the app, you can also run in a debugger. All communication with the server that is related to you personally (as opposed to the aggregate) should go through https://github.com/e-mission/cordova-server-communication/blob/master/src/android/CommunicationHelper.java#L76
As we already discussed, some aggregate calls go through $http, but none of the user calls do.
thanks! I will try with Yann on Monday, he knows Android Studio and so the debugger too, I guess.
Thanks to some interactive debugging and looking at the webserver logs, we now know the issue has to do with the encoding of "février" somewhere along the collection and sync path. The problem occurs in usercache/put: no data is sent because the JSON payload is not valid. In fact there is an encoding error due to the character "\xe9" in several places in the file, in the field "fmt_time":"28 f\xe9vr. 2019 16:14:54" ; the month is "févr." in French, with an accent. Once the encoding is corrected in the app, the JSON payload should be posted correctly to the server (the file is 1.4Mb).
let's try to debug where we are not using UTF-8 along the data collection and sync path. First, is the phone saving the data correctly as UTF-8? If you look at the userCacheDB, how is févr. encoded?
hi, in the sqlite usercachedb, I have for instance {"accuracy":95.787,"altitude":102.0,"bearing":0.0,"elapsedRealtimeNanos":619090000000,"filter":"time","fmt_time":"28 févr. 2019 16:14:54","latitude":48.8397591,"longitude":2.3368957,"sensed_speed":0.0,"ts":1.551366894194E9}
I don't know how to check the encoding, but the "é" string is shown (correctly) as "é", not as "\xe9". The data field of the usercache table is of type TEXT.
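One way to check the encoding directly, sketched below with Python's sqlite3 module against a made-up miniature of the userCacheDB table (the real schema has more columns): set text_factory to bytes so the query returns the raw stored bytes, then see whether they decode as UTF-8.

```python
import sqlite3

# Minimal stand-in for the phone's userCacheDB (hypothetical schema,
# just for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userCache (key TEXT, data TEXT)")
conn.execute("INSERT INTO userCache VALUES (?, ?)",
             ("background/filtered_location",
              '{"fmt_time":"28 févr. 2019 16:14:54"}'))

# Ask sqlite3 for the raw bytes instead of decoded strings
conn.text_factory = bytes
(raw,) = conn.execute("SELECT data FROM userCache").fetchone()

# SQLite stores TEXT in the database encoding (UTF-8 by default), so if
# this decode succeeds, the stored value really is valid UTF-8
raw.decode("utf-8")
print(raw)
```

If the value were stored as latin-1, the decode would raise a UnicodeDecodeError and the raw bytes would show a lone \xe9 instead of the two-byte \xc3\xa9 sequence.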
so this seems overall consistent with the other stuff you have noticed. basically, the python implementation is not UTF-8 consistent, at least on your server. I am going to manually set the date/time and locale in the emulator and see if this is an overall issue.
ok, so I tried to reproduce this in the emulator and it is not working because the location points are sent to the emulator by the laptop and that is still set to March 26.
sqlite> select data from userCache where key="background/filtered_location";
{"accuracy":20.0,"altitude":5.0,"bearing":0.0,"elapsedRealtimeNanos":115739073019,"filter":"time","fmt_time":"26 mars 2019 8:16:03 PM","latitude":38.1219983,"longitude":-131.5999983,"sensed_speed":0.0,"ts":1.553627763E9}
{"accuracy":20.0,"altitude":5.0,"bearing":0.0,"elapsedRealtimeNanos":178993731763,"filter":"time","fmt_time":"26 mars 2019 8:17:06 PM","latitude":38.322,"longitude":-131.8,"sensed_speed":0.0,"ts":1.553627826E9}
{"accuracy":20.0,"altitude":5.0,"bearing":0.0,"elapsedRealtimeNanos":210005651859,"filter":"time","fmt_time":"26 mars 2019 8:17:37 PM","latitude":38.322,"longitude":-131.8,"sensed_speed":0.0,"ts":1.553627857E9}
I really don't want to reset my laptop datetime to Feb. Let's try on a physical device instead...
If that still doesn't work, I'll have to try some other python scripts because it looks like we won't be able to reproduce this until August (Août)
ok, so it looks like on the physical device, the first entry was still mar but after that, we are getting entries in feb. Note that elapsedRealtimeNanos for the last entry is after the value for the first two entries.
{"accuracy":15.048,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2419284455000000,"filter":"time","fmt_time":"5 févr. 2019 9:41:27 PM","latitude":37.3910475,"longitude":-122.0864541,"sensed_speed":0.0,"ts":1.549399287295E9}
{"accuracy":14.506,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2419314553000000,"filter":"time","fmt_time":"5 févr. 2019 9:41:57 PM","latitude":37.3910476,"longitude":-122.0864542,"sensed_speed":0.0,"ts":1.549399317393E9}
{"accuracy":19.0,"altitude":-15.0,"bearing":0.0,"elapsedRealtimeNanos":2419347959564180,"filter":"time","fmt_time":"26 mars 2019 8:42:32 PM","latitude":37.3911112,"longitude":-122.0864028,"sensed_speed":0.0,"ts":1.553629352E9}
{"accuracy":15.0,"altitude":-16.0,"bearing":0.0,"elapsedRealtimeNanos":2419378059196564,"filter":"time","fmt_time":"26 mars 2019 8:43:02 PM","latitude":37.3911024,"longitude":-122.086385,"sensed_speed":0.0,"ts":1.553629382E9}
hm, trip just ended, and the resulting points are some kind of mix of feb and mar
sqlite> select data from userCache where key="background/filtered_location";
{"accuracy":15.048,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2419284455000000,"filter":"time","fmt_time":"5 févr. 2019 9:41:27 PM","latitude":37.3910475,"longitude":-122.0864541,"sensed_speed":0.0,"ts":1.549399287295E9}
{"accuracy":14.506,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2419314553000000,"filter":"time","fmt_time":"5 févr. 2019 9:41:57 PM","latitude":37.3910476,"longitude":-122.0864542,"sensed_speed":0.0,"ts":1.549399317393E9}
{"accuracy":19.0,"altitude":-15.0,"bearing":0.0,"elapsedRealtimeNanos":2419347959564180,"filter":"time","fmt_time":"26 mars 2019 8:42:32 PM","latitude":37.3911112,"longitude":-122.0864028,"sensed_speed":0.0,"ts":1.553629352E9}
{"accuracy":15.0,"altitude":-16.0,"bearing":0.0,"elapsedRealtimeNanos":2419378059196564,"filter":"time","fmt_time":"26 mars 2019 8:43:02 PM","latitude":37.3911024,"longitude":-122.086385,"sensed_speed":0.0,"ts":1.553629382E9}
{"accuracy":18.0,"altitude":-11.0,"bearing":0.0,"elapsedRealtimeNanos":2419408056544886,"filter":"time","fmt_time":"26 mars 2019 8:43:32 PM","latitude":37.3910955,"longitude":-122.0864123,"sensed_speed":0.0,"ts":1.553629412E9}
{"accuracy":21.0,"altitude":-12.0,"bearing":0.0,"elapsedRealtimeNanos":2419438067373000,"filter":"time","fmt_time":"26 mars 2019 8:44:02 PM","latitude":37.3911042,"longitude":-122.086441,"sensed_speed":0.0,"ts":1.553629442E9}
{"accuracy":13.0,"altitude":-11.0,"bearing":0.0,"elapsedRealtimeNanos":2419467037520749,"filter":"time","fmt_time":"26 mars 2019 8:44:31 PM","latitude":37.3911024,"longitude":-122.0864414,"sensed_speed":0.0,"ts":1.553629471E9}
{"accuracy":14.0,"altitude":-12.0,"bearing":0.0,"elapsedRealtimeNanos":2419497046126571,"filter":"time","fmt_time":"26 mars 2019 8:45:01 PM","latitude":37.3910905,"longitude":-122.086435,"sensed_speed":0.0,"ts":1.553629501E9}
{"accuracy":7.0,"altitude":-16.0,"bearing":0.0,"elapsedRealtimeNanos":2419528043401091,"filter":"time","fmt_time":"26 mars 2019 8:45:32 PM","latitude":37.3910864,"longitude":-122.0864314,"sensed_speed":0.0,"ts":1.553629532E9}
{"accuracy":15.0,"altitude":-44.0,"bearing":0.0,"elapsedRealtimeNanos":2419557064059986,"filter":"time","fmt_time":"26 mars 2019 8:46:01 PM","latitude":37.3911584,"longitude":-122.0862573,"sensed_speed":0.0,"ts":1.553629561E9}
{"accuracy":14.568,"altitude":-4.300000190734863,"bearing":0.0,"elapsedRealtimeNanos":2419586132000000,"filter":"time","fmt_time":"5 févr. 2019 9:46:28 PM","latitude":37.3910475,"longitude":-122.0864542,"sensed_speed":0.0,"ts":1.549399588972E9}
But there are some févr entries, so we should be able to test this.
ok so this is reproducible even on my laptop, and I suspect, on the main e-mission server as well.
First, I added the additional logging from our chat with @PatGendre:
diff --git a/emission/net/api/bottle.py b/emission/net/api/bottle.py
index c556baf6..fc8386c1 100644
--- a/emission/net/api/bottle.py
+++ b/emission/net/api/bottle.py
@@ -1285,15 +1285,22 @@ class BaseRequest(object):
are processed to avoid memory exhaustion.
Invalid JSON raises a 400 error response.
"""
+ import logging
+ logging.debug('@' * 10 + ' JSON');
ctype = self.environ.get('CONTENT_TYPE', '').lower().split(';')[0]
if ctype in ('application/json', 'application/json-rpc'):
+ logging.debug("ctype = %s" % ctype)
b = self._get_body_string()
+ logging.debug("body = %s" % b)
if not b:
+ logging.debug('@' * 10 + ' JSON RETURN NONE 1')
return None
try:
return json_loads(b)
except (ValueError, TypeError):
+ logging.debug('@' * 10 + ' HTTPError')
raise HTTPError(400, 'Invalid JSON')
+ logging.debug('@' * 10 + ' JSON RETURN NONE 1')
return None
def _iter_body(self, read, bufsize):
Then, I force synced, and got an error on the phone
02-05 22:01:20.079 13140-13226/edu.berkeley.eecs.emission D/ConnectionSettings: in getConnectURL, returning http://10.0.0.7:8080
02-05 22:01:20.079 13140-13226/edu.berkeley.eecs.emission I/System.out: Posting data to http://10.0.0.7:8080/usercache/put
02-05 22:01:20.267 13140-13226/edu.berkeley.eecs.emission I/System.out: Got response org.apache.http.message.BasicHttpResponse@18081a5 with status HTTP/1.1 400 Bad Request
02-05 22:01:20.274 13140-13226/edu.berkeley.eecs.emission I/CommunicationHelper: Got response org.apache.http.message.BasicHttpResponse@18081a5 with statusHTTP/1.1 400 Bad Request
02-05 22:01:20.281 13140-13226/edu.berkeley.eecs.emission E/ServerSyncAdapter: IO Error java.io.IOException while posting converted trips to JSON
02-05 22:01:20.295 13140-13226/edu.berkeley.eecs.emission D/BuiltinUserCache: Added value for key stats/client_error at time 1.549400480284E9
and looking at the webserver logs, the data is in fact read as non-UTF8
{"metadata":{"key":"background\\/location","platform":"android","read_ts":0,"time_zone":"Europe\\/Brussels","type":"sensor-data","write_ts":1.549399287465E9},"data":{"accuracy":15.048,"altitude":-4.099999904632568,"bearing":0,"elapsedRealtimeNanos":2419284455000000,"filter":"time","fmt_time":"5 f\xe9vr. 2019 9:41:27 PM","latitude":37.3910475,"longitude":-122.0864541,"sensed_speed":0,"ts":1.549399287295E9}
ok so this is going to be complicated to debug. Basically, the \xe9 encoding for é is part of the ISO 8859 (or Latin-1) encoding (https://en.wikipedia.org/wiki/ISO/IEC_8859-1). And apparently:
ISO-8859-1 is (according to the standards at least) the default encoding of documents delivered via HTTP with a MIME type beginning with "text/" (HTML5 changed this to Windows-1252).[1][2]
so maybe the real issue is that the data is coming in with ISO-8859-1 but python expects UTF-8?
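The byte-level difference is easy to see in a few lines of Python (a sketch; the body string here is a made-up stand-in for the real payload):

```python
import json

# 'é' is the single byte 0xE9 in ISO-8859-1 (latin-1), but two bytes in UTF-8
assert "é".encode("latin1") == b"\xe9"
assert "é".encode("utf-8") == b"\xc3\xa9"

# A latin-1 encoded JSON body (made-up stand-in for the real payload)
body = '{"fmt_time": "5 févr. 2019 9:41:27 PM"}'.encode("latin1")

# json.loads assumes UTF-8 for bytes input, so the lone 0xE9 byte fails
try:
    json.loads(body)
except UnicodeDecodeError as e:
    print("decode failed:", e)

# Decoding with the right charset first parses fine
print(json.loads(body.decode("latin1"))["fmt_time"])
```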
Yes, that seems to be the issue. I created two files - iso-8859.txt, which has the \xe9 encoding, and utf-8.txt, which has it replaced by é (see attached).
Then I tried to load them using json.load, which is what bottle does. The UTF-8 one loads fine, while the iso-8859.txt one gives an error at character 297, which is essentially the \xe9.
In [1]: import json
In [2]: json.load(open("/tmp/utf-8.txt"))
Out[2]:
{'data': {'accuracy': 15.048,
'altitude': -4.099999904632568,
'bearing': 0,
'elapsedRealtimeNanos': 2419284455000000,
'filter': 'time',
'fmt_time': '5 févr. 2019 9:41:27 PM',
'latitude': 37.3910475,
'longitude': -122.0864541,
'sensed_speed': 0,
'ts': 1549399287.295},
'metadata': {'key': 'background\\/location',
'platform': 'android',
'read_ts': 0,
'time_zone': 'Europe\\/Brussels',
'type': 'sensor-data',
'write_ts': 1549399287.465}}
In [3]: json.load(open("/tmp/iso-8859.txt"))
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
<ipython-input-3-ef49a420b762> in <module>()
----> 1 json.load(open("/tmp/iso-8859.txt"))
...
JSONDecodeError: Invalid \escape: line 1 column 298 (char 297)
so in python3, JSON is supposed to be represented in UTF-*
https://docs.python.org/3.6/library/json.html#character-encodings
The RFC requires that JSON be represented using either UTF-8, UTF-16, or UTF-32, with UTF-8 being the recommended default for maximum interoperability.
other encodings are supported when bytes are converted to strings:
https://docs.python.org/3.6/library/stdtypes.html#str
Note that the body is in fact in bytes (this is the informal string representation starting with b')
body = b'{"user":"test_french_feb","phone_to_server":[{"metadata":{"key":"stats\\/client_nav_event",
bottle does support python3 https://github.com/e-mission/e-mission-server/blob/master/emission/net/api/bottle.py#L113 so I am not quite sure exactly why this is not already handled
ok so b = self._get_body_string() is a byte string, which is why json_loads(b) fails for non-UTF-8. But why doesn't get_body_string() decode the body properly?
because JSON requests are supposed to be in UTF-8 https://stackoverflow.com/questions/29761905/default-encoding-of-http-post-request-with-json-body
Note that even the standard https://github.com/e-mission/e-mission-docs/issues/333#issuecomment-476837004 says that we should use latin-1 only for text outputs.
So the real problem appears to be that the app is encoding the data in latin-1 (e.g. ISO-8859) instead of UTF-8.
There are two possible fixes: the cheap fix is to decode the incoming body as latin-1 on the server and re-encode it as UTF-8 before parsing; the principled fix is to change the app so that it encodes the data as UTF-8 in the first place.
The principled fix is better because ISO-8859 doesn't even support the full latin alphabet (https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Languages_with_incomplete_coverage), let alone other languages.
I will do a quick verification that the cheap fix will in fact work, though, since we may run out of time for the principled fix.
Hm, I am a little confused. If I try to generate the iso-8859 string file by encoding the utf-8 one
In [25]: open("/tmp/utf-8.txt").read().encode("latin1")
Out[25]: b'{"metadata":{"key":"background\\\\/location","platform":"android","read_ts":0,"time_zone":"Europe\\\\/Brussels","type":"sensor-data","write_ts":1.549399287465E9},"data":{"accuracy":15.048,"altitude":-4.099999904632568,"bearing":0,"elapsedRealtimeNanos":2419284455000000,"filter":"time","fmt_time":"5 f\xe9vr. 2019 9:41:27 PM","latitude":37.3910475,"longitude":-122.0864541,"sensed_speed":0,"ts":1.549399287295E9}}\n'
In [26]: encoded_bytes = open("/tmp/utf-8.txt").read().encode("latin1")
In [28]: open("/tmp/iso-8859.txt", "wb").write(encoded_bytes)
Out[28]: 407
The file still shows the fmt_time as 5 févr. 2019 9:41:27 PM
ok, that's just vim being smart and figuring out the encoding automatically. If I use cat instead, I get
{"metadata":{"key":"background\\/location","platform":"android","read_ts":0,"time_zone":"Europe\\/Brussels","type":"sensor-data","write_ts":1.549399287465E9},"data":{"accuracy":15.048,"altitude":-4.099999904632568,"bearing":0,"elapsedRealtimeNanos":2419284455000000,"filter":"time","fmt_time":"5 f�vr. 2019 9:41:27 PM","latitude":37.3910475,"longitude":-122.0864541,"sensed_speed":0,"ts":1.549399287295E9}}
ok, so the previous iso-8859.txt file that I created by copy-pasting from the raw binary is actually wrong. This is the correct one: iso-8859.txt (attached)
When I try to load it directly, I still get an error
In [30]: json.load(open("/tmp/iso-8859.txt"))
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-30-ef49a420b762> in <module>()
----> 1 json.load(open("/tmp/iso-8859.txt"))
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 297: invalid continuation byte
but if I try to load it with the proper encoding it works fine
In [31]: json.load(open("/tmp/iso-8859.txt", encoding='latin1'))
Out[31]:
{'data': {'accuracy': 15.048,
'altitude': -4.099999904632568,
'bearing': 0,
'elapsedRealtimeNanos': 2419284455000000,
'filter': 'time',
'fmt_time': '5 févr. 2019 9:41:27 PM',
'latitude': 37.3910475,
'longitude': -122.0864541,
'sensed_speed': 0,
'ts': 1549399287.295},
'metadata': {'key': 'background\\/location',
'platform': 'android',
'read_ts': 0,
'time_zone': 'Europe\\/Brussels',
'type': 'sensor-data',
'write_ts': 1549399287.465}}
and the bottle trick of decoding and encoding also works
In [32]: json.loads(open("/tmp/iso-8859.txt", mode="rb").read().decode('latin1').encode('utf-8'))
Out[32]:
{'data': {'accuracy': 15.048,
'altitude': -4.099999904632568,
'bearing': 0,
'elapsedRealtimeNanos': 2419284455000000,
'filter': 'time',
'fmt_time': '5 févr. 2019 9:41:27 PM',
'latitude': 37.3910475,
'longitude': -122.0864541,
'sensed_speed': 0,
'ts': 1549399287.295},
'metadata': {'key': 'background\\/location',
'platform': 'android',
'read_ts': 0,
'time_zone': 'Europe\\/Brussels',
'type': 'sensor-data',
'write_ts': 1549399287.465}}
So the following one-line fix is a cheap fix that we can fall back to...
- b = self._get_body_string()
+ b = self._get_body_string().decode('latin1').encode('utf-8')
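A quick sanity check of what that line does, with a made-up body. Note that latin-1 decoding never fails (every byte maps to some character), which is why the fallback is cheap, but also why it would silently mangle a body that really was UTF-8; one more reason the principled fix is preferable.

```python
import json

# Simulated latin-1 request body, standing in for the real payload
b = '{"fmt_time":"5 févr. 2019 9:41:27 PM"}'.encode("latin1")

# The one-line fallback: reinterpret the bytes as latin-1, re-encode as UTF-8
fixed = b.decode("latin1").encode("utf-8")
assert b"\xe9" in b and b"\xc3\xa9" in fixed

# json.loads now succeeds on the re-encoded bytes
print(json.loads(fixed)["fmt_time"])
```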
Tried it with force sync. /usercache/put was now successful.
START 2019-03-26 14:44:05.756223 POST /usercache/put
END 2019-03-26 14:44:05.800609 POST /usercache/put 39b0541a-79fa-4655-a6ec-d9fe09944912 0.044220924377441406
2019-03-26 14:44:05,759:DEBUG:123145509371904:START POST /usercache/put
2019-03-26 14:44:05,766:DEBUG:123145509371904:methodName = skip, returning <class 'emission.net.auth.skip.SkipMethod'>
2019-03-26 14:44:05,766:DEBUG:123145509371904:Using the skip method to verify id token test_french_feb of length 15
2019-03-26 14:44:05,768:DEBUG:123145509371904:retUUID = 39b0541a-79fa-4655-a6ec-d9fe09944912
and in fact, the data is in the database
In [2]: from uuid import UUID
In [3]: test_uuid = UUID("39b0541a-79fa-4655-a6ec-d9fe09944912")
In [4]: import emission.storage.timeseries.cache_series as esdc
In [5]: data_list = esdc.find_entries(test_uuid, time_query=None)
In [6]: len(data_list)
Out[6]: 68
In [8]: location_entries = [e for e in data_list if e["metadata"]["key"] == "background/
...: location"]
In [10]: [e["data"]["fmt_time"] for e in location_entries]
Out[10]:
['5 févr. 2019 9:41:27 PM',
'5 févr. 2019 9:41:57 PM',
'26 mars 2019 8:42:32 PM',
'26 mars 2019 8:43:02 PM',
'26 mars 2019 8:43:32 PM',
'26 mars 2019 8:44:02 PM',
'26 mars 2019 8:44:31 PM',
'26 mars 2019 8:45:01 PM',
'26 mars 2019 8:45:32 PM',
'26 mars 2019 8:46:01 PM',
'5 févr. 2019 9:46:28 PM']
ok, so the server communication is in (drum roll) https://github.com/e-mission/cordova-server-communication .
Looking more closely at the code, the only method that is really called externally (from https://github.com/e-mission/cordova-server-communication/blob/master/www/servercomm.js) is pushGetJSON.
pushGetJSON uses the legacy apache HTTP library (https://github.com/e-mission/cordova-server-communication/blob/master/src/android/CommunicationHelper.java#L82, https://hc.apache.org/)
And yes, the default encoding is ISO-8859-1 (https://hc.apache.org/httpclient-legacy/charencodings.html):
The request or response body can be any encoding, but by default is ISO-8859-1.
We can set the charset in the content-type header
The encoding may be specified in the Content-Type header, for example: Content-Type: text/html; charset=UTF-8 You can set the content type header for a request with the addRequestHeader method in each method
But we currently don't. https://github.com/e-mission/cordova-server-communication/blob/master/src/android/CommunicationHelper.java#L84
So it sounds like the simple fix is to just set it there :)
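In Python terms (a hypothetical sketch, not the actual Java plugin code; the URL and payload are made up), the app-side fix amounts to encoding the body as UTF-8 and declaring that charset in the Content-Type header:

```python
import json
import urllib.request

payload = {"fmt_time": "5 févr. 2019 9:41:27 PM"}

# Encode the body explicitly as UTF-8 and say so in the Content-Type header
req = urllib.request.Request(
    "http://localhost:8080/usercache/put",  # made-up URL
    data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
    headers={"Content-Type": "application/json; charset=UTF-8"},
)

# The body bytes now contain the two-byte UTF-8 sequence for é
print(req.data)
```

With the charset declared and the bytes actually encoded as UTF-8, the server's json parsing no longer sees the lone \xe9 byte.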
We should also seriously consider upgrading this plugin. Even the apache HTTP components are currently at 4.5, and the legacy version that we are using is EOL. And there are new android-specific options such as volley.
However, this is not the biggest priority right now (as it wasn't 3 years ago https://github.com/e-mission/cordova-server-communication/commit/0ff1f058d091a5d6b92d089374a8f47b0367b3cb#diff-b45f027217ab8fcefc52d843dfc2a781) so will punt again.
But if it turns out that we touch this code more frequently (e.g. every 6 mos or so), we should revisit that decision.
Retrying with the fix: again, we have a mix of entries, but we do have some with févr.
sqlite> select data from userCache where key="background/filtered_location";
{"accuracy":15.058,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2431578187000000,"filter":"time","fmt_time":"6 févr. 2019 1:06:21 AM","latitude":37.391058,"longitude":-122.086473,"sensed_speed":0.0,"ts":1.549411581027E9}
{"accuracy":14.898,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2431608375000000,"filter":"time","fmt_time":"6 févr. 2019 1:06:51 AM","latitude":37.3910585,"longitude":-122.0864731,"sensed_speed":0.0,"ts":1.549411611215E9}
{"accuracy":14.871,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2431638387000000,"filter":"time","fmt_time":"6 févr. 2019 1:07:21 AM","latitude":37.3910585,"longitude":-122.0864731,"sensed_speed":0.0,"ts":1.549411641227E9}
{"accuracy":14.983,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2431669951000000,"filter":"time","fmt_time":"6 févr. 2019 1:07:52 AM","latitude":37.3910586,"longitude":-122.0864732,"sensed_speed":0.0,"ts":1.549411672791E9}
{"accuracy":96.0,"altitude":21.0,"bearing":0.0,"elapsedRealtimeNanos":2431701595115718,"filter":"time","fmt_time":"27 mars 2019 12:08:25 AM","latitude":37.3910394,"longitude":-122.0865538,"sensed_speed":0.0,"ts":1.553641705E9}
{"accuracy":15.0,"altitude":26.0,"bearing":0.0,"elapsedRealtimeNanos":2431731591314977,"filter":"time","fmt_time":"27 mars 2019 12:08:55 AM","latitude":37.3910463,"longitude":-122.0865623,"sensed_speed":0.0,"ts":1.553641735E9}
{"accuracy":15.083,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2431760657000000,"filter":"time","fmt_time":"6 févr. 2019 1:09:23 AM","latitude":37.3910586,"longitude":-122.0864732,"sensed_speed":0.0,"ts":1.549411763497E9}
{"accuracy":30.0,"altitude":26.0,"bearing":0.0,"elapsedRealtimeNanos":2431792598220996,"filter":"time","fmt_time":"27 mars 2019 12:09:56 AM","latitude":37.3910553,"longitude":-122.0865681,"sensed_speed":0.0,"ts":1.553641796E9}
{"accuracy":24.0,"altitude":26.0,"bearing":0.0,"elapsedRealtimeNanos":2431822598976088,"filter":"time","fmt_time":"27 mars 2019 12:10:26 AM","latitude":37.3910554,"longitude":-122.0865628,"sensed_speed":0.0,"ts":1.553641826E9}
{"accuracy":19.0,"altitude":28.0,"bearing":0.0,"elapsedRealtimeNanos":2431850568434046,"filter":"time","fmt_time":"27 mars 2019 12:10:54 AM","latitude":37.3910573,"longitude":-122.0865693,"sensed_speed":0.0,"ts":1.553641854E9}
{"accuracy":15.267,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2431879920000000,"filter":"time","fmt_time":"6 févr. 2019 1:11:22 AM","latitude":37.3910578,"longitude":-122.086473,"sensed_speed":0.0,"ts":1.54941188276E9}
We revert the workaround
$ git checkout emission/net/api/bottle.py
and force sync aaaannnndddd... the fix didn't work.
on the phone
02-06 01:15:56.268 25619-25683/edu.berkeley.eecs.emission D/ConnectionSettings: in getConnectURL, returning http://10.0.0.7:8080
02-06 01:15:56.269 25619-25683/edu.berkeley.eecs.emission I/System.out: Posting data to http://10.0.0.7:8080/usercache/put
02-06 01:15:56.329 25619-25683/edu.berkeley.eecs.emission I/System.out: Got response org.apache.http.message.BasicHttpResponse@f6ff8d6 with status HTTP/1.1 400 Bad Request
02-06 01:15:56.335 25619-25683/edu.berkeley.eecs.emission I/CommunicationHelper: Got response org.apache.http.message.BasicHttpResponse@f6ff8d6 with statusHTTP/1.1 400 Bad Request
02-06 01:15:56.342 25619-25683/edu.berkeley.eecs.emission E/ServerSyncAdapter: IO Error java.io.IOException while posting converted trips to JSON
so it turns out that, in addition to the HTTP headers, we also need to set the charset of the string entity that we use to generate the string body of the HTTP request. https://github.com/e-mission/cordova-server-communication/blob/master/src/android/CommunicationHelper.java#L89
Changing all the instances of
msg.setEntity(new StringEntity(....toString()));
to
StringEntity se = new StringEntity(....toString(), "UTF-8");
msg.setEntity(se);
seems to fix the problem.
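The effect of passing the charset to the entity can be seen directly with the JDK (no HttpClient required): the body bytes actually sent over the wire differ between the default and the explicit UTF-8 encoding, which is why fixing only the header was not enough. The JSON string below is illustrative, not the actual payload.

```java
import java.nio.charset.StandardCharsets;

public class EntityBodyDemo {
    public static void main(String[] args) {
        String json = "{\"fmt_time\":\"6 févr. 2019 1:06:21 AM\"}";

        // Roughly what new StringEntity(json) did: fall back to ISO-8859-1
        byte[] defaultBody = json.getBytes(StandardCharsets.ISO_8859_1);

        // Roughly what new StringEntity(json, "UTF-8") does instead
        byte[] utf8Body = json.getBytes(StandardCharsets.UTF_8);

        // The accented character occupies two bytes in UTF-8 but only one in
        // ISO-8859-1, so the two request bodies are byte-for-byte different
        System.out.println(defaultBody.length + " vs " + utf8Body.length);
    }
}
```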
On the phone
02-06 01:37:10.860 5140-5484/edu.berkeley.eecs.emission D/ConnectionSettings: in getConnectURL, connectionSettings = {"connectUrl":"http:\/\/10.0.0.7:8080","android":{"auth":{"method":"prompted-auth","clientID":"ignored"}},"ios":{"auth":{"method":"prompted-auth","clientID":"ignored"}}}
02-06 01:37:10.863 5140-5484/edu.berkeley.eecs.emission D/ConnectionSettings: in getConnectURL, returning http://10.0.0.7:8080
02-06 01:37:10.864 5140-5484/edu.berkeley.eecs.emission I/System.out: Posting data to http://10.0.0.7:8080/usercache/put
02-06 01:37:14.639 5140-5484/edu.berkeley.eecs.emission I/System.out: Got response org.apache.http.message.BasicHttpResponse@f40d92d with status HTTP/1.1 200 OK
02-06 01:37:14.650 5140-5484/edu.berkeley.eecs.emission I/CommunicationHelper: Got response org.apache.http.message.BasicHttpResponse@f40d92d with statusHTTP/1.1 200 OK
on the server
2019-03-26 16:37:15,299:DEBUG:123145562148864:START POST /usercache/put
2019-03-26 16:37:15,299:DEBUG:123145562148864:Called userCache.put
2019-03-26 16:37:15,302:DEBUG:123145562148864:methodName = skip, returning <class 'emission.net.auth.skip.SkipMethod'>
2019-03-26 16:37:15,303:DEBUG:123145562148864:Using the skip method to verify id token test_french_feb of length 15
2019-03-26 16:37:15,304:DEBUG:123145562148864:retUUID = 39b0541a-79fa-4655-a6ec-d9fe09944912
2019-03-26 16:37:15,304:DEBUG:123145562148864:user_uuid 39b0541a-79fa-4655-a6ec-d9fe09944912
2019-03-26 16:37:15,307:DEBUG:123145562148864:Updated result for user = 39b0541a-79fa-4655-a6ec-d9fe09944912, key = stats/client_nav_event, write_ts = 1549411452.059 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('5c9ab7ab40ff1ac9196893da'), 'ok': 1.0, 'updatedExisting': False}
...
2019-03-26 16:37:15,332:DEBUG:123145562148864:Updated result for user = 39b0541a-79fa-4655-a6ec-d9fe09944912, key = statemachine/transition, write_ts = 1549411883.789 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('5c9ab7ab40ff1ac919689422'), 'ok': 1.0, 'updatedExisting': False}
2019-03-26 16:37:15,333:DEBUG:123145562148864:END POST /usercache/put 39b0541a-79fa-4655-a6ec-d9fe09944912 0.041570186614990234
@PatGendre since you are building your own app, do you want to try out the fix first? The patch is attached: utf-8-fix.patch.gz
Alternatively, I can just check this in and bump up the version number, so when you rebuild, you will get the fix. I'm going to wait a bit to push a new emTripLog release with this fix because the Indian team is starting their survey; they just finished the pilot testing and they are going to be paranoid about any change to the app.
And this fix doesn't really affect them.
Let me know if you have any concerns.
@PatGendre Great catch BTW. This was such a corner case if you consider English and Spanish-speaking countries, but such an important component if you consider the rest of the world, especially countries with non-latin alphabets.
Let's briefly check to see whether there is a similar issue with iOS. Here's where we communicate with the server in iOS.
And it's true that we don't set UTF-8 as the encoding:
[request setValue:@"application/json"
forHTTPHeaderField:@"Content-Type"];
But to fill in the data, we use
NSData *jsonData = [NSJSONSerialization dataWithJSONObject:self.mJsonDict
options:kNilOptions
error:&parseError];
and that will return UTF-8 https://developer.apple.com/documentation/foundation/nsjsonserialization/1413636-datawithjsonobject
JSON data for obj, or nil if an internal error occurs. The resulting data is encoded in UTF-8.
So I should add charset=UTF-8 to be on the safe side, but I bet this works just fine on iOS.
@PatGendre are any of you testing with iPhone just to confirm?
Hi @shankari, great debug job :-) Internationalisation is definitely an art! I will check with Yann about patching the plugin, but since the next problematic month for us is August (août), we can safely wait for the next emTripLog release. As for iOS, no, we don't currently test on iPhones. The FabMob association has just asked for an Apple account, so we may compile the app for iOS and test next week at the earliest.
I could not find a similar issue, so I am opening this one: I have recorded tracks for the last few days, but no data is visible in mongo, except stats/server_api_time. There is no background/location nor background/filtered_location, and therefore no analysis data either. I checked the server log; for 2 other users there is some location data in mongo. In the server log, I see 2 kinds of errors: