Open atton16 opened 4 years ago
In order to understand the difference between the usercache and the timeseries, and the interaction between the two, please see section 3.5.1 and 5.1.2.1 of my thesis. https://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-180.html
It looks like manual/confirm_survey
is working properly and manual/user_profile_survey
is not. I don't see any differences between the two that could account for this. Are you sure that the object names match the keys generated on the phone?
The pipeline logs should have the error generated while processing the user_profile_survey
objects.
It is the exact match.
Now the new data from 2020-01-02
got pushed to user cache
again.
But the data from 2020-01-01
moved to timeseries_error
Not sure how long until the new data move to timeseries_error
Not sure how long until the new data move to timeseries_error
When the pipeline next runs. Check the pipeline logs.
Found it. It looks like the pipeline needs start_ts
and end_ts
field in the data. Luckily, manual/confirm_survey
has them to reference to the trip but manual/user_profile_survey
do not have them. Should it implement the fake start_ts
and end_ts
?
Pipeline log
Got error 'AttrDict' instance has no attribute 'start_ts' while saving entry AttrDict({'_id': ObjectId('5e0c91ecd7802ee78b6b2b22'), 'metadata': {'time_zone': 'Asia/Bangkok', 'plugin': 'none', 'write_ts': 1577867069.5477018, 'platform': 'ios', 'read_ts': 0, 'key': 'manual/user_profile_survey', 'type': 'message'}, 'user_id': UUID('4b38af07-4861-4d83-81fc-85baf68eefdb'), 'data': {'survey_result': '<data xmlns:jr="http://openrosa.org/javarosa" xmlns:odk="http://www.opendatakit.org/xforms" xmlns:orx="http://openrosa.org/xforms" id="snapshot_xml">\n <start>2020-01-01T15:23:19.311+07:00</start>\n <end>2020-01-01T15:23:19.314+07:00</end>\n <today>2020-01-01</today>\n <deviceid>deviceid not found</deviceid>\n <group_uy6od86>\n <age>25_33</age>\n <gender>male</gender>\n <home_location_001>0</home_location_001>\n <employment>option_1</employment>\n <work_location>0</work_location>\n <study>study_full_time</study>\n <school_location>0</school_location>\n </group_uy6od86>\n <group_my6jo52>\n <hh_size>0</hh_size>\n <num_workers>0</num_workers>\n <num_adults>0</num_adults>\n <num_kids>0</num_kids>\n </group_my6jo52>\n <group_lm5fq00>\n <num_vehicles>0</num_vehicles>\n <driver_licence>no</driver_licence>\n <drive/>\n <Vehicle_year/>\n <veh_make/>\n <veh_model/>\n </group_lm5fq00>\n <meta>\n <instanceID>uuid:77f51391-8af7-47d2-8ded-d822a542ffdd</instanceID>\n </meta>\n </data>', 'user_uuid': '4b38af0748614d8381fc85baf68eefdb', 'timestamp': '2020-01-01T08:24:29.540Z'}}) -> None
There is also another error. It appears that xml stored in data has some unicode characters that the system does not happy to accept it.
Pipeline log
--- Logging error ---
Traceback (most recent call last):
File "/usr/src/app/emission/net/usercache/builtin_usercache_handler.py", line 79, in moveToLongTerm
unified_entry = enuf.convert_to_common_format(entry)
File "/usr/src/app/emission/net/usercache/formatters/formatter.py", line 13, in convert_to_common_format
return format_fn(entry)
File "/usr/src/app/emission/net/usercache/formatters/ios/user_profile_survey.py", line 12, in format
return fgl.format(entry)
File "/usr/src/app/emission/net/usercache/formatters/generic/userlabel.py", line 28, in format
fc.expand_start_end_data_times(data, metadata)
File "/usr/src/app/emission/net/usercache/formatters/common.py", line 31, in expand_start_end_data_times
d.start_local_dt = ecsdlq.get_local_date(d.start_ts, m.time_zone)
File "/opt/conda/envs/emission/lib/python3.6/site-packages/attrdict/mixins.py", line 82, in __getattr__
cls=self.__class__.__name__, name=key
AttributeError: 'AttrDict' instance has no attribute 'start_ts'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/envs/emission/lib/python3.6/logging/__init__.py", line 994, in emit
stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 849-851: ordinal not in range(128)
Call stack:
File "<string>", line 1, in <module>
File "/opt/conda/envs/emission/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/opt/conda/envs/emission/lib/python3.6/multiprocessing/spawn.py", line 118, in _main
return self._bootstrap()
File "/opt/conda/envs/emission/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/opt/conda/envs/emission/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/src/app/emission/pipeline/intake_stage.py", line 72, in run_intake_pipeline
run_intake_pipeline_for_user(uuid)
File "/usr/src/app/emission/pipeline/intake_stage.py", line 84, in run_intake_pipeline_for_user
uh.moveToLongTerm()
File "/usr/src/app/emission/net/usercache/builtin_usercache_handler.py", line 87, in moveToLongTerm
ts.insert_error(entry_doc)
File "/usr/src/app/emission/storage/timeseries/builtin_timeseries.py", line 375, in insert_error
logging.debug("Inserting entry %s into error timeseries" % entry)
Message: 'Inserting entry {\'_id\': ObjectId(\'5e0c4a98d7802ee78b6ad8a1\'), \'metadata\': {\'time_zone\': \'Australia/Sydney\', \'plugin\': \'none\', \'write_ts\': 1577860330.459893, \'platform\': \'ios\', \'read_ts\': 0, \'key\': \'manual/user_profile_survey\', \'type\': \'message\'}, \'user_id\': UUID(\'74fc6ede-ece0-43b5-ade3-456ebe034e3f\'), \'data\': {\'survey_result\': \'<data xmlns:jr="http://openrosa.org/javarosa" xmlns:odk="http://www.opendatakit.org/xforms" xmlns:orx="http://openrosa.org/xforms" id="snapshot_xml">\\n <start>2020-01-01T17:31:32.724+11:00</start>\\n <end>2020-01-01T17:31:32.727+11:00</end>\\n <today>2020-01-01</today>\\n <deviceid>deviceid not found</deviceid>\\n <group_uy6od86>\\n <age>25_33</age>\\n <gender>female</gender>\\n <home_location_001>\u0e04\u0e48\u0e32</home_location_001>\\n <employment>option_2</employment>\\n <work_location>\u0e31\u0e19\u0e48</work_location>\\n <study>study_part_time</study>\\n <school_location>\u0e38\u0e36\u0e35</school_location>\\n </group_uy6od86>\\n <group_my6jo52>\\n <hh_size>6</hh_size>\\n <num_workers>1</num_workers>\\n <num_adults>2</num_adults>\\n <num_kids>4</num_kids>\\n </group_my6jo52>\\n <group_lm5fq00>\\n <num_vehicles>3</num_vehicles>\\n <driver_licence>no</driver_licence>\\n <drive/>\\n <Vehicle_year/>\\n <veh_make/>\\n <veh_model/>\\n </group_lm5fq00>\\n <meta>\\n <instanceID>uuid:2fc806f0-0467-44f9-9299-45c02c477962</instanceID>\\n </meta>\\n </data>\', \'user_uuid\': \'74fc6edeece043b5ade3456ebe034e3f\', \'timestamp\': \'2020-01-01T06:32:10.451Z\'}} into error timeseries'
Found it. It looks like the pipeline needs start_ts and end_ts field in the data. Luckily, manual/confirm_survey has them to reference to the trip but manual/user_profile_survey do not have them. Should it implement the fake start_ts and end_ts?
The pipeline doesn't need anything; the specific formatter that you are using emission.net.usercache.formatters.generic.userlabel
does. Just implement the format
method directly without expecting start_ts
and end_ts
instead of calling fgl.format(entry)
wrt the second error, you can set the logger to log in UTF-8 instead of ASCII. There is a prior issue filed with a suggested fix.
Found it. It looks like the pipeline needs start_ts and end_ts field in the data. Luckily, manual/confirm_survey has them to reference to the trip but manual/user_profile_survey do not have them. Should it implement the fake start_ts and end_ts?
The pipeline doesn't need anything; the specific formatter that you are using
emission.net.usercache.formatters.generic.userlabel
does. Just implement theformat
method directly without expectingstart_ts
andend_ts
instead of callingfgl.format(entry)
I see, thank you.
Yeah I wrote emission.net.usercache.formatters.generic.userlabel
because most of the userlabel
s had the same format, and I didn't want to copy/paste the same formatting code over and over. But if the userlabel
actually has a different format, implementing format
separately is fine.
FYI, the ASCII logging issue is https://github.com/e-mission/e-mission-docs/issues/444
I have implemented custom objects by following the instructions.
Check these commits: https://github.com/atton16/e-mission-server/commit/96480a51f03881063d8f11b718ba6daf4dffd0ca https://github.com/atton16/e-mission-server/commit/9ff5b15ec686b470d829a1eaf162710d7b292f13
The custom objects are stored in
usercache
on mongodb for a few days and then something weird happen.manual/user_profile_survey
moved fromusercache
totimeseries_error
.manual/confirm_survey
moved fromusercache
totimeseries
.I am not certain if
usercache
ortimeseries
is the right destination. Buttimeseries_error
definitely not.Please help clarify.