e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Custom object saved to an unexpected collection. #487

Open atton16 opened 4 years ago

atton16 commented 4 years ago

I have implemented custom objects by following the instructions.

Check these commits: https://github.com/atton16/e-mission-server/commit/96480a51f03881063d8f11b718ba6daf4dffd0ca https://github.com/atton16/e-mission-server/commit/9ff5b15ec686b470d829a1eaf162710d7b292f13

The custom objects are stored in usercache on MongoDB for a few days, and then something weird happens.

I am not certain whether usercache or timeseries is the right destination, but timeseries_error is definitely not.
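
For reference, this is roughly how I am checking which collection the entries end up in (the database and collection names are the ones I see in my deployment; adjust if yours differ):

import pymongo

# Count entries for the custom key in each of the collections I see
# locally; the names are assumptions based on my own deployment.
client = pymongo.MongoClient("localhost")
db = client.Stage_database
for coll in ["Stage_usercache", "Stage_timeseries", "Stage_timeseries_error"]:
    count = db[coll].count_documents({"metadata.key": "manual/user_profile_survey"})
    print("%s: %d entries" % (coll, count))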

Please help clarify.

shankari commented 4 years ago

In order to understand the difference between the usercache and the timeseries, and the interaction between the two, please see section 3.5.1 and 5.1.2.1 of my thesis. https://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-180.html

It looks like manual/confirm_survey is working properly and manual/user_profile_survey is not. I don't see any differences between the two that could account for this. Are you sure that the object names match the keys generated on the phone?

The pipeline logs should have the error generated while processing the user_profile_survey objects.

atton16 commented 4 years ago

It is an exact match.

(screenshot attached)
atton16 commented 4 years ago

Now the new data from 2020-01-02 has been pushed to the usercache again.

(screenshot attached)

But the data from 2020-01-01 has moved to timeseries_error.

(screenshot attached)

Not sure how long until the new data moves to timeseries_error.

shankari commented 4 years ago

Not sure how long until the new data moves to timeseries_error.

When the pipeline next runs. Check the pipeline logs.

atton16 commented 4 years ago

Found it. It looks like the pipeline needs start_ts and end_ts fields in the data. Luckily, manual/confirm_survey has them to reference the trip, but manual/user_profile_survey does not. Should I implement fake start_ts and end_ts?

Pipeline log

Got error 'AttrDict' instance has no attribute 'start_ts' while saving entry AttrDict({'_id': ObjectId('5e0c91ecd7802ee78b6b2b22'), 'metadata': {'time_zone': 'Asia/Bangkok', 'plugin': 'none', 'write_ts': 1577867069.5477018, 'platform': 'ios', 'read_ts': 0, 'key': 'manual/user_profile_survey', 'type': 'message'}, 'user_id': UUID('4b38af07-4861-4d83-81fc-85baf68eefdb'), 'data': {'survey_result': '<data xmlns:jr="http://openrosa.org/javarosa" xmlns:odk="http://www.opendatakit.org/xforms" xmlns:orx="http://openrosa.org/xforms" id="snapshot_xml">\n          <start>2020-01-01T15:23:19.311+07:00</start>\n          <end>2020-01-01T15:23:19.314+07:00</end>\n          <today>2020-01-01</today>\n          <deviceid>deviceid not found</deviceid>\n          <group_uy6od86>\n            <age>25_33</age>\n            <gender>male</gender>\n            <home_location_001>0</home_location_001>\n            <employment>option_1</employment>\n            <work_location>0</work_location>\n            <study>study_full_time</study>\n            <school_location>0</school_location>\n          </group_uy6od86>\n          <group_my6jo52>\n            <hh_size>0</hh_size>\n            <num_workers>0</num_workers>\n            <num_adults>0</num_adults>\n            <num_kids>0</num_kids>\n          </group_my6jo52>\n          <group_lm5fq00>\n            <num_vehicles>0</num_vehicles>\n            <driver_licence>no</driver_licence>\n            <drive/>\n            <Vehicle_year/>\n            <veh_make/>\n            <veh_model/>\n          </group_lm5fq00>\n          <meta>\n            <instanceID>uuid:77f51391-8af7-47d2-8ded-d822a542ffdd</instanceID>\n          </meta>\n        </data>', 'user_uuid': '4b38af0748614d8381fc85baf68eefdb', 'timestamp': '2020-01-01T08:24:29.540Z'}}) -> None
atton16 commented 4 years ago

There is also another error. It appears that the XML stored in data contains some Unicode characters that the logging system cannot handle.

Pipeline log

--- Logging error ---
Traceback (most recent call last):
  File "/usr/src/app/emission/net/usercache/builtin_usercache_handler.py", line 79, in moveToLongTerm
    unified_entry = enuf.convert_to_common_format(entry)
  File "/usr/src/app/emission/net/usercache/formatters/formatter.py", line 13, in convert_to_common_format
    return format_fn(entry)
  File "/usr/src/app/emission/net/usercache/formatters/ios/user_profile_survey.py", line 12, in format
    return fgl.format(entry)
  File "/usr/src/app/emission/net/usercache/formatters/generic/userlabel.py", line 28, in format
    fc.expand_start_end_data_times(data, metadata)
  File "/usr/src/app/emission/net/usercache/formatters/common.py", line 31, in expand_start_end_data_times
    d.start_local_dt = ecsdlq.get_local_date(d.start_ts, m.time_zone)
  File "/opt/conda/envs/emission/lib/python3.6/site-packages/attrdict/mixins.py", line 82, in __getattr__
    cls=self.__class__.__name__, name=key
AttributeError: 'AttrDict' instance has no attribute 'start_ts'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/emission/lib/python3.6/logging/__init__.py", line 994, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 849-851: ordinal not in range(128)
Call stack:
  File "<string>", line 1, in <module>
  File "/opt/conda/envs/emission/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/opt/conda/envs/emission/lib/python3.6/multiprocessing/spawn.py", line 118, in _main
    return self._bootstrap()
  File "/opt/conda/envs/emission/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/conda/envs/emission/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/src/app/emission/pipeline/intake_stage.py", line 72, in run_intake_pipeline
    run_intake_pipeline_for_user(uuid)
  File "/usr/src/app/emission/pipeline/intake_stage.py", line 84, in run_intake_pipeline_for_user
    uh.moveToLongTerm()
  File "/usr/src/app/emission/net/usercache/builtin_usercache_handler.py", line 87, in moveToLongTerm
    ts.insert_error(entry_doc)
  File "/usr/src/app/emission/storage/timeseries/builtin_timeseries.py", line 375, in insert_error
    logging.debug("Inserting entry %s into error timeseries" % entry)
Message: 'Inserting entry {\'_id\': ObjectId(\'5e0c4a98d7802ee78b6ad8a1\'), \'metadata\': {\'time_zone\': \'Australia/Sydney\', \'plugin\': \'none\', \'write_ts\': 1577860330.459893, \'platform\': \'ios\', \'read_ts\': 0, \'key\': \'manual/user_profile_survey\', \'type\': \'message\'}, \'user_id\': UUID(\'74fc6ede-ece0-43b5-ade3-456ebe034e3f\'), \'data\': {\'survey_result\': \'<data xmlns:jr="http://openrosa.org/javarosa" xmlns:odk="http://www.opendatakit.org/xforms" xmlns:orx="http://openrosa.org/xforms" id="snapshot_xml">\\n          <start>2020-01-01T17:31:32.724+11:00</start>\\n          <end>2020-01-01T17:31:32.727+11:00</end>\\n          <today>2020-01-01</today>\\n          <deviceid>deviceid not found</deviceid>\\n          <group_uy6od86>\\n            <age>25_33</age>\\n            <gender>female</gender>\\n            <home_location_001>\u0e04\u0e48\u0e32</home_location_001>\\n            <employment>option_2</employment>\\n            <work_location>\u0e31\u0e19\u0e48</work_location>\\n            <study>study_part_time</study>\\n            <school_location>\u0e38\u0e36\u0e35</school_location>\\n          </group_uy6od86>\\n          <group_my6jo52>\\n            <hh_size>6</hh_size>\\n            <num_workers>1</num_workers>\\n            <num_adults>2</num_adults>\\n            <num_kids>4</num_kids>\\n          </group_my6jo52>\\n          <group_lm5fq00>\\n            <num_vehicles>3</num_vehicles>\\n            <driver_licence>no</driver_licence>\\n            <drive/>\\n            <Vehicle_year/>\\n            <veh_make/>\\n            <veh_model/>\\n          </group_lm5fq00>\\n          <meta>\\n            <instanceID>uuid:2fc806f0-0467-44f9-9299-45c02c477962</instanceID>\\n          </meta>\\n        </data>\', \'user_uuid\': \'74fc6edeece043b5ade3456ebe034e3f\', \'timestamp\': \'2020-01-01T06:32:10.451Z\'}} into error timeseries'
shankari commented 4 years ago

Found it. It looks like the pipeline needs start_ts and end_ts fields in the data. Luckily, manual/confirm_survey has them to reference the trip, but manual/user_profile_survey does not. Should I implement fake start_ts and end_ts?

The pipeline doesn't need anything; the specific formatter that you are using, emission.net.usercache.formatters.generic.userlabel, does. Just implement the format method directly, without expecting start_ts and end_ts, instead of calling fgl.format(entry).
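
Something along these lines should work (untested sketch; the expand_metadata_times helper and the fields copied into data follow the pattern of the other formatters and formatters/common.py, so double-check them against what your downstream code actually reads):

import attrdict as ad
import emission.net.usercache.formatters.common as fc

# emission/net/usercache/formatters/ios/user_profile_survey.py
# Format the survey entry directly instead of delegating to the generic
# userlabel formatter, since there is no associated trip and therefore
# no start_ts/end_ts in the data.
def format(entry):
    formatted_entry = ad.AttrDict()
    formatted_entry["_id"] = entry["_id"]
    formatted_entry.user_id = entry.user_id
    metadata = entry.metadata
    # assumption: expand_metadata_times fills in write_local_dt/write_fmt_time
    # the same way it does for the other formatters
    fc.expand_metadata_times(metadata)
    formatted_entry.metadata = metadata
    data = ad.AttrDict(entry.data)
    # anchor the survey at its write time instead of a trip's start/end
    data.ts = metadata.write_ts
    data.local_dt = metadata.write_local_dt
    data.fmt_time = metadata.write_fmt_time
    formatted_entry.data = data
    return formatted_entry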

shankari commented 4 years ago

wrt the second error, you can set the logger to log in UTF-8 instead of ASCII. There is a prior issue filed with a suggested fix.
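
Roughly along these lines (illustrative only; the right place to change this depends on how your deployment configures logging):

import logging

# Option 1: add a file handler with an explicit UTF-8 encoding so that
# non-ASCII characters (e.g. the Thai strings in the survey XML) can be
# written out. The filename here is just an example.
handler = logging.FileHandler("intake_pipeline.log", encoding="utf-8")
logging.getLogger().addHandler(handler)

# Option 2: leave the handlers alone and run the pipeline with
# PYTHONIOENCODING=utf-8 in the environment, so that stream handlers
# writing to stdout/stderr stop falling back to ASCII.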

atton16 commented 4 years ago

Found it. It looks like the pipeline needs start_ts and end_ts fields in the data. Luckily, manual/confirm_survey has them to reference the trip, but manual/user_profile_survey does not. Should I implement fake start_ts and end_ts?

The pipeline doesn't need anything; the specific formatter that you are using, emission.net.usercache.formatters.generic.userlabel, does. Just implement the format method directly, without expecting start_ts and end_ts, instead of calling fgl.format(entry).

I see, thank you.

shankari commented 4 years ago

Yeah I wrote emission.net.usercache.formatters.generic.userlabel because most of the userlabels had the same format, and I didn't want to copy/paste the same formatting code over and over. But if the userlabel actually has a different format, implementing format separately is fine.

shankari commented 4 years ago

FYI, the ASCII logging issue is https://github.com/e-mission/e-mission-docs/issues/444