e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Journey on how to fix the pipeline ⛏️ #689

Closed. larbizard closed this issue 2 years ago

larbizard commented 2 years ago

Steps to reproduce: Execute the analysis pipeline using ./e-mission-py.bash bin/debug/intake_single_user.py -e "myuseremail@email.com"

Expected result: The pipeline executes correctly and the data is analyzed and stored in the Stage_analysis_timeseries collection.

Actual result:

(emission) root@e-mission-server-fabmob-qc:~/e-mission-server# ./e-mission-py.bash bin/debug/intake_single_user.py -e "myuseremail@email.com"
Connecting to database URL db
google maps key not configured, falling back to nominatim
nominatim not configured either, place decoding must happen on the client
transit stops query not configured, falling back to default
expectations.conf.json not configured, falling back to sample, default configuration
ERROR:root:habitica not configured, game functions not supported
Traceback (most recent call last):
  File "/root/e-mission-server/emission/net/ext_service/habitica/proxy.py", line 22, in
    key_file = open('conf/net/ext_service/habitica.json')
FileNotFoundError: [Errno 2] No such file or directory: 'conf/net/ext_service/habitica.json'
2021-12-07T11:19:40.559267+00:00**UUID f5e92e19-39fa-4598-ad66-ebfc47cb9e34: moving to long term**
2021-12-07T11:19:40.600720+00:00**UUID f5e92e19-39fa-4598-ad66-ebfc47cb9e34: updating incoming user inputs**
2021-12-07T11:19:40.886939+00:00**UUID f5e92e19-39fa-4598-ad66-ebfc47cb9e34: filter accuracy if needed**
2021-12-07T11:19:41.452110+00:00**UUID f5e92e19-39fa-4598-ad66-ebfc47cb9e34: segmenting into trips**
Found error curr_state.curr_run_ts = 1635335308.6691191 while processing pipeline for user f5e92e19-39fa-4598-ad66-ebfc47cb9e34, skipping
Traceback (most recent call last):
  File "/root/e-mission-server/emission/pipeline/intake_stage.py", line 73, in run_intake_pipeline
    run_intake_pipeline_for_user(uuid)
  File "/root/e-mission-server/emission/pipeline/intake_stage.py", line 122, in run_intake_pipeline_for_user
    eaist.segment_current_trips(uuid)
  File "/root/e-mission-server/emission/analysis/intake/segmentation/trip_segmentation.py", line 52, in segment_current_trips
    time_query = epq.get_time_range_for_segmentation(user_id)
  File "/root/e-mission-server/emission/storage/pipeline_queries.py", line 55, in get_time_range_for_segmentation
    return get_time_range_for_stage(user_id, ps.PipelineStages.TRIP_SEGMENTATION)
  File "/root/e-mission-server/emission/storage/pipeline_queries.py", line 308, in get_time_range_for_stage
    assert curr_state.curr_run_ts is None, "curr_state.curr_run_ts = %s" % curr_state.curr_run_ts
AssertionError: curr_state.curr_run_ts = 1635335308.6691191

shankari commented 2 years ago

@larbizard this is a dup of https://github.com/e-mission/e-mission-docs/issues/473

As you can see from https://github.com/e-mission/e-mission-docs/blob/6a0c1ec9ddc31036d64dfc087458bfa46b94cecf/docs/dev/archi/pipeline_details.md#curr_run_ts---while-processing-pipeline the curr_run_ts essentially acts as a lock for a particular (user, stage) combination. If two processes try to operate on the same stage for the same user, the process that enters the stage first grabs the lock. The process that tries to operate second cannot grab the lock and skips the stage.
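For reference, here is a minimal sketch of that locking behaviour (the assert line matches emission/storage/pipeline_queries.py as shown in the traceback above; the surrounding acquire/release logic is simplified for illustration only):

# Simplified sketch of the (user, stage) lock described above.
import time

def get_time_range_for_stage(curr_state):
    # if curr_run_ts is already set, another process holds this (user, stage) lock
    assert curr_state.curr_run_ts is None, \
        "curr_state.curr_run_ts = %s" % curr_state.curr_run_ts
    curr_state.curr_run_ts = time.time()  # grab the lock
    # ... compute and return the time range this stage still needs to process ...

def mark_stage_done(curr_state, last_processed_ts):
    curr_state.last_processed_ts = last_processed_ts
    curr_state.curr_run_ts = None  # release the lock so the next run can proceed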

This means that you are either running two copies of the pipeline concurrently, or a previous run crashed while holding the lock.

You can fix it by resetting the pipeline (bin/reset_pipeline.py) for the user and then relaunching it exactly once.

larbizard commented 2 years ago

Since last May 15 the pipeline has started crashing again. Was the pipeline analysing data for your account?

Neither Raouf nor I have had any analysed data. We have only seen the Draft icon on our phones since then. Same for many users.

We also noticed in the logs that there are pipeline errors for some users. These users have probably not used the app since our last update, and may even have deleted it. Shall we purge them?

We are thinking of executing the pipeline only for specific users (known active users who installed the new version after April 29) using bin/debug/intake_single_user.py.

We tried to execute the pipeline for Raouf and it passed, but nothing was updated. We then reset the pipeline for Raouf after 2022-05-16 and executed it again; it failed with the following logs:

(emission) root@e-mission-server-fabmob-qc:~/e-mission-server# ./e-mission-py.bash bin/debug/intake_single_user.py -u cbeb55e5969943a6911bb44b283b45d1
Connecting to database URL db
google maps key not configured, falling back to nominatim
nominatim not configured either, place decoding must happen on the client
transit stops query not configured, falling back to default
expectations.conf.json not configured, falling back to sample, default configuration
ERROR:root:habitica not configured, game functions not supported
Traceback (most recent call last):
  File "/root/e-mission-server/emission/net/ext_service/habitica/proxy.py", line 22, in
    key_file = open('conf/net/ext_service/habitica.json')
FileNotFoundError: [Errno 2] No such file or directory: 'conf/net/ext_service/habitica.json'
2022-05-19T14:48:17.500481+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: moving to long term**
2022-05-19T14:48:17.569037+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: updating incoming user inputs**
2022-05-19T14:48:17.846348+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: filter accuracy if needed**
2022-05-19T14:48:19.379204+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: segmenting into trips**
2022-05-19T14:48:19.860724+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: segmenting into sections**
2022-05-19T14:48:21.303548+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: smoothing sections**
2022-05-19T14:48:22.476714+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: cleaning and resampling timeline**
2022-05-19T14:48:22.557741+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: inferring transportation mode**
2022-05-19T14:48:34.239456+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: inferring labels**
2022-05-19T14:48:34.300314+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: populating expectations**
2022-05-19T14:48:34.357588+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: creating confirmed objects **
2022-05-19T14:48:34.500369+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: storing views to cache**
Found error Found len(ret_list) = 2, expected <=1 while processing trip Entry({'_id': ObjectId('628195a21f61a1a85cc6bb13'), 'user_id': UUID('cbeb55e5-9699-43a6-911b-b44b283b45d1'), 'metadata': {'key': 'analysis/cleaned_trip', 'platform': 'server', 'write_ts': 1652659618.866277, 'time_zone': 'America/Los_Angeles', 'write_local_dt': {'year': 2022, 'month': 5, 'day': 15, 'hour': 17, 'minute': 6, 'second': 58, 'weekday': 6, 'timezone': 'America/Los_Angeles'}, 'write_fmt_time': '2022-05-15T17:06:58.866277-07:00'}, 'data': {'source': 'DwellSegmentationTimeFilter', 'end_ts': 1652655756.719, 'end_local_dt': {'year': 2022, 'month': 5, 'day': 15, 'hour': 19, 'minute': 2, 'second': 36, 'weekday': 6, 'timezone': 'America/Toronto'}, 'end_fmt_time': '2022-05-15T19:02:36.719000-04:00', 'end_loc': {'type': 'Point', 'coordinates': [-73.5520783, 45.5460571]}, 'raw_trip': ObjectId('628195a01f61a1a85cc6bb0c'), 'start_ts': 1652655482.3305671, 'start_local_dt': {'year': 2022, 'month': 5, 'day': 15, 'hour': 18, 'minute': 58, 'second': 2, 'weekday': 6, 'timezone': 'America/Toronto'}, 'start_fmt_time': '2022-05-15T18:58:02.330567-04:00', 'start_loc': {'type': 'Point', 'coordinates': [-73.543472, 45.5519273]}, 'duration': 274.38843297958374, 'distance': 1592.1785739074278, 'start_place': ObjectId('6281883e6a1b862dcba6934d'), 'end_place': ObjectId('628195a31f61a1a85cc6bb21')}})
Traceback (most recent call last):
  File "/root/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 288, in get_geojson_for_timeline
    trip_geojson = trip_to_geojson(trip, tl)
  File "/root/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 247, in trip_to_geojson
    section_gj = section_to_geojson(section, tl)
  File "/root/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 156, in section_to_geojson
    ise = esds.cleaned2inferred_section(section.user_id, section.get_id())
  File "/root/e-mission-server/emission/storage/decorations/section_queries.py", line 45, in cleaned2inferred_section
    curr_predicted_entry = _get_inference_entry_for_section(user_id, section_id, "analysis/inferred_section", "data.cleaned_section")
  File "/root/e-mission-server/emission/storage/decorations/section_queries.py", line 56, in _get_inference_entry_for_section
    assert len(ret_list) <= 1, "Found len(ret_list) = %d, expected <=1" % len(ret_list)
AssertionError: Found len(ret_list) = 2, expected <=1
Storing views to cache failed for user cbeb55e5-9699-43a6-911b-b44b283b45d1
Traceback (most recent call last):
  File "/root/e-mission-server/emission/net/usercache/builtin_usercache_handler.py", line 109, in storeViewsToCache
    self.storeTimelineToCache(time_query)
  File "/root/e-mission-server/emission/net/usercache/builtin_usercache_handler.py", line 180, in storeTimelineToCache
    trip_gj_list = self.get_trip_list_for_seven_days(start_ts)
  File "/root/e-mission-server/emission/net/usercache/builtin_usercache_handler.py", line 269, in get_trip_list_for_seven_days
    trip_gj_list = gfc.get_geojson_for_ts(self.user_id, seventy_two_hours_ago_ts, start_ts)
  File "/root/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 272, in get_geojson_for_ts
    return get_geojson_for_timeline(user_id, tl)
  File "/root/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 293, in get_geojson_for_timeline
    raise e
  File "/root/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 288, in get_geojson_for_timeline
    trip_geojson = trip_to_geojson(trip, tl)
  File "/root/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 247, in trip_to_geojson
    section_gj = section_to_geojson(section, tl)
  File "/root/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 156, in section_to_geojson
    ise = esds.cleaned2inferred_section(section.user_id, section.get_id())
  File "/root/e-mission-server/emission/storage/decorations/section_queries.py", line 45, in cleaned2inferred_section
    curr_predicted_entry = _get_inference_entry_for_section(user_id, section_id, "analysis/inferred_section", "data.cleaned_section")
  File "/root/e-mission-server/emission/storage/decorations/section_queries.py", line 56, in _get_inference_entry_for_section
    assert len(ret_list) <= 1, "Found len(ret_list) = %d, expected <=1" % len(ret_list)
AssertionError: Found len(ret_list) = 2, expected <=1

We reset again and had the following error:

(emission) root@e-mission-server-fabmob-qc:~/e-mission-server# ./e-mission-py.bash bin/debug/intake_single_user.py -u cbeb55e5969943a6911bb44b283b45d1
Connecting to database URL db
google maps key not configured, falling back to nominatim
nominatim not configured either, place decoding must happen on the client
transit stops query not configured, falling back to default
expectations.conf.json not configured, falling back to sample, default configuration
ERROR:root:habitica not configured, game functions not supported
Traceback (most recent call last):
  File "/root/e-mission-server/emission/net/ext_service/habitica/proxy.py", line 22, in
    key_file = open('conf/net/ext_service/habitica.json')
FileNotFoundError: [Errno 2] No such file or directory: 'conf/net/ext_service/habitica.json'
2022-05-19T15:33:32.390398+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: moving to long term**
2022-05-19T15:33:32.440895+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: updating incoming user inputs**
2022-05-19T15:33:32.688896+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: filter accuracy if needed**
2022-05-19T15:33:33.336430+00:00**UUID cbeb55e5-9699-43a6-911b-b44b283b45d1: segmenting into trips**
Found error curr_state.curr_run_ts = 1652973727.5321794 while processing pipeline for user cbeb55e5-9699-43a6-911b-b44b283b45d1, skipping
Traceback (most recent call last):
  File "/root/e-mission-server/emission/pipeline/intake_stage.py", line 73, in run_intake_pipeline
    run_intake_pipeline_for_user(uuid)
  File "/root/e-mission-server/emission/pipeline/intake_stage.py", line 122, in run_intake_pipeline_for_user
    eaist.segment_current_trips(uuid)
  File "/root/e-mission-server/emission/analysis/intake/segmentation/trip_segmentation.py", line 52, in segment_current_trips
    time_query = epq.get_time_range_for_segmentation(user_id)
  File "/root/e-mission-server/emission/storage/pipeline_queries.py", line 55, in get_time_range_for_segmentation
    return get_time_range_for_stage(user_id, ps.PipelineStages.TRIP_SEGMENTATION)
  File "/root/e-mission-server/emission/storage/pipeline_queries.py", line 308, in get_time_range_for_stage
    assert curr_state.curr_run_ts is None, "curr_state.curr_run_ts = %s" % curr_state.curr_run_ts
AssertionError: curr_state.curr_run_ts = 1652973727.5321794

We also tried with my account: reset and executed the pipeline; there were no errors, but nothing changed and I still have Draft trips.

shankari commented 2 years ago

@larbizard

first,

 assert curr_state.curr_run_ts is None, "curr_state.curr_run_ts = %s" % curr_state.curr_run_ts
AssertionError: curr_state.curr_run_ts = 1652973727.5321794

means that either your reset failed, or you are running two copies of the pipeline at the same time.
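If it is unclear which of the two happened, the stored pipeline state can be inspected directly. A sketch (field names are taken from the assertion and log output above; treat it as illustrative):

# check whether any stage still holds the lock (curr_run_ts != None) for this user
from uuid import UUID
import emission.core.get_database as edb

user_id = UUID("cbeb55e5-9699-43a6-911b-b44b283b45d1")
for state in edb.get_pipeline_state_db().find({"user_id": user_id}):
    print(state["pipeline_stage"], state.get("curr_run_ts"), state.get("last_processed_ts"))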

second,

 AssertionError: Found len(ret_list) = 2, expected <=1

is similar: we expect there to be one inference per section, and we found two.
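One way to confirm the duplication is to look for cleaned sections that are referenced by more than one analysis/inferred_section entry. A sketch (the key names come from the traceback above; the aggregation itself is illustrative, not an official tool):

# find cleaned sections that have more than one inferred_section pointing at them
from uuid import UUID
import emission.core.get_database as edb

user_id = UUID("cbeb55e5-9699-43a6-911b-b44b283b45d1")
dups = edb.get_analysis_timeseries_db().aggregate([
    {"$match": {"user_id": user_id, "metadata.key": "analysis/inferred_section"}},
    {"$group": {"_id": "$data.cleaned_section", "count": {"$sum": 1}}},
    {"$match": {"count": {"$gt": 1}}},
])
for d in dups:
    print(d)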

We also tried with my account: reset and executed the pipeline; there were no errors, but nothing changed and I still have Draft trips.

wrt this, can you send me the logs from executing your pipeline? If you executed the pipeline using bin/debug/intake_single_user.py, the logs should be in /var/tmp/intake_single.log

lgharib commented 2 years ago

Hi @shankari,

I did a reset pipeline using:

./e-mission-py.bash bin/reset_pipeline.py -u aee1cab98e294ddda67726dea5abd8a3 -d 2022-07-10

And it seemed to work correctly, with no errors.

Then I am using the following script to start the pipeline for my user:

./e-mission-py.bash bin/debug/intake_single_user.py

Unfortunately, at the last step, **UUID aee1cab9-8e29-4ddd-a677-26dea5abd8a3: storing views to cache**, we get an error.

Since the log file is too big, I pasted it in this Google Doc file :+1: https://docs.google.com/document/d/18T3tOGTZjysD6fzIVL1p4v1eS8eSunVCDpz59CEC_WQ/edit?usp=sharing

Thank you for your help,

Larbi

shankari commented 2 years ago

@lgharib

tl;dr: You are running reset_pipeline.py incorrectly. -d 2022-07-10 indicates that the pipeline should be reset to 2022-07-10 aka day before yesterday, which won't actually delete any of the prior analysis results (aka duplicate entries from 2022-05-15)

$ ./e-mission-py.bash bin/reset_pipeline.py --help
storage not configured, falling back to sample, default configuration
URL not formatted, defaulting to "Stage_database"
Connecting to database URL localhost
usage: reset_pipeline.py [-h]
                         (-a | -p {android,ios} | -u USER_LIST [USER_LIST ...] | -e EMAIL_LIST [EMAIL_LIST ...])
                         [-d DATE] [-n]

optional arguments:
  -h, --help            show this help message and exit
  -a, --all             reset the pipeline for all users
  -p {android,ios}, --platform {android,ios}
                        reset the pipeline for all on the specified platform
  -u USER_LIST [USER_LIST ...], --user_list USER_LIST [USER_LIST ...]
                        user ids to reset the pipeline for
  -e EMAIL_LIST [EMAIL_LIST ...], --email_list EMAIL_LIST [EMAIL_LIST ...]
                        email addresses to reset the pipeline for
  -d DATE, --date DATE  date to reset the pipeline to. Format 'YYYY-mm-dd'
                        e.g. 2016-02-17. Interpreted in UTC, so 2016-02-17
                        will reset the pipeline to 2016-02-16T16:00:00-08:00
                        in the pacific time zone
  -n, --dry_run         do everything except actually perform the operations
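For example, the difference between a full reset and a date-limited reset for one user looks like this (illustrative commands reusing the UUID from above; the date is just an example):

# full reset: the next intake run reprocesses everything for this user
./e-mission-py.bash bin/reset_pipeline.py -u aee1cab98e294ddda67726dea5abd8a3
# date-limited reset: only data after the given (UTC) date is reprocessed, earlier results are kept
./e-mission-py.bash bin/reset_pipeline.py -u aee1cab98e294ddda67726dea5abd8a3 -d 2022-05-01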

Details:

The backtrace is

2022-07-12 08:45:49,959:ERROR:139621497538368:Storing views to cache failed for user aee1cab9-8e29-4ddd-a677-26dea5abd8a3
Traceback (most recent call last):
  File "/root/e-mission-server/emission/net/usercache/builtin_usercache_handler.py", line 109, in storeViewsToCache
    self.storeTimelineToCache(time_query)
...
  File "/root/e-mission-server/emission/storage/decorations/section_queries.py", line 45, in cleaned2inferred_section
    curr_predicted_entry = _get_inference_entry_for_section(user_id, section_id, "analysis/inferred_section", "data.cleaned_section")
  File "/root/e-mission-server/emission/storage/decorations/section_queries.py", line 56, in _get_inference_entry_for_section
    assert len(ret_list) <= 1, "Found len(ret_list) = %d, expected <=1" % len(ret_list)
AssertionError: Found len(ret_list) = 2, expected <=1

So it looks like there are two inferred sections for the cleaned section.

The erroneous trip appears to be

2022-07-12 08:45:49,955:ERROR:139621497538368:Found error Found len(ret_list) = 2, expected <=1 while processing trip Entry({'_id': ObjectId('62ccafb3e86c73263a959dde'), 'user_id': UUID('...), 'metadata': {'key': 'analysis/cleaned_trip', 'platform': 'server', 'write_ts': 1657581491.3980167, 'time_zone': 'America/Los_Angeles', 'write_local_dt': {...}, 'data': {'source': 'DwellSegmentationTimeFilter', 'end_ts': 1652621530.061, 'end_local_dt': {...}, 'end_fmt_time': '2022-05-15...', 'end_loc': {'type': 'Point', 'coordinates': [...]}, 'raw_trip': ObjectId('62ccaf6be86c73263a958825'), 'start_ts': ..., 'start_local_dt': {...'timezone': 'Africa/Casablanca'}, 'start_fmt_time': '2022-05-15T...', 'start_loc': {'type': 'Point', 'coordinates': [...]}, 'duration': 791.6627106666565, 'distance': 154.4805688772839, 'start_place': ObjectId('62ccafc8e86c73263a959e53'), 'end_place': ObjectId('62ccafc8e86c73263a959e54')}})

And the related section seems to be:

2022-07-12 08:45:48,069:DEBUG:139621497538368:About to execute query {'user_id': UUID('aee1cab9-8e29-4ddd-a677-26dea5abd8a3'), 'data.trip_id': ObjectId('62ccafb3e86c73263a959dde'), 'metadata.key': 'analysis/cleaned_section'} with sort_key data.start_ts
2022-07-12 08:45:49,830:DEBUG:139621497538368:keys = (analysis/cleaned_stop, analysis/cleaned_section), len(places) = 0, len(trips) = 1
2022-07-12 08:45:49,830:DEBUG:139621497538368:places_or_stops = []
2022-07-12 08:45:49,830:DEBUG:139621497538368:trips_or_sections = [ObjectId('62ccafb3e86c73263a959de0')]

Looking at the MODE_INFERENCE, the step has already been run so we don't re-run it.

2022-07-12 08:43:51,551:INFO:139621497538368:**********UUID aee1cab9-8e29-4ddd-a677-26dea5abd8a3: inferring transportation mode**********
2022-07-12 08:43:51,554:INFO:139621497538368:For stage PipelineStages.MODE_INFERENCE, start_ts = 2022-05-15T13:32:15.061000

It doesn't look like reset_pipeline worked properly because almost all the pipeline states had a timestamp until which they were complete.

lgharib commented 2 years ago

@shankari How shall I use the reset_pipeline script to fix the issue?

Without the date?

./e-mission-py.bash bin/reset_pipeline.py -u aee1cab98e294ddda67726dea5abd8a3

lgharib commented 2 years ago

Also I saw that you noticed duplicates "aka duplicate entries from 2022-05-15".

I have twice restored the machine I am working on from a snapshot taken the week before.

Is it possible that I created duplicates?

Does the app sync all the missing data or just partial data?

shankari commented 2 years ago

Also I saw that you noticed duplicates "aka duplicate entries from 2022-05-15". I have twice restored the machine I am working on from a snapshot taken the week before. Is it possible that I created duplicates?

Seems unlikely given that we don't see duplicates anywhere else.

Does the app sync all the missing data or just partial data?

The duplicates are not in app sensed data, but in analysis results. The app syncs all new data since the last sync.

shankari commented 2 years ago

As an aside, I note that all the pipeline timestamps are for May 14th. Did you stop collecting data then, or restore a backup from May or ??? Just trying to see if there is a reason why you don't have more recent data.

/tmp/pipeline_issue.log:2022-07-12 08:43:48,024:INFO:139621497538368:For stage PipelineStages.USERCACHE, start_ts = 2022-05-14T20:49:14.735000
/tmp/pipeline_issue.log:2022-07-12 08:43:48,057:INFO:139621497538368:For stage PipelineStages.USER_INPUT_MATCH_INCOMING, start_ts = 2022-05-11T13:23:56.202000
/tmp/pipeline_issue.log:2022-07-12 08:43:50,030:INFO:139621497538368:For stage PipelineStages.ACCURACY_FILTERING, start_ts = 2022-04-30T23:42:50.784000
/tmp/pipeline_issue.log:2022-07-12 08:43:50,050:INFO:139621497538368:For stage PipelineStages.TRIP_SEGMENTATION, start_ts = 2022-05-15T13:36:55.645000
/tmp/pipeline_issue.log:2022-07-12 08:43:50,481:INFO:139621497538368:For stage PipelineStages.SECTION_SEGMENTATION, start_ts = 2022-05-15T13:32:15.061000
/tmp/pipeline_issue.log:2022-07-12 08:43:50,646:INFO:139621497538368:For stage PipelineStages.JUMP_SMOOTHING, start_ts = 2022-05-15T13:32:15.061000
/tmp/pipeline_issue.log:2022-07-12 08:43:50,734:INFO:139621497538368:For stage PipelineStages.CLEAN_RESAMPLING, start_ts = 2022-05-15T13:32:10.061000
/tmp/pipeline_issue.log:2022-07-12 08:43:51,554:INFO:139621497538368:For stage PipelineStages.MODE_INFERENCE, start_ts = 2022-05-15T13:32:15.061000
/tmp/pipeline_issue.log:2022-07-12 08:43:51,651:INFO:139621497538368:For stage PipelineStages.LABEL_INFERENCE, start_ts = 2022-05-15T13:32:15.061000
/tmp/pipeline_issue.log:2022-07-12 08:43:51,792:INFO:139621497538368:For stage PipelineStages.EXPECTATION_POPULATION, start_ts = 2022-05-15T13:32:15.061000
/tmp/pipeline_issue.log:2022-07-12 08:43:51,884:INFO:139621497538368:For stage PipelineStages.CREATE_CONFIRMED_OBJECTS, start_ts = 2022-05-15T13:32:15.061000
/tmp/pipeline_issue.log:2022-07-12 08:43:51,970:INFO:139621497538368:For stage PipelineStages.OUTPUT_GEN, start_ts is None
larbizard commented 2 years ago

We did not stop collecting data.

However, after May 14th we noticed that the data stopped being analysed for Raouf and me.

At that time we tried to reset my pipeline and his respectively, and also restarted the intake single user analysis, unsuccessfully.

I restored a backup a month ago, on June 15th (backups are done on Mondays at 10 pm, so it was probably the backup from the Monday before, June 13th).

Yesterday, after updating mongodb and pulling the latest version of https://github.com/e-mission/e-mission-server/tree/gis-based-mode-detection, the server stopped working, so I did another restore from a backup from a week ago.

Is there a way to reset all analyses and restart them for all users?

Also, when I start a pipeline using the intake single user script, it takes about 30 to 40 minutes to execute, and meanwhile a scheduled bin/intake_multiprocess.py 3 is executed. That may be what causes the first pipeline to crash.

shankari commented 2 years ago

At that time we tried to reset my pipeline and his respectively, and also restarted the intake single user analysis, unsuccessfully.

I wonder if this is related to the incorrect call to reset_pipeline.

Also, when I start a pipeline using the intake single user script, it takes about 30 to 40 minutes to execute, and meanwhile a scheduled bin/intake_multiprocess.py 3 is executed. That may be what causes the first pipeline to crash.

Ah yes, you do not want to execute two pipelines in parallel. I would suggest pulling your data to your laptop and running the pipeline there to ensure that it works, and debugging the issues, if any.

Then, on production, just reset the pipeline and don't run the single_intake.py. When intake_multiprocess runs, it will run the pipeline on your data as well.

Is there a way to reset all analyses and restart them for all users?

As you can see, resetting all analysis results is the -a option instead of -u for a single user. The intake_multiprocess script will re-run the pipeline for all users and catch up after the reset. Note that this may take a few hours, so you want to disable the cronjob that runs the intake pipeline until the "catch up after reset" manually scheduled job is done, as outlined below.
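A rough sequence (how exactly the cronjob is disabled depends on your deployment; the commands mirror the ones already used in this thread):

# 1. comment out the cronjob that runs bin/intake_multiprocess.py (e.g. via crontab -e)
# 2. reset the pipeline for all users
./e-mission-py.bash bin/reset_pipeline.py -a
# 3. run one catch-up pass manually; this can take hours
./e-mission-py.bash bin/intake_multiprocess.py 3
# 4. re-enable the cronjob once the catch-up run has finished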

larbizard commented 2 years ago

I turned off the crontab that executes the multiprocess script. I also executed the purge analysis script to remove any duplicates. I checked the data on my phone: before May 14th, no data was analysed, everything was draft. I executed the intake single process (3 h of waiting for my data). The data from before the 14th was then analysed correctly, but after that date it is still draft. Either the data is not synchronized or the data is corrupted.

shankari commented 2 years ago

import emission.core.get_database as edb
from uuid import UUID
# find the 3 most recent locations in the usercache
list(edb.get_usercache_db().find({"user_id": UUID("<your uuid>"), "metadata.key": "background/location"}).sort("metadata.write_ts", -1).limit(3))
# find the 3 most recent locations in the timeseries
list(edb.get_timeseries_db().find({"user_id": UUID("<your uuid>"), "metadata.key": "background/location"}).sort("metadata.write_ts", -1).limit(3))

lgharib commented 2 years ago

I used:

e-mission-py.bash bin/purge_analysis_database.py

According to the code, the analysis timeseries, common places, common trips and pipeline states are removed:

def purgeAnalysisData():
    print(edb.get_analysis_timeseries_db().remove())
    print(edb.get_common_place_db().remove())
    print(edb.get_common_trip_db().remove())
    print(edb.get_pipeline_state_db().remove())

Then I executed :

e-mission-py.bash bin/debug/intake_single_user.py -u aee1cab98e294ddda67726dea5abd8a3

I didn't output the logs in a file. I will execute the script again and send them.

I searched for the last synced data on the server for my user; the last data is from May 15th. However, other users had their data synced correctly yesterday.

My data:

db.Stage_timeseries.find({"user_id" : BinData(3,"ruHKuY4pTd2mdybepavYow=="), "metadata.key": "background/location"}).sort({"metadata.write_ts": -1}).limit(3)
{ "_id" : ObjectId("628101f399243fc036d9096f"), "user_id" : BinData(3,"ruHKuY4pTd2mdybepavYow=="), "metadata" : { "key" : "background/location", "platform" : "android", "read_ts" : 0, "time_zone" : "Africa/Casablanca", "type" : "sensor-data", "write_ts" : 1652621810.623, "write_local_dt" : { "year" : 2022, "month" : 5, "day" : 15, "hour" : 14, "minute" : 36, "second" : 50, "weekday" : 6, "timezone" : "Africa/Casablanca" }, "write_fmt_time" : "2022-05-15T14:36:50.623000+01:00" }, "data" : { "accuracy" : 16.566, "altitude" : 459.29998779296875, "elapsedRealtimeNanos" : NumberLong("168191300295515"), "filter" : "time", "fmt_time" : "2022-05-15T14:36:50+01:00", "latitude" : 34.0326923, "longitude" : -5.0061007, "sensed_speed" : 0, "ts" : 1652621810, "local_dt" : { "year" : 2022, "month" : 5, "day" : 15, "hour" : 14, "minute" : 36, "second" : 50, "weekday" : 6, "timezone" : "Africa/Casablanca" }, "loc" : { "type" : "Point", "coordinates" : [ -5.0061007, 34.0326923 ] }, "heading" : 0 } }
{ "_id" : ObjectId("628101f399243fc036d90969"), "user_id" : BinData(3,"ruHKuY4pTd2mdybepavYow=="), "metadata" : { "key" : "background/location", "platform" : "android", "read_ts" : 0, "time_zone" : "Africa/Casablanca", "type" : "sensor-data", "write_ts" : 1652621784.297, "write_local_dt" : { "year" : 2022, "month" : 5, "day" : 15, "hour" : 14, "minute" : 36, "second" : 24, "weekday" : 6, "timezone" : "Africa/Casablanca" }, "write_fmt_time" : "2022-05-15T14:36:24.297000+01:00" }, "data" : { "accuracy" : 8.273, "altitude" : 459.29998779296875, "elapsedRealtimeNanos" : NumberLong("168164949000000"), "filter" : "time", "fmt_time" : "2022-05-15T14:36:24.190000+01:00", "latitude" : 34.0326917, "longitude" : -5.0061008, "sensed_speed" : 0, "ts" : 1652621784.19, "local_dt" : { "year" : 2022, "month" : 5, "day" : 15, "hour" : 14, "minute" : 36, "second" : 24, "weekday" : 6, "timezone" : "Africa/Casablanca" }, "loc" : { "type" : "Point", "coordinates" : [ -5.0061008, 34.0326917 ] }, "heading" : 90 } }
{ "_id" : ObjectId("628101f399243fc036d90965"), "user_id" : BinData(3,"ruHKuY4pTd2mdybepavYow=="), "metadata" : { "key" : "background/location", "platform" : "android", "read_ts" : 0, "time_zone" : "Africa/Casablanca", "type" : "sensor-data", "write_ts" : 1652621780.586, "write_local_dt" : { "year" : 2022, "month" : 5, "day" : 15, "hour" : 14, "minute" : 36, "second" : 20, "weekday" : 6, "timezone" : "Africa/Casablanca" }, "write_fmt_time" : "2022-05-15T14:36:20.586000+01:00" }, "data" : { "accuracy" : 8.273, "altitude" : 459, "elapsedRealtimeNanos" : NumberLong("168161295804589"), "filter" : "time", "fmt_time" : "2022-05-15T14:36:20+01:00", "latitude" : 34.0326917, "longitude" : -5.0061008, "sensed_speed" : 0, "ts" : 1652621780, "local_dt" : { "year" : 2022, "month" : 5, "day" : 15, "hour" : 14, "minute" : 36, "second" : 20, "weekday" : 6, "timezone" : "Africa/Casablanca" }, "loc" : { "type" : "Point", "coordinates" : [ -5.0061008, 34.0326917 ] }, "heading" : 0 } }

All other users:

db.Stage_timeseries.find({"metadata.key": "background/location"}).sort({"metadata.write_ts": -1}).limit(3)
{ "_id" : ObjectId("62cfa5734ae51d1abf3cdcd9"), "user_id" : BinData(3,"+U0w/rUMTLmEZv0OCwPIGw=="), "metadata" : { "key" : "background/location", "platform" : "android", "read_ts" : 0, "time_zone" : "America/Toronto", "type" : "sensor-data", "write_ts" : 1657775475.165, "write_local_dt" : { "year" : 2022, "month" : 7, "day" : 14, "hour" : 1, "minute" : 11, "second" : 15, "weekday" : 3, "timezone" : "America/Toronto" }, "write_fmt_time" : "2022-07-14T01:11:15.165000-04:00" }, "data" : { "accuracy" : 43.952, "altitude" : 130.17681884765625, "elapsedRealtimeNanos" : NumberLong("367635481943485"), "filter" : "time", "fmt_time" : "2022-07-14T01:11:15-04:00", "latitude" : 45.3939464, "longitude" : -71.8895922, "sensed_speed" : 0.025381435, "ts" : 1657775475, "local_dt" : { "year" : 2022, "month" : 7, "day" : 14, "hour" : 1, "minute" : 11, "second" : 15, "weekday" : 3, "timezone" : "America/Toronto" }, "loc" : { "type" : "Point", "coordinates" : [ -71.8895922, 45.3939464 ] }, "heading" : 338.4458 } }
{ "_id" : ObjectId("62cfa5734ae51d1abf3cdcd7"), "user_id" : BinData(3,"+U0w/rUMTLmEZv0OCwPIGw=="), "metadata" : { "key" : "background/location", "platform" : "android", "read_ts" : 0, "time_zone" : "America/Toronto", "type" : "sensor-data", "write_ts" : 1657775466.762, "write_local_dt" : { "year" : 2022, "month" : 7, "day" : 14, "hour" : 1, "minute" : 11, "second" : 6, "weekday" : 3, "timezone" : "America/Toronto" }, "write_fmt_time" : "2022-07-14T01:11:06.762000-04:00" }, "data" : { "accuracy" : 362.166, "altitude" : 131.7021484375, "elapsedRealtimeNanos" : NumberLong("367626960000000"), "filter" : "time", "fmt_time" : "2022-07-14T01:11:06.478000-04:00", "latitude" : 45.3939408, "longitude" : -71.8895891, "sensed_speed" : 0, "ts" : 1657775466.478, "local_dt" : { "year" : 2022, "month" : 7, "day" : 14, "hour" : 1, "minute" : 11, "second" : 6, "weekday" : 3, "timezone" : "America/Toronto" }, "loc" : { "type" : "Point", "coordinates" : [ -71.8895891, 45.3939408 ] }, "heading" : 0 } }
{ "_id" : ObjectId("62cfa5734ae51d1abf3cdcd3"), "user_id" : BinData(3,"+U0w/rUMTLmEZv0OCwPIGw=="), "metadata" : { "key" : "background/location", "platform" : "android", "read_ts" : 0, "time_zone" : "America/Toronto", "type" : "sensor-data", "write_ts" : 1657775442.088, "write_local_dt" : { "year" : 2022, "month" : 7, "day" : 14, "hour" : 1, "minute" : 10, "second" : 42, "weekday" : 3, "timezone" : "America/Toronto" }, "write_fmt_time" : "2022-07-14T01:10:42.088000-04:00" }, "data" : { "accuracy" : 43.952, "altitude" : 131.7021484375, "elapsedRealtimeNanos" : NumberLong("367602481943433"), "filter" : "time", "fmt_time" : "2022-07-14T01:10:42-04:00", "latitude" : 45.3939414, "longitude" : -71.8895871, "sensed_speed" : 0.027612034, "ts" : 1657775442, "local_dt" : { "year" : 2022, "month" : 7, "day" : 14, "hour" : 1, "minute" : 10, "second" : 42, "weekday" : 3, "timezone" : "America/Toronto" }, "loc" : { "type" : "Point", "coordinates" : [ -71.8895871, 45.3939414 ] }, "heading" : 246.21338 } }

I had a look at my loggerDB file from my phone and filtered by "Sync" and I found the following error:

ServerSyncAdapter: 10 Error java.io.IOException while posting converted trips to JSON

I sent you a link to my loggerDB (1.4 GB) by email.

shankari commented 2 years ago

I didn't output the logs in a file. I will execute the script again and send them.

You don't need to, the output is automatically logged to /var/tmp/intake_*

ServerSyncAdapter: 10 Error java.io.IOException while posting converted trips to JSON

That seems to explain why all your recent trips are "draft". They were never sent to the server.

shankari commented 2 years ago

@lgharib

This is pretty straightforward. You have not bumped up the message size on nginx. Note that OpenPATH caches the data locally and pushes it up to the server periodically. If you have taken long trips, and/or are off the internet for a while, this data can pile up. We limit the amount of data sent to 10k entries, but even with that, we need to bump up the default nginx (or other reverse proxy) message size. https://github.com/e-mission/e-mission-docs/issues/558 https://github.com/e-mission/e-mission-docs/pull/47/commits/1b012459b8fe5247982089e46a03dc7486712a68

From your logs:

22,1655286269.312,2022-06-15T02:44:29.312000-07:00,ServerSyncAdapter : Starting sync with push
28,1655286269.3960001,2022-06-15T02:44:29.396000-07:00,BuiltinUserCache : Result count = 10000
29,1655286269.3979998,2022-06-15T02:44:29.398000-07:00,BuiltinUserCache : Reading entry = 0 with key config/consent and write_ts 1.651136789082E9
30,1655286269.43,2022-06-15T02:44:29.430000-07:00,BuiltinUserCache : Reading entry = 500 with key background/filtered_location and write_ts 1.652622500639E9
...
47,1655286270.072,2022-06-15T02:44:30.072000-07:00,BuiltinUserCache : Reading entry = 9000 with key background/filtered_location and write_ts 1.652642667573E9
48,1655286270.1039999,2022-06-15T02:44:30.104000-07:00,BuiltinUserCache : Reading entry = 9500 with key background/location and write_ts 1.652643284619E9
49,1655286270.139,2022-06-15T02:44:30.139000-07:00,BuiltinUserCache : Returning array of length 10000
50,1655286270.1420002,2022-06-15T02:44:30.142000-07:00,"ConnectionSettings : in getConnectURL, connectionSettings = {""connectUrl"":""https:\/\/e-mission.fabmobqc.ca"",""aggregate_call_auth"":""no_auth"",""android"":{""auth"":{""method"":""prompted-auth"",""clientID"":""ignored""}},""ios"":{""auth"":{""method"":""prompted-auth"",""clientID"":""ignored""}}}"
51,1655286270.144,2022-06-15T02:44:30.144000-07:00,"ConnectionSettings : in getConnectURL, returning https://e-mission.fabmobqc.ca"
52,1655286271.3279998,2022-06-15T02:44:31.328000-07:00,CommunicationHelper : Got response org.apache.http.message.BasicHttpResponse@fc5a10c with statusHTTP/1.1 413 Request Entity Too Large
53,1655286271.336,2022-06-15T02:44:31.336000-07:00,ServerSyncAdapter : IO Error java.io.IOException while posting converted trips to JSON
lgharib commented 2 years ago

Hi @shankari,

Thank you for your help. I was able to execute the pipeline after increasing the memory to 4 GB :+1:.

Thank you,

Larbi


This issue can be closed :heavy_check_mark:


Journey on how to fix the pipeline :pick:

I was able to fix my issue of the pipeline not executing. :heavy_check_mark:

The discussion has diverged a little bit from the title of the issue.

How to fix the assertion error in pipeline_queries.py: AssertionError: curr_state.curr_run_ts = 1635335308.6691191

I fixed the assertion error by using the reset pipeline script:

./e-mission-py.bash bin/reset_pipeline.py -u <uuid>

How to fix missing data on the server

The data was not synced between the users' phones and the server.

The main cause was the server's Nginx configuration not allowing large request bodies.

After 2 months of not synchronizing due to another problem explained in the next section (Last trip end was at -1.0), the accumulated data to be synced kept growing and reached around 1 GB.

To fix that issue, the solution was to increase the allowed message size in the Nginx configuration file:

vim /etc/nginx/nginx.conf

http {
       ....
       client_max_body_size 10G;
       ....
}
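After editing the file, the configuration should be tested and reloaded (commands for a typical systemd-based install):

nginx -t                  # test the configuration for syntax errors
systemctl reload nginx    # apply it without dropping connections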

Data not synced: root issue. The last trip didn't end correctly

BuiltinUserCache : Last trip end was at -1.0
BuiltinUserCache : We don't have a completed trip, so we don't want to push anything yet

If the data is not synced with the server, check the loggerDB file in the mobile app.

How to check the loggerDB file :memo:

You can send the loggerDB by email. Otherwise, if the file is too large (mine was 1.4 GB), you can start sending it and cancel the operation right after the popup displays; you will then be able to navigate to the cache folder in the mobile app directory on your phone by plugging it into your computer with a cable, and copy the file to your computer.

You can then use this script to add the timestamps, convert the file to CSV, and open it with Excel :+1:

bin/csv_export_add_date.py
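If you would rather inspect the file directly instead of converting it, the loggerDB should be an SQLite database, so a short script along these lines works (the table and column names here are guesses; check them first with sqlite3 loggerDB ".schema"):

# dump log lines mentioning "Last trip end" with readable timestamps
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("loggerDB")
# 'logTable', 'ts' and 'message' are assumed names; replace them with what .schema shows
for ts, message in conn.execute(
        "SELECT ts, message FROM logTable WHERE message LIKE '%Last trip end%' ORDER BY ts"):
    print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(), message)
conn.close()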

If you find the Last trip end was at -1.0 log message, that means there was an issue ending a trip.

Solution:

  1. You can either use the End trip and Force Sync under the Developer Zone.
  2. Or wait for the next trip to start and end. You can then Force the synchronization to push the data.

Failing pipeline due to lack of memory: Out Of Memory (OOM) :1234:

./e-mission-py.bash: line 8: 3578 Killed PYTHONPATH=. python "$@"

If you execute the pipeline and see this message, that's probably because you have an Out Of Memory (OOM) issue and the process was killed. You will need to increase your memory. I used 4 GB to analyse 2 months' worth of data.
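To confirm that the kernel's OOM killer was responsible (rather than some other failure), check the kernel log right after the process dies:

dmesg | grep -i -E "out of memory|killed process"
# or, on systemd-based machines
journalctl -k | grep -i "killed process"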

shankari commented 2 years ago

Thanks @lgharib for the detailed explanation. Changed the subject and closing this issue now.