shankari closed this issue 1 year ago
Note also that during the original implementation of create_analysed_view, we had the following:
# The datastreams API call filters by "metadata.write_ts"
# Unfortunately, this means that we can't use it to retrieve analysed results since the write_ts depends on when the pipeline was run
However, we now do support reading by data.start_ts (as part of the label screen changes). So we might be able to simplify this, but might also just run out of time.
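As a sketch of what that difference means for the query body (the dict shape follows the post_body log lines later in this thread; make_post_body itself is a hypothetical helper, not code from the repo):

```python
def make_post_body(user, key, start_ts, end_ts, by_data_ts=False):
    """Build a datastreams query body.

    by_data_ts=False filters on metadata.write_ts (when the entry was
    written, which for analysed results depends on when the pipeline ran).
    by_data_ts=True filters on data.start_ts (when the data actually
    starts), which is what we need to retrieve analysed results.
    """
    return {
        "user": user,
        "key_list": [key],
        "key_time": "data.start_ts" if by_data_ts else "metadata.write_ts",
        "start_time": start_ts,
        "end_time": end_ts,
    }
```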
Ok, so we currently dump top level keys and range level keys for the raw data:

for phone_os, phone_map in pv.map().items():
    for phone_label, phone_detail_map in phone_map.items():
        for key in [k for k in phone_detail_map.keys() if "/" in k]:
            print(f"Dumping top level key {key}")
and
for ranges in [phone_detail_map["evaluation_ranges"], phone_detail_map["calibration_ranges"]]:
    for r in ranges:
        for key in [k for k in r.keys() if "/" in k]:
            print(f"Dumping key {key} for range with keys {r.keys()} and phone {phone_label}")
The top level keys are essentially only manual/evaluation_transition
$ grep "Dumping top level" /tmp/download.logs
Dumping top level key manual/evaluation_transition
Dumping top level key manual/evaluation_transition
Dumping top level key manual/evaluation_transition
Dumping top level key manual/evaluation_transition
Dumping top level key manual/evaluation_transition
Dumping top level key manual/evaluation_transition
Dumping top level key manual/evaluation_transition
Dumping top level key manual/evaluation_transition
The range level keys are
Dumping key background/location for range with keys dict_keys(['trip_id', 'trip_id_base', 'trip_run', 'start_ts', 'end_ts', 'duration', 'eval_common_trip_id', 'eval_role', 'eval_role_base', 'eval_role_run', 'evaluation_trip_ranges', 'background/battery', 'battery_df', 'background/location', 'background/filtered_location', 'location_df', 'filtered_location_df', 'background/motion_activity', 'motion_activity_df', 'statemachine/transition', 'transition_df']) and phone ucb-sdb-ios-4
Dumping key background/filtered_location for range with keys dict_keys(['trip_id', 'trip_id_base', 'trip_run', 'start_ts', 'end_ts', 'duration', 'eval_common_trip_id', 'eval_role', 'eval_role_base', 'eval_role_run', 'evaluation_trip_ranges', 'background/battery', 'battery_df', 'background/location', 'background/filtered_location', 'location_df', 'filtered_location_df', 'background/motion_activity', 'motion_activity_df', 'statemachine/transition', 'transition_df']) and phone ucb-sdb-ios-4
Dumping key background/motion_activity for range with keys dict_keys(['trip_id', 'trip_id_base', 'trip_run', 'start_ts', 'end_ts', 'duration', 'eval_common_trip_id', 'eval_role', 'eval_role_base', 'eval_role_run', 'evaluation_trip_ranges', 'background/battery', 'battery_df', 'background/location', 'background/filtered_location', 'location_df', 'filtered_location_df', 'background/motion_activity', 'motion_activity_df', 'statemachine/transition', 'transition_df']) and phone ucb-sdb-ios-4
Dumping key statemachine/transition for range with keys dict_keys(['trip_id', 'trip_id_base', 'trip_run', 'start_ts', 'end_ts', 'duration', 'eval_common_trip_id', 'eval_role', 'eval_role_base', 'eval_role_run', 'evaluation_trip_ranges', 'background/battery', 'battery_df', 'background/location', 'background/filtered_location', 'location_df', 'filtered_location_df', 'background/motion_activity', 'motion_activity_df', 'statemachine/transition', 'transition_df']) and phone ucb-sdb-ios-4
Dumping key background/battery for range with keys dict_keys(['trip_id', 'trip_id_base', 'trip_run', 'start_ts', 'end_ts', 'duration', 'eval_common_trip_id', 'eval_role', 'eval_role_base', 'eval_role_run', 'evaluation_trip_ranges', 'background/battery', 'battery_df', 'background/location', 'background/filtered_location', 'location_df', 'filtered_location_df', 'background/motion_activity', 'motion_activity_df', 'statemachine/transition', 'transition_df']) and phone ucb-sdb-ios-4
which is consistent with what we see in the directories
$ ls data/ucb-sdb-android-2/unimodal_trip_car_bike_mtv_la/
background~battery background~location manual~evaluation_transition
background~filtered_location background~motion_activity statemachine~transition
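The directory names are just the keys with "/" swapped for "~" (the same key.replace('/', '~') that appears in the spec_details.py traceback further down in this thread); a trivial sketch of the mapping:

```python
def key_to_dirname(key):
    # "/" cannot appear in a file or directory name, so the dump uses "~"
    return key.replace("/", "~")

def dirname_to_key(dirname):
    # inverse mapping when loading the dumped data back
    return dirname.replace("~", "/")
```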
For the analysed data, we don't need to dump top-level data since we will already have it from the raw phone view. Instead, we only need to dump the three additional keys that we add in the analysed view: location_entries, sensed_trip_ranges and sensed_section_ranges.
High level question: we can have different trips and different sections - e.g. cleaned versus confirmed, cleaned versus inferred, etc. How do we store these and read them in the phone view?

Option 1: store using the server keys (e.g. analysis/confirmed_trip) to be consistent with the raw data. In this case, the tag can be the branch - e.g. master vs. gis.
Option 2: store using the view keys (e.g. location_entries) to be consistent with the current analysed phone view. In this case, the tag must be the branch and the combo (e.g. master_cleaned_cleaned or master_confirmed_inferred) or ...

It is pretty clear that Option 1 is the better option for downloading, although Option 2 might be the better option for loading the data into the analysed phone view. So we will not use the phone view for downloading, but just download directly using the server spec.
ok, so here's another problem. The analysed phone view currently reads all the trips and sections because of https://github.com/MobilityNet/mobilitynet.github.io/issues/31#issuecomment-1341931561 and then copies over the subset ranges.
This has the problem that when it queries, it queries for the range from the start of the evaluation to now. However, when the filespec reads data, it reads from the start to the end, and that end will be now, which means that it won't work.
There are several possible options around using -1 for now and so on, but given that this was a hack to avoid the limitation with the datastreams that we have now addressed anyway, I think we should fix it the right way.
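To make the fix concrete, a small sketch (hypothetical helper, not code from the repo) of the difference between the hack and a bounded query over the evaluation range:

```python
import sys

def query_bounds(range_start_ts, range_end_ts, legacy_hack=False):
    """Return the (start, end) window to query for analysed results.

    The legacy hack queried from the start of the evaluation to "now"
    (effectively unbounded), which a file-backed spec cannot satisfy
    because its dumped files cover a fixed range. The fixed version
    queries the evaluation range itself, now that we can filter by
    data.start_ts.
    """
    if legacy_hack:
        return range_start_ts, sys.maxsize  # unbounded end, breaks FileSpec
    return range_start_ts, range_end_ts
```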
Here's another issue - the start time (after extrapolation) of a trip could be before the evaluation range started. In the analysed_view, we add in a threshold of THIRTY_MINUTES for the matching; let's do the same while downloading in the dump script.
Should we store based on the actual range start and end, or the start and end padded by THIRTY_MINUTES? Let's store with the padded values to keep the meaning of the files the same.
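A minimal sketch of the padding, assuming THIRTY_MINUTES is 30 * 60 seconds (consistent with the original versus padded ranges in the dump logs later in this thread, which differ by exactly 1800 s on each side):

```python
THIRTY_MINUTES = 30 * 60  # seconds

def padded_range(start_ts, end_ts, pad=THIRTY_MINUTES):
    # Pad on both sides so that a trip whose extrapolated start is slightly
    # before the evaluation range start still matches; the padded values are
    # also what end up in the dumped file names.
    return start_ts - pad, end_ts + pad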
Running the pipeline for individual users to monitor it better.
While running it with ucb-sdb-android-1, we get a lot of the following errors. This is not a huge issue (yet) since we do not plan to download these as part of the analysis results. But we should really think about how to unify the emeval zephyr code into master. git submodule? python package?
Got error No module named 'emission.net.usercache.formatters.android.evaluation_transition' while saving entry AttrDict({'_id': ObjectId('5d00bc68b88f219ca051064f'), 'metadata': {'key': 'manual/evaluation_transition', 'platform': 'android', 'read_ts': 0, 'time_zone': 'America/Los_Angeles', 'type': 'message', 'write_ts': 1560329319}, 'user_id': UUID('6a2dbafd-ef1e-404c-b61e-506b8935dca4'), 'data': {'transition': 'START_CALIBRATION_PERIOD', 'trip_id': 'high_accuracy_stationary', 'spec_id': 'sfba_trial_3', 'device_manufacturer': 'motorola', 'device_model': 'Nexus 6', 'device_version': '6.0.1', 'ts': 1560329319}}) -> None
We end up with 11 trips, the last of which is at 2019-07-12T22:51:12. I guess this is because we didn't duty cycle after that? Yup!
2022-12-09 06:34:40,954:DEBUG:4591250944:filter_accuracy disabled, early return
Wait
>>> pd.json_normalize(list(edb.get_timeseries_db().find({"user_id": UUID("6a2dbafd-ef1e-404c-b61e-506b8935dca4"), "metadata.key": "background/filtered_location"}).sort("data.ts", -1).limit(3)))[["data.fmt_time"]]
data.fmt_time
0 2019-07-12T15:51:43-07:00
1 2019-07-12T15:51:11-07:00
2 2019-07-12T15:50:39.055000-07:00
but
>>> pd.json_normalize(list(edb.get_timeseries_db().find({"user_id": UUID("6a2dbafd-ef1e-404c-b61e-506b8935dca4"), "metadata.key": "background/location"}).sort("data.ts", -1).limit(3)))[["data.fmt_time"]]
data.fmt_time
0 2019-07-28T17:16:38-07:00
1 2019-07-28T17:16:37-07:00
2 2019-07-28T17:16:36-07:00
Maybe we need to run it again because there is so much incoming data.
Yup!
>>> pd.json_normalize(list(edb.get_usercache_db().find({"user_id": UUID("6a2dbafd-ef1e-404c-b61e-506b8935dca4"), "metadata.key": "background/location"}).sort("data.ts", -1).limit(3)))[["data.fmt_time"]]
data.fmt_time
0 Mar 4, 2020 5:42:43 PM
1 Mar 4, 2020 5:42:42 PM
2 Mar 4, 2020 5:42:41 PM
After running it multiple times, we still have a few entries left in the usercache (https://github.com/e-mission/e-mission-docs/issues/761)
>>> pd.json_normalize(list(edb.get_usercache_db().find({"user_id": UUID("6a2dbafd-ef1e-404c-b61e-506b8935dca4"), "metadata.key": "background/location"}).sort("data.ts", -1)))[["data.fmt_time"]]
data.fmt_time
0 Nov 22, 2019 6:40:05 PM
1 Nov 22, 2019 6:40:04 PM
2 Nov 22, 2019 6:40:04 PM
3 Nov 22, 2019 6:40:03 PM
4 Nov 22, 2019 6:40:02 PM
5 Nov 22, 2019 6:40:02 PM
6 Nov 22, 2019 6:40:01 PM
7 Nov 22, 2019 6:40:01 PM
8 Jul 28, 2019 5:16:42 PM
9 Jul 28, 2019 5:16:41 PM
10 Jul 28, 2019 5:16:40 PM
11 Jul 28, 2019 5:16:39 PM
After running the master pipeline, I tried to download data, but ran into an issue where we were still trying to read the spec from the server to determine whether the input spec was valid.
$ python dump_data_to_file.py --spec-id unimodal_trip_car_bike_mtv_la analysed master_9b70c97 --raw_dir data
Retrieving data for: post_body={'user': 'shankari@eecs.berkeley.edu', 'key_list': ['config/evaluation_spec'], 'key_time': 'metadata.write_ts', 'start_time': 0, 'end_time': 9223372036854775807}
response=<Response [200]>
Found 0 entries
Traceback (most recent call last):
File "dump_data_to_file.py", line 265, in <module>
assert args.spec_id in spec_ids,\
AssertionError: spec_id `unimodal_trip_car_bike_mtv_la` not found within current datastore instance
Tried to change that to read from the local server by calling retrieve_data on the raw_dir FileSpecDetails instead, but it requires a CURR_SPEC_ID.
fsd = eisd.FileSpecDetails(args.raw_dir, args.author_email)
fsd.retrieve_data(args.author_email, "config/evaluation_spec", 0, sys.maxsize)
Traceback (most recent call last):
File "dump_data_to_file.py", line 265, in <module>
args.func(args)
File "dump_data_to_file.py", line 62, in download_analysed
fsd.retrieve_data(args.author_email, "config/evaluation_spec", 0, sys.maxsize)
File "../emeval/input/spec_details.py", line 189, in retrieve_data
f"{user}/{self.CURR_SPEC_ID}/{key.replace('/', '~')}/{math.floor(start_ts)}_{math.ceil(end_ts)}.json")
AttributeError: 'FileSpecDetails' object has no attribute 'CURR_SPEC_ID'
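For reference, the path that line 189 builds (and hence why CURR_SPEC_ID is required) can be sketched as a standalone helper:

```python
import math

def dumped_file_path(user, spec_id, key, start_ts, end_ts):
    # mirrors the f-string at emeval/input/spec_details.py line 189 in the
    # traceback above; spec_id plays the role of self.CURR_SPEC_ID
    return (f"{user}/{spec_id}/{key.replace('/', '~')}/"
            f"{math.floor(start_ts)}_{math.ceil(end_ts)}.json")
```

Plugging in the padded range from the later dump logs reproduces the file names we see on disk (e.g. 1564272465_1564284123.json).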
Moving the retrieve-all-specs call into the spec details; if this doesn't work, will require the spec id instead.
Downloaded details for the unimodal spec correctly; moving on to the other specs
Note that the ranges for the individual phones are slightly different, even for the non-control phones. I am not 100% sure why that is happening, but it is consistent for both the raw and analysed data. Might want to take a look to see why.
$ ls -al bin/data/ucb-sdb-android-2/unimodal_trip_car_bike_mtv_la/background~location/
1601494 Dec 8 21:25 1564274304_1564282403.json
1550345 Dec 8 21:25 1564334125_1564343116.json
1805219 Dec 8 21:25 1564351292_1564360116.json
2156696 Dec 8 21:25 1565571044_1565578987.json
1804446 Dec 8 21:25 1567271214_1567279428.json
1825668 Dec 8 21:25 1567288623_1567297358.json
$ ls -al bin/data/master_9b70c97/ucb-sdb-android-2/unimodal_trip_car_bike_mtv_la/analysis~recreated_location/
total 1584
118190 Dec 10 19:02 1564272504_1564284203.json
86702 Dec 10 19:02 1564332325_1564344916.json
111861 Dec 10 19:02 1564349492_1564361916.json
128537 Dec 10 19:02 1565569244_1565580787.json
115558 Dec 10 19:02 1567269414_1567281228.json
110241 Dec 10 19:02 1567286823_1567299158.json
$ ls -al bin/data/ucb-sdb-android-3/unimodal_trip_car_bike_mtv_la/background~location/
total 11448
82102 Dec 8 21:25 1564274288_1564282424.json
102737 Dec 8 21:25 1564334097_1564343026.json
75994 Dec 8 21:25 1564351277_1564360135.json
1883423 Dec 8 21:25 1565571018_1565578933.json
1863435 Dec 8 21:25 1567271178_1567279373.json
1841009 Dec 8 21:25 1567288638_1567297395.json
$ ls -al bin/data/master_9b70c97/ucb-sdb-android-3/unimodal_trip_car_bike_mtv_la/analysis~recreated_location/
total 1608
116713 Dec 10 19:02 1564272488_1564284224.json
113150 Dec 10 19:02 1564332297_1564344826.json
122121 Dec 10 19:02 1564349477_1564361935.json
155807 Dec 10 19:02 1565569218_1565580733.json
112924 Dec 10 19:02 1567269378_1567281173.json
105991 Dec 10 19:02 1567286838_1567299195.json
Pulled all the results, and now trying to run the Evaluation*analysis_master notebooks. These need CURR_SPEC_ID and other spec entries to be filled in so that the phone view et al can work. Concretely, the CURR_SPEC_ID shows up in the path to the dumped file (populate_spec_details).
With these changes, we are able to load the analysed view for each phone view.
But while running the notebook, there is an error because there are no matching sensed segments for
2019-07-27T19:20:31.060968-07:00 -> 2019-07-27T19:20:57.402429-07:00
[]
Found no sensed segments, early return
[]
or
2019-07-24T16:37:07.746717-07:00 -> 2019-07-24T16:41:54.618997-07:00
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-47-aa44bf45f9ff> in <module>
----> 1 check_outlier(av_ucb.map()['ios']['ucb-sdb-ios-3']["evaluation_ranges"][0], 2, "walk to the bikeshare location_0", "WALKING")
<ipython-input-41-820e61dbfbd7> in check_outlier(eval_range, trip_idx, section_id, base_mode)
8 eval_section = [s for s in eval_trip["evaluation_section_ranges"] if s["trip_id"] == section_id][0]
9 print(fmt(eval_section["start_ts"]), "->", fmt(eval_section["end_ts"]))
---> 10 print([(fmt(ssr["start_ts"]), fmt(ssr["end_ts"]), ssr["mode"]) for ssr in eval_trip["sensed_section_ranges"]])
11 matching_section_map = embs.find_matching_segments(eval_trip["evaluation_section_ranges"], "trip_id", eval_trip["sensed_section_ranges"])
12 sensed_section_range = matching_section_map[section_id]["match"]
<ipython-input-41-820e61dbfbd7> in <listcomp>(.0)
8 eval_section = [s for s in eval_trip["evaluation_section_ranges"] if s["trip_id"] == section_id][0]
9 print(fmt(eval_section["start_ts"]), "->", fmt(eval_section["end_ts"]))
---> 10 print([(fmt(ssr["start_ts"]), fmt(ssr["end_ts"]), ssr["mode"]) for ssr in eval_trip["sensed_section_ranges"]])
11 matching_section_map = embs.find_matching_segments(eval_trip["evaluation_section_ranges"], "trip_id", eval_trip["sensed_section_ranges"])
12 sensed_section_range = matching_section_map[section_id]["match"]
KeyError: 'start_ts'
Tried to pull up the version with the prior results, but don't see any outputs for these cells in the checked-in version. But all the last tests are failing with
KeyError Traceback (most recent call last)
<ipython-input-49-cf4b53f0c6bf> in <module>
----> 1 check_outlier(pv_la.map()['android']['ucb-sdb-android-3']["evaluation_ranges"][0], 0, "walk_start_0", "WALKING")
<ipython-input-41-820e61dbfbd7> in check_outlier(eval_range, trip_idx, section_id, base_mode)
8 eval_section = [s for s in eval_trip["evaluation_section_ranges"] if s["trip_id"] == section_id][0]
9 print(fmt(eval_section["start_ts"]), "->", fmt(eval_section["end_ts"]))
---> 10 print([(fmt(ssr["start_ts"]), fmt(ssr["end_ts"]), ssr["mode"]) for ssr in eval_trip["sensed_section_ranges"]])
11 matching_section_map = embs.find_matching_segments(eval_trip["evaluation_section_ranges"], "trip_id", eval_trip["sensed_section_ranges"])
12 sensed_section_range = matching_section_map[section_id]["match"]
KeyError: 'sensed_section_ranges'
Just need to debug by hand.
Let's start with the lack of matches for
2019-07-27T19:20:31.060968-07:00 -> 2019-07-27T19:20:57.402429-07:00
The corresponding evaluation range
>>> arrow.get(range_0["start_ts"]).to("America/Los_angeles"), arrow.get(range_0["end_ts"]).to("America/Los_angeles")
(<Arrow [2019-07-27T17:38:24.968000-07:00]>,
<Arrow [2019-07-27T19:53:22.886000-07:00]>)
has two trips
2019-07-27T17:38:54.143985-07:00 2019-07-27T17:54:56.504297-07:00
2019-07-27T18:59:17.435039-07:00 2019-07-27T19:20:57.464819-07:00
and each trip has three sections
2019-07-27T17:38:54.143985-07:00 2019-07-27T17:54:56.504297-07:00
------- 2019-07-27T17:38:54.192643-07:00 2019-07-27T17:40:03.303200-07:00
------- 2019-07-27T17:40:03.318182-07:00 2019-07-27T17:52:26.823849-07:00
------- 2019-07-27T17:52:26.843096-07:00 2019-07-27T17:54:56.450234-07:00
2019-07-27T18:59:17.435039-07:00 2019-07-27T19:20:57.464819-07:00
------- 2019-07-27T18:59:17.495898-07:00 2019-07-27T19:01:06.611826-07:00
------- 2019-07-27T19:01:06.626976-07:00 2019-07-27T19:20:31.044772-07:00
------- 2019-07-27T19:20:31.060968-07:00 2019-07-27T19:20:57.402429-07:00
ok, so now let's see how the matching works.
There are two sensed trips and 9 sensed sections
2019-07-27T17:42:38.727000-07:00 2019-07-27T17:51:30-07:00
2019-07-27T19:03:05.796040-07:00 2019-07-27T19:21:19-07:00
=======
2019-07-27T17:42:38.727000-07:00 2019-07-27T17:51:10-07:00
2019-07-27T17:51:11-07:00 2019-07-27T17:51:30-07:00
2019-07-27T19:03:05.796040-07:00 2019-07-27T19:08:26-07:00
2019-07-27T19:08:27-07:00 2019-07-27T19:08:35-07:00
2019-07-27T19:08:36-07:00 2019-07-27T19:12:34-07:00
2019-07-27T19:12:35-07:00 2019-07-27T19:12:48-07:00
2019-07-27T19:12:49-07:00 2019-07-27T19:17:09-07:00
2019-07-27T19:17:10-07:00 2019-07-27T19:17:47-07:00
2019-07-27T19:17:49-07:00 2019-07-27T19:21:19-07:00
And there are matched sections for the evaluated trip ranges
2019-07-27T17:38:54.143985-07:00 2019-07-27T17:54:56.504297-07:00
------- 2019-07-27T17:42:38.727000-07:00 2019-07-27T17:51:10-07:00
------- 2019-07-27T17:51:11-07:00 2019-07-27T17:51:30-07:00
2019-07-27T18:59:17.435039-07:00 2019-07-27T19:20:57.464819-07:00
------- 2019-07-27T19:03:05.796040-07:00 2019-07-27T19:08:26-07:00
------- 2019-07-27T19:08:27-07:00 2019-07-27T19:08:35-07:00
------- 2019-07-27T19:08:36-07:00 2019-07-27T19:12:34-07:00
------- 2019-07-27T19:12:35-07:00 2019-07-27T19:12:48-07:00
------- 2019-07-27T19:12:49-07:00 2019-07-27T19:17:09-07:00
------- 2019-07-27T19:17:10-07:00 2019-07-27T19:17:47-07:00
------- 2019-07-27T19:17:49-07:00 2019-07-27T19:21:19-07:00
One issue is that the sensed_ranges all have data in them, while the evaluation ranges do not

range_0 = av_la.map()["android"]["ucb-sdb-android-2"]["evaluation_ranges"][0]
for t in range_0["sensed_trip_ranges"]:
    print(arrow.get(t["data"]["start_ts"]).to("America/Los_Angeles"), arrow.get(t["data"]["end_ts"]).to("America/Los_Angeles"))
print("=======")
for s in range_0["sensed_section_ranges"]:
    print(arrow.get(s["data"]["start_ts"]).to("America/Los_Angeles"), arrow.get(s["data"]["end_ts"]).to("America/Los_Angeles"))
print("=======")
for t in range_0["evaluation_trip_ranges"]:
    print(arrow.get(t["start_ts"]).to("America/Los_Angeles"), arrow.get(t["end_ts"]).to("America/Los_Angeles"))
    for s in t["sensed_section_ranges"]:
        print("-------", arrow.get(s["data"]["start_ts"]).to("America/Los_Angeles"), arrow.get(s["data"]["end_ts"]).to("America/Los_Angeles"))
But that should result in the matching failing with start_ts not found, not with missing section matches.
Ah, that's because this does work on Android but apparently not on iOS. There are no sensed trip or section ranges
=======
=======
2019-07-27T17:38:54.143985-07:00 2019-07-27T17:54:56.504297-07:00
2019-07-27T18:59:17.435039-07:00 2019-07-27T19:20:57.464819-07:00
iOS2 works, but not iOS3
2019-07-27T17:38:54.143985-07:00 2019-07-27T17:54:56.504297-07:00
------- 2019-07-27T17:48:26.003052-07:00 2019-07-27T17:55:14.984543-07:00
2019-07-27T18:59:17.435039-07:00 2019-07-27T19:20:57.464819-07:00
------- 2019-07-27T19:02:21.000540-07:00 2019-07-27T19:04:49.996204-07:00
------- 2019-07-27T19:04:50.996161-07:00 2019-07-27T19:05:06.995687-07:00
------- 2019-07-27T19:05:07.995770-07:00 2019-07-27T19:12:40.996040-07:00
------- 2019-07-27T19:12:41.996006-07:00 2019-07-27T19:13:02.995287-07:00
------- 2019-07-27T19:13:03.995252-07:00 2019-07-27T19:14:55.991418-07:00
------- 2019-07-27T19:14:56.991384-07:00 2019-07-27T19:18:07.999237-07:00
------- 2019-07-27T19:18:08.999212-07:00 2019-07-27T19:18:24.998764-07:00
------- 2019-07-27T19:18:25.998734-07:00 2019-07-27T19:18:54.997776-07:00
------- 2019-07-27T19:18:55.997741-07:00 2019-07-27T19:19:48.995914-07:00
------- 2019-07-27T19:19:53.995742-07:00 2019-07-27T19:21:41.992005-07:00
For this first range, there are apparently no matching trips?!
ucb-sdb-ios-3 evaluation_1 dict_keys(['role', 'manual/evaluation_transition', 'calibration_transitions', 'calibration_ranges', 'evaluation_transitions', 'evaluation_ranges'])
==============================
HAHFDC v/s MAHFDC:MAHFDC_0 HAHFDC v/s MAHFDC MAHFDC_0 2
Before filtering, trips = []
Filter range = 2019-07-27T17:38:54.143985-07:00 -> 2019-07-27T17:54:56.504297-07:00
After filtering, trips = []
Before filtering, trips = []
Filter range = 2019-07-27T18:59:17.435039-07:00 -> 2019-07-27T19:20:57.464819-07:00
After filtering, trips = []
==============================
HAHFDC v/s MAHFDC:MAHFDC_1 HAHFDC v/s MAHFDC MAHFDC_1 2
Before filtering, trips = [('2019-07-28T10:23:13.947510-07:00', '2019-07-28T10:31:48.066216-07:00'), ('2019-07-28T10:31:54.494439-07:00', '2019-07-28T10:34:30.450632-07:00'), ('2019-07-28T11:50:42.000985-07:00', '2019-07-28T12:10:40.324661-07:00')]
Filter range = 2019-07-28T10:19:03.776588-07:00 -> 2019-07-28T10:32:24.080722-07:00
After filtering, trips = ['2019-07-28T10:23:13.947510-07:00', '2019-07-28T10:31:54.494439-07:00']
Before filtering, trips = [('2019-07-28T10:23:13.947510-07:00', '2019-07-28T10:31:48.066216-07:00'), ('2019-07-28T10:31:54.494439-07:00', '2019-07-28T10:34:30.450632-07:00'), ('2019-07-28T11:50:42.000985-07:00', '2019-07-28T12:10:40.324661-07:00')]
Filter range = 2019-07-28T11:48:06.675345-07:00 -> 2019-07-28T12:09:44.829831-07:00
After filtering, trips = ['2019-07-28T11:50:42.000985-07:00']
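From the before/after pairs above, the filter appears to keep trips whose start_ts falls within the filter range (note that in the MAHFDC_1 log, the 10:31:54 -> 10:34:30 trip survives a range that ends at 10:32:24); a sketch under that assumption:

```python
def filter_trips_by_range(trips, range_start_ts, range_end_ts):
    # Keep trips that *start* inside the filter range; a trip that ends
    # after the range end is still kept, which matches the logs above.
    return [t for t in trips
            if range_start_ts <= t["start_ts"] <= range_end_ts]
```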
There are raw trips and sections but no cleaned or confirmed trips for this phone and trip combo
2B Dec 10 19:35 data/master_9b70c97/ucb-sdb-ios-3/unimodal_trip_car_bike_mtv_la/analysis~cleaned_section/1564272465_1564284123.json
2B Dec 10 19:35 data/master_9b70c97/ucb-sdb-ios-3/unimodal_trip_car_bike_mtv_la/analysis~cleaned_trip/1564272465_1564284123.json
2B Dec 10 19:35 data/master_9b70c97/ucb-sdb-ios-3/unimodal_trip_car_bike_mtv_la/analysis~cleaned_untracked/1564272465_1564284123.json
2B Dec 10 19:35 data/master_9b70c97/ucb-sdb-ios-3/unimodal_trip_car_bike_mtv_la/analysis~confirmed_trip/1564272465_1564284123.json
2B Dec 10 19:35 data/master_9b70c97/ucb-sdb-ios-3/unimodal_trip_car_bike_mtv_la/analysis~inferred_section/1564272465_1564284123.json
72K Dec 10 19:35 data/master_9b70c97/ucb-sdb-ios-3/unimodal_trip_car_bike_mtv_la/analysis~recreated_location/1564272465_1564284123.json
4.4K Dec 10 19:35 data/master_9b70c97/ucb-sdb-ios-3/unimodal_trip_car_bike_mtv_la/segmentation~raw_section/1564272465_1564284123.json
4.6K Dec 10 19:35 data/master_9b70c97/ucb-sdb-ios-3/unimodal_trip_car_bike_mtv_la/segmentation~raw_trip/1564272465_1564284123.json
2B Dec 10 19:35 data/master_9b70c97/ucb-sdb-ios-3/unimodal_trip_car_bike_mtv_la/segmentation~raw_untracked/1564272465_1564284123.json
ok, so for the trip to the location, it is very short and we basically have no data
2022-12-11 19:24:32,988:DEBUG:4396191232:Considering trip 63969ea3b10cfb28033e534d: 2019-07-27T17:58:13.428633-07:00 -> 2019-07-27T17:58:16.703112-07:00
...
2022-12-11 19:24:53,396:INFO:4396191232:Skipped single point trip 63969ea3b10cfb28033e534d (2019-07-27T17:58:13.428633-07:00 -> 2019-07-27T17:58:16.703112-07:00) of length 1.773354991055583
2022-12-11 19:24:53,396:DEBUG:4396191232:For raw trip 63969ea3b10cfb28033e534d, found filtered trip None
But how about the trip back?
2022-12-11 19:24:32,988:DEBUG:4396191232:Considering trip 63969ea3b10cfb28033e5351: 2019-07-27T19:04:36.508539-07:00 -> 2019-07-27T19:21:37.427862-07:00
2022-12-11 19:24:53,781:DEBUG:4396191232:Starting with element of type trip, id 63969f05b10cfb28033e66dc, details Entry({'_id': ObjectId('63969f05b10cfb28033e66dc'), 'user_id': UUID('7ed80490-6853-433d-9d20-838fe4d3d71b'), 'metadata': Metadata({'key': 'analysis/cleaned_section'}, 'data': Cleanedsection({'source': 'SmoothedHighConfidenceMotion', 'trip_id': ObjectId('63969f05b10cfb28033e66da'), 'start_ts': 1564279335.4815264, 'start_fmt_time': '2019-07-27T19:02:15.481526-07:00', 'start_loc': {'type': 'Point', 'coordinates':[-122.11348540560869, 37.38088791613373]}, 'end_ts': 1564280497.4278622, 'end_fmt_time': '2019-07-27T19:21:37.427862-07:00', 'end_loc': {'type': 'Point', 'coordinates': [-122.08372039576801, 37.390345769893756]}, 'duration': 1161.9463357925415, 'distance': 3705.457082358938, 'sensed_mode': 7})})
2022-12-11 19:24:53,791:DEBUG:4396191232:For raw trip 63969ea3b10cfb28033e5351, found filtered trip 63969f05b10cfb28033e66da
2022-12-11 19:25:43,721:DEBUG:4396191232:fix_squished_place: Fixed trip object = Cleanedtrip({'source': 'DwellSegmentationDistFilter', 'end_ts': 1564280497.4278622, 'end_fmt_time': '2019-07-27T19:21:37.427862-07:00', 'start_ts': 1564279305.4815264, 'start_fmt_time': '2019-07-27T19:01:45.481526-07:00', 'duration': 1191.9463357925415, 'distance': 3707.2304373499933})
Also
Inserting entry Entry({'user_id': UUID('7ed80490-6853-433d-9d20-838fe4d3d71b'), 'metadata': {'key': 'analysis/inferred_section'}, 'data': {'source': 'SmoothedHighConfidenceMotion', 'trip_id': ObjectId('63969f05b10cfb28033e66da'), 'start_ts': 1564279305.4815264, 'start_local_dt': {'year': 2019, 'month': 7, 'day': 27, 'hour': 19, 'minute': 1, 'second': 45, 'weekday': 5, 'timezone': 'America/Los_Angeles'}, 'start_fmt_time': '2019-07-27T19:01:45.481526-07:00', 'start_loc': {'type': 'Point', 'coordinates': [-122.1134678931847, 37.380895707027186]}, 'end_ts': 1564280497.4278622, 'end_fmt_time': '2019-07-27T19:21:37.427862-07:00', 'duration': 1191.9463357925415,
So we do have the trips and sections; why are we not retrieving them?
Two observations:
There are two queries for segmentation/raw_trip, but they are saved in one file
Dumping key segmentation/raw_trip for key_time = data.start_ts and phone ucb-sdb-ios-3
original range = 2019-07-27T17:37:45.212364-07:00 -> 2019-07-27T19:52:02.549677-07:00,padded range = 2019-07-27T17:07:45.212364-07:00 -> 2019-07-27T20:22:02.549677-07:00
Retrieving data for ucb-sdb-ios-3 from 1564272465.212364 -> 1564284122.549677
Retrieving data for: post_body={'user': 'ucb-sdb-ios-3', 'key_list': ['segmentation/raw_trip'], 'key_time': 'data.start_ts', 'start_time': 1564272465.212364, 'end_time': 1564284122.549677}
response=<Response [200]>
Found 2 entries
Retrieving data for ucb-sdb-ios-3 from 1670815395.355776 -> 1564284122.549677
Retrieving data for: post_body={'user': 'ucb-sdb-ios-3', 'key_list': ['segmentation/raw_trip'], 'key_time': 'data.start_ts', 'start_time': 1670815395.355776, 'end_time': 1564284122.549677}
response=<Response [200]>
Found 0 entries
Creating out_file='data/master_9b70c97/ucb-sdb-ios-3/unimodal_trip_car_bike_mtv_la/segmentation~raw_trip/1564272465_1564284123.json'...
There's one entry for cleaned trip
Dumping key analysis/cleaned_trip for key_time = data.start_ts and phone ucb-sdb-ios-3
original range = 2019-07-27T17:37:45.212364-07:00 -> 2019-07-27T19:52:02.549677-07:00,padded range = 2019-07-27T17:07:45.212364-07:00 -> 2019-07-27T20:22:02.549677-07:00
Retrieving data for ucb-sdb-ios-3 from 1564272465.212364 -> 1564284122.549677
Retrieving data for: post_body={'user': 'ucb-sdb-ios-3', 'key_list': ['analysis/cleaned_trip'], 'key_time': 'data.start_ts', 'start_time': 1564272465.212364, 'end_time': 1564284122.549677}
response=<Response [200]>
Found 1 entries
Creating out_file='data/master_9b70c97/ucb-sdb-ios-3/unimodal_trip_car_bike_mtv_la/analysis~cleaned_trip/1564272465_1564284123.json'...
But that file has no data
$ ls -alh data/master_9b70c97/ucb-sdb-ios-3/unimodal_trip_car_bike_mtv_la/analysis~cleaned_trip/1564272465_1564284123.json
2B Dec 11 22:00 data/master_9b70c97/ucb-sdb-ios-3/unimodal_trip_car_bike_mtv_la/analysis~cleaned_trip/1564272465_1564284123.json
So the two calls (including one where the start > end) are because we continue reading until we have zero or one entry, and we pick the next batch as starting from the metadata.write_ts of the final entry in the previous batch. To be consistent with https://github.com/MobilityNet/mobilitynet.github.io/issues/31#issuecomment-1343805591 we need to set the second batch to start from the key_time of the last entry in the first batch.
The reason that the one entry is not saved is a very stupid bug that seems to have been around forever. If we only ever get one batch (e.g. we get exactly one entry), then location_entries is never added to. I guess we haven't hit this before because it is unlikely that we retrieve only one entry at a time.
https://github.com/MobilityNet/mobilitynet-analysis-scripts/blob/master/emeval/input/spec_details.py#L160
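Putting both fixes together, a hypothetical reconstruction of the batched retrieval loop (the real code is around spec_details.py#L160, linked above); it assumes the resumed query treats start_time as exclusive, and shows both fixes: advancing by the key_time of the last entry, and accumulating the final zero/one-entry batch.

```python
def retrieve_all(retrieve_batch, start_ts, end_ts):
    """retrieve_batch(start_ts, end_ts) -> entries sorted by data.start_ts.

    Fix 1: resume the next batch from the key_time (data.start_ts) of the
    last entry, not its metadata.write_ts, so start can never jump past end.
    Fix 2: extend the accumulator even when the first batch is also the
    last one (e.g. exactly one entry in total).
    """
    all_entries = []
    batch = retrieve_batch(start_ts, end_ts)
    while len(batch) > 1:
        all_entries.extend(batch)
        last_key_ts = batch[-1]["data"]["start_ts"]  # fix 1
        batch = retrieve_batch(last_key_ts, end_ts)
    all_entries.extend(batch)  # fix 2: keep a final zero/one-entry batch
    return all_entries
```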
wrt https://github.com/MobilityNet/mobilitynet.github.io/issues/31#issuecomment-1345790435
One issue is that the sensed_ranges all have data in them, while the evaluation ranges do not
It looks like we do expect to have ['data'] - and we do in check_outlier_expanded:

print([(fmt(ssr["data"]["start_ts"]), fmt(ssr["data"]["end_ts"]), ssr["data"]["mode"])
       for ssr in eval_trip["sensed_section_ranges"]])
Ah but if we fix that, we run into another issue with the key
~/e-mission/mobilitynet-analysis-scripts/emeval/metrics/baseline_segmentation.py in find_matching_segments(gt_segments, id_key, sensed_segments)
80 (len(gt_segments), len(sensed_segments)))
81 for gt in gt_segments:
---> 82 start_segment_idx = find_closest_segment_idx(gt, sensed_segments, "start_ts")
83 # We want to find the end segment id in the segments after the
84 # start segment. So we filter the array passed in, and add back the
~/e-mission/mobilitynet-analysis-scripts/emeval/metrics/baseline_segmentation.py in find_closest_segment_idx(gt, sensed_segments, key)
48
49 def find_closest_segment_idx(gt, sensed_segments, key):
---> 50 ts_diffs = [abs(gt[key] - st[key]) for st in sensed_segments]
51 # import arrow
52 # print("diffs for %s %s = %s" % (key, arrow.get(gt[key]).to("America/Los_Angeles"), ts_diffs))
~/e-mission/mobilitynet-analysis-scripts/emeval/metrics/baseline_segmentation.py in <listcomp>(.0)
48
49 def find_closest_segment_idx(gt, sensed_segments, key):
---> 50 ts_diffs = [abs(gt[key] - st[key]) for st in sensed_segments]
51 # import arrow
52 # print("diffs for %s %s = %s" % (key, arrow.get(gt[key]).to("America/Los_Angeles"), ts_diffs))
KeyError: 'start_ts'
I then double-checked Evaluate_power_vs_classification, which works on transition matches, and it also has a check_outlier which does not have a data.
I also checked the new classification code, which is the main part that we need to get to work, and it has the fallback

if 'data' in ss.keys():
    ss = ss['data']
So let's just strip out the data while creating the analysed timeline. If there are still issues, we can stop here, verify that the analysed timeline is working properly and move on to getting the notebooks for the paper done.
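A minimal sketch of that stripping (helper name hypothetical), which makes the sensed ranges look like the evaluation ranges so that find_matching_segments can index start_ts directly:

```python
def strip_entry_wrapper(ranges):
    # Flatten Entry-style dicts ({'metadata': ..., 'data': {...}}) down to
    # just the data; ranges that are already flat pass through unchanged
    return [r["data"] if "data" in r else r for r in ranges]
```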
Right now, the raw data can be read from a ServerSpec or a FileSpec. It would help to be able to read the analysed data also from a FileSpec.
People have been dealing with this through pickling, but this is sub-optimal because:
1) pickling is a binary format
2) the pickling format can change
3) reading the pickled values requires importing pymongo
Instead, let's store the results in files as well.
The analysis results are currently read as follows:
create_analysed_view(input_view, analysis_datastore, location_key, trip_key, section_key)
when we try to run it with
we get the error
This is because we set the analysis datastore to the input spec, but we now have a mismatch between the spec, which is a filespec, and the analysis_datastore, which is a serverspec.
Our other requirement is that we need to support multiple possible analysed views for various versions of the algorithms - so one for master and one for gis_branch, for example. We need to refactor the analysis view code as follows:
This seems to suggest the following changes: changing the signature to
create_analysed_view(input_view, analysis_spec, location_key, trip_key, section_key)
and updating bin/dump_data_to_file.py to also retrieve and dump analysed data.