e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

P0: first displayed place seems to collect additions from all other places #880

Closed shankari closed 1 year ago

shankari commented 1 year ago

@JGreenlee, please see attached screenshots

[Screenshots: First place has all the matches | Second place | Third place | Fourth place]
JGreenlee commented 1 year ago

I can't help but notice that the first place in question displays an end time of 11:59. I don't think that's a coincidence.

Is it possible that this place does not have an exit time? If so, then it would skip endChecks and match to any addition after its enter time.
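
In other words, the matching would degenerate to something like this (an illustrative sketch with made-up names, not the actual matcher code):

# With no exit time, the end check is effectively skipped, so any addition
# that starts after the place's enter_ts will match.
def addition_matches_place(addition, place):
    start_checks = addition["start_ts"] >= place["enter_ts"]
    if place.get("exit_ts") is None:  # open-ended last place
        end_checks = True             # end check is skipped
    else:
        end_checks = addition["end_ts"] <= place["exit_ts"]
    return start_checks and end_checks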

JGreenlee commented 1 year ago

Confirmed the above suspicion. Trying to investigate why that is the case.

JGreenlee commented 1 year ago

@shankari I think I need you to investigate because I don't have access to the staging DB to run analysis.

It seems that this confirmed_place has no exit_ts, despite not being the last place. Why? Does the cleaned_place have an exit_ts?

JGreenlee commented 1 year ago

At the time that we run the pipeline, in the CREATE_CONFIRMED_OBJECTS stage, the most recent place has no exit_ts. We mark the stage as completed with the last place's enter_ts.

On the next pipeline run, we query using last_processed_ts as the start of the next query.

Do we ever go back to fill in the exit_ts of the previous place?

shankari commented 1 year ago

so for the CLEAN_AND_RESAMPLE stage, which is the other stage where we process a mixture of places and trips, we also mark the last_processed_ts with the enter_ts of the last place. So at least that is consistent.

def mark_clean_resampling_done(user_id, last_section_done):
    if last_section_done is None:
        mark_stage_done(user_id, ps.PipelineStages.CLEAN_RESAMPLING, None)
    else:
        mark_stage_done(user_id, ps.PipelineStages.CLEAN_RESAMPLING,
                        last_section_done.data.enter_ts + END_FUZZ_AVOID_LTE)
shankari commented 1 year ago

Do we ever go back to fill in the exit_ts of the previous place?

For the cleaned place/trip, we do (in link_trip_start)

Not sure if you do so for the confirmed places and/or the composite trips

JGreenlee commented 1 year ago

After a pipeline stage is marked completed, the last_processed_ts is recorded.

On the next pipeline run, the last_processed_ts is 5 seconds later than was recorded. Why?

(In other words, why do we need END_FUZZ_AVOID_LTE?)

shankari commented 1 year ago

Double-checking this against the actual data from the database. The first entry that is retrieved is:

_id: "642e4556ac2aec7db6dcce56"
cleaned_trip: $oid: "642e445cac2aec7db6dcc367"
confirmed_place: $oid: "642e4515ac2aec7db6dcccf6"
end_confirmed_place: _id: {$oid: "642e4515ac2aec7db6dcccf6"}
         cleaned_place: {$oid: "642e447bac2aec7db6dcc539"}
         ending_trip: {$oid: "642e445cac2aec7db6dcc367"}
         enter_fmt_time: "2023-04-04T18:44:26.137000-07:00"
         enter_ts: 1680659066.137
         key: "analysis/confirmed_place"
         raw_places: [{$oid: "642e3cebac2aec7db6dc7dd9"}, , …] (22)
end_place: $oid: "642e447bac2aec7db6dcc539"
start_place: $oid: "642e447bac2aec7db6dcc538"

The related confirmed place now (in the database)

{'_id': ObjectId('642e4515ac2aec7db6dcccf6'),
'metadata': {'key': 'analysis/confirmed_place',
'write_fmt_time': '2023-04-05T21:03:07.080342-07:00'},
'enter_fmt_time': '2023-04-04T18:44:26.137000-07:00', 
'raw_places': [ObjectId('642e3cebac2aec7db6dc7dd9'), ... ObjectId('642e3cf0ac2aec7db6dc7e01')],
'ending_trip': ObjectId('642e445cac2aec7db6dcc367'),
'cleaned_place': ObjectId('642e447bac2aec7db6dcc539'),
'user_input': {}, 'additions': []}}

The related cleaned place

{'_id': ObjectId('642e447bac2aec7db6dcc539'),
'metadata': {'key': 'analysis/cleaned_place',
'write_fmt_time': '2023-04-05T21:03:07.080342-07:00'},
'enter_fmt_time': '2023-04-04T18:44:26.137000-07:00', 
'raw_places': [ObjectId('642e3cebac2aec7db6dc7dd9'), ...ObjectId('642e3cf0ac2aec7db6dc7e03')],
'ending_trip': ObjectId('642e445cac2aec7db6dcc367'),
'starting_trip': ObjectId('642f6f0f071330001746036a'),
'exit_fmt_time': '2023-04-06T16:54:09.469119-07:00',
'duration': 166183.33211874962}}
JGreenlee commented 1 year ago

Do we ever go back to fill in the exit_ts of the previous place?

For the cleaned place/trip, we do (in link_trip_start)

Not sure if you do so for the confirmed places and/or the composite trips

I think to implement this properly, I need confirmed trips to have start_confirmed_place. We should flesh out the linking between confirmed objects to the same extent that cleaned objects are linked.

Since Sebastian is already working on that task, I can collaborate with him on it to expedite the resolution of this critical issue.

shankari commented 1 year ago

(In other words, why do we need END_FUZZ_AVOID_LTE?)

I don't remember the details now, but I know that I wrote a really long commit message when I added it. We can look at the blame to figure it out. But I don't think this is the underlying issue.

I think to implement this properly, I need confirmed trips to have start_confirmed_place. We should flesh out the linking between confirmed objects to the same extent that cleaned objects are linked.

Yes, while processing a trip, you would need to find the place before it and "complete it" with the trip information.

shankari commented 1 year ago

@JGreenlee I don't think we actually need confirmed trips to have start_confirmed_place. The change is even simpler: while processing confirmed objects, if the first cleaned place object has the fields filled in, copy them over.
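
A minimal sketch of that copy-over (the helper and the exact field list are illustrative; the exit-side field names come from the cleaned place dump above):

def fill_confirmed_place_from_cleaned(confirmed_place, cleaned_place):
    # copy the exit-side fields that did not exist when the confirmed place
    # was created as the open "last place" on the previous run
    for field in ["exit_ts", "exit_fmt_time", "exit_local_dt",
                  "starting_trip", "duration"]:
        if field in cleaned_place["data"]:
            confirmed_place["data"][field] = cleaned_place["data"][field]
    return confirmed_place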

I can then also fix the other fields (e.g. write_fmt_time and ending_trip, which should be different between cleaned and confirmed trips).

I will fix right now and reset the pipelines before heading out for the weekend.

JGreenlee commented 1 year ago

@shankari If I understand correctly, that approach won't work with the FUZZ (5 seconds added) when marking the confirmed object creation stage as completed, which is why I was asking about it.

Without the fuzz, the last place of pipeline run #1 should be the first place of run #2.

shankari commented 1 year ago

With the fuzz, you use timeline.fill_start_end_places to fill in the first and last place of the timeline as needed. We use it in emission/analysis/intake/cleaning/clean_and_resample.py (save_cleaned_segments_for_ts), for example.

shankari commented 1 year ago

There are a couple of options for dealing with this.

Let's see what we do in the CLEAN_AND_RESAMPLE code, since we know that works.

shankari commented 1 year ago

In CLEAN_AND_RESAMPLE (save_cleaned_segments_for_timeline):

What we do in create_and_link_timeline is (2): iterate over trips, update the cleaned start place, and create a new cleaned end place.

I am pretty sure I can get this to work with approach (1) as well, and it even seems cleaner, but given the current timeframe, going with the tried and true here.

So high level pseudocode is:
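
A hedged reconstruction of that flow, with illustrative helper names (not the actual implementation):

def create_confirmed_objects(user_id, timeline):
    # the last place from the previous run was left open, without exit info
    last_confirmed_place = get_last_confirmed_place(user_id)  # hypothetical helper
    for trip in timeline.trips:
        confirmed_trip = create_confirmed_trip(trip)          # hypothetical helper
        # "complete" the place before this trip with the trip information
        link_confirmed_place_exit(last_confirmed_place, confirmed_trip)
        # the trip's end place becomes the new open last place
        last_confirmed_place = create_confirmed_place(trip.end_place)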

shankari commented 1 year ago

one challenge with following the approach above is that we have cleaned_untracked objects, but not confirmed_untracked objects

shankari commented 1 year ago

I am not even sure how we can avoid confirmed_untracked objects. What will the confirmed start and end places of the untracked time link to? It seems cleaner to create a new confirmed_untracked and link it to the confirmed timeline, which will also support labels for the untracked time down the road.

Until now the place links were not updated, so it was moot. But having the links not be updated was incorrect.

Let's add confirmed_untracked and fix all the links properly.

shankari commented 1 year ago

Ok I think I am basically done EXCEPT for thinking about the user input matching for the first trip in the timeline for each run of the pipeline. Currently, the user input matching happens in create_confirmed_entry. But the first place in the timeline was created on the last run (and was the last place then).

Let's see how the place matching works.

shankari commented 1 year ago

ok, this is even worse. I took a trip with the test phone today and of course all the inputs got pushed up to the server, and now they have all disappeared. There is something seriously broken wrt server side trip addition processing. @sebastianbarry have you noticed this as well?

[Screenshots: Place still ends at midnight | No matching additions | No matching additions]
shankari commented 1 year ago

investigating briefly tonight

we received a bunch of data from the phone

2023-04-08 01:11:07,619:DEBUG:139892232456000:Returning multi_result.inserted_ids = [ObjectId('6430bcef80ea0c19db61646b'), ObjectId('6430bcef80ea0c19db61646c'), ObjectId('6430bcef80ea0c57f112c3b4'), ObjectId('6430bcef80ea0c57f112c3b5'), ObjectId('6430bcef80ea0c19db61646d'), ObjectId('6430bcef80ea0c19db61646e'), ObjectId('6430bcef80ea0c57f112c3b6'), ObjectId('6430bcef80ea0c57f112c3b7'), ObjectId('6430bcef80ea0c57f112c3b8'), ObjectId('6430bcef80ea0c57f112c3b9')]... of length 292

we tried to process them and got 5 user inputs

2023-04-08 01:11:07,662:DEBUG:139892232456000:finished querying values for ['manual/mode_confirm', 'manual/purpose_confirm', 'manual/replaced_mode', 'manual/trip_user_input', 'manual/trip_addition_input', 'manual/place_addition_input'], count = 5

there was a match for at least the first user input

2023-04-08 01:11:07,981:DEBUG:139892232456000:Comparing user input 1 Voluntary Work, : 2023-04-06T17:46:33.983000-07:00 -> 2023-04-06T17:47:46.695139-07:00, trip 2023-04-06T17:46:33.983000-07:00 -> 2023-04-06T17:47:46.695139-07:00, start checks are (True && True) and end checks are (True || True)

2023-04-08 01:11:07,984:DEBUG:139892232456000:sorted candidates are [{'write_fmt_time': '2023-04-06T18:17:11.788968-07:00', 'detail': '2023-04-06T17:46:33.983000-07:00'}]

2023-04-08 01:11:07,984:DEBUG:139892232456000:most recent entry is 2023-04-06T18:17:11.788968-07:00, 2023-04-06T17:46:33.983000-07:00

2023-04-08 01:11:07,985:DEBUG:139892232456000:Saving entry Entry({'_id': ObjectId('642f6f230713300017460414'), 
'metadata': {'key': 'analysis/confirmed_place', 'write_fmt_time': '2023-04-06T18:17:11.788968-07:00'}, '
data': {'source': 'DwellSegmentationTimeFilter',
'enter_fmt_time': '2023-04-06T17:46:33.983000-07:00',
'exit_fmt_time': '2023-04-06T17:47:46.695139-07:00', 
'additions': [Entry({'_id': ObjectId('6430bcef80ea0c57f112c3c9'),
     'metadata': {'key': 'manual/place_addition_input', ...},
     ...})]}}) into timeseries

Similarly, we save

Saving entry Entry({'_id': ObjectId('642f6f230713300017460416'),
'metadata': {'key': 'analysis/confirmed_place',
'enter_fmt_time': '2023-04-06T17:54:03.967000-07:00'
'exit_fmt_time': '2023-04-06T17:56:44.991424-07:00',
'additions': [Entry({'_id': ObjectId('6430bcef80ea0c57f112c3ca'), 

and

Saving entry Entry({'_id': ObjectId('642f6f230713300017460418'), 
'metadata': {'key': 'analysis/confirmed_place', 
'write_fmt_time': '2023-04-06T18:17:11.792570-07:00'
'enter_fmt_time': '2023-04-06T18:07:08.985000-07:00',
'additions': [Entry({'_id': ObjectId('6430bcef80ea0c57f112c3cb'), 
shankari commented 1 year ago

Ah of course; we copy over the confirmed place into the composite trip, but we don't recreate it when the confirmed place changes with the addition of matches.

>>> edb.get_analysis_timeseries_db().find_one({"_id": boi.ObjectId("642f6f230713300017460416")})
{'_id': ObjectId('642f6f230713300017460416'),
'additions': [{'_id': ObjectId('6430bcef80ea0c57f112c3ca'),

But

>>> edb.get_analysis_timeseries_db().find_one({"data.end_confirmed_place._id": boi.ObjectId("642f6f230713300017460416")})

{'_id': ObjectId('642f6f25071330001746041c'),
'metadata': {'key': 'analysis/composite_trip',
'origin_key': 'analysis/confirmed_trip'},
'end_confirmed_place': {'_id': ObjectId('642f6f230713300017460416'),
    'metadata': {'key': 'analysis/confirmed_place', 
    'data': {'enter_fmt_time': '2023-04-06T17:54:03.967000-07:00',
                'exit_fmt_time': '2023-04-06T17:56:44.991424-07:00',
                 'duration': 161.02442359924316,
                'cleaned_place': ObjectId('642f6f170713300017460402'),
                'user_input': {}, 'additions': []}}}

So we will need to update the related composite trip as well when we update the confirmed place

shankari commented 1 year ago

wrt

will all the trip addition inputs match that last place on the server as well?

Certainly seems like it from the code.

    if start_checks and not end_checks:
        logging.debug("Handling corner case where start check matches, but end check does not")
        next_entry_obj = _get_next_cleaned_timeline_entry(ts, tl_entry)
        if next_entry_obj is not None:
            next_entry_end = end_of(next_entry_obj)
            if next_entry_end is None: # the last place will not have an exit_ts
                end_checks = True # so we will just skip the end check
            else:
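                # (remainder of the end check is elided in this excerpt)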

Let's test it, and then finish up that fix

shankari commented 1 year ago

While testing, found another issue. In https://github.com/shankari/e-mission-server/blob/add_trip_place_additions/emission/analysis/plotting/composite_trip_creation.py#L17, we added a hack to "fill in" the confirmed place for confirmed trips.

The hack was arguably incorrect to begin with and is now even more incorrect.

However, there was also a weirdness in confirmed_trip from the previous implementation: we created confirmed_trips by copying over from the corresponding cleaned_trip, so the start_place and the end_place are filled in, but point to the corresponding cleaned places instead.

>>> edb.get_analysis_timeseries_db().find_one({"metadata.key": "analysis/confirmed_trip"})
{'_id': ObjectId('64341826ebf368c478d5482e'),
'metadata': {'key': 'analysis/confirmed_trip',
'data': {'start_place': ObjectId('64337da27423ef6528092f28'), 'end_place': ObjectId('64341826ebf368c478d54825'),
'user_input': {}, 'trip_addition': []}}

>>> edb.get_analysis_timeseries_db().find_one({"_id": boi.ObjectId("64341826ebf368c478d54825")})
{'_id': ObjectId('64341826ebf368c478d54825'),
'metadata': {'key': 'analysis/cleaned_place'}}

So we can change the hack to check whether the start place/end place are cleaned places and replace them with the corresponding confirmed place instead.

That will require a DB call to determine whether the place is of the correct type, though, which I would like to avoid. Is there a simple check in the format of the confirmed_trip object that we can use instead?

Here's a potential hack:
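
One hedged possibility, reusing only calls seen elsewhere in this thread (note that it still incurs the DB call mentioned above):

import emission.core.get_database as edb

def swap_cleaned_for_confirmed_place(confirmed_trip_doc):
    # if end_place still points at a cleaned place, find the confirmed place
    # that wraps it (via data.cleaned_place) and point to that one instead
    wrapper = edb.get_analysis_timeseries_db().find_one(
        {"metadata.key": "analysis/confirmed_place",
         "data.cleaned_place": confirmed_trip_doc["data"]["end_place"]})
    if wrapper is not None:
        confirmed_trip_doc["data"]["end_place"] = wrapper["_id"]
    return confirmed_trip_doc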

shankari commented 1 year ago

Testing done:

[Screenshot]
END 2023-04-10 22:24:27.100300 POST /usercache/put 2f012dd4-7b47-43aa-b38f-3d0c6d6e8f3c 4.096529960632324
>>> last_confirmed_place = esdp.get_last_place_entry("analysis/confirmed_place", test_uuid)
>>> last_confirmed_place
Entry({'_id': ObjectId('643427314fcc9197e202a0ad'),
'metadata': {'key': 'analysis/confirmed_place',
'enter_fmt_time': '2016-08-04T16:38:38.348000-10:00',
'additions': [...]

>>> len(last_confirmed_place['data']['additions'])
5
[Screenshot]

The original last place now has an exit_ts

{'_id': ObjectId('643427314fcc9197e202a0ad'),
'metadata': {'key': 'analysis/confirmed_place',
'enter_fmt_time': '2016-08-04T16:38:38.348000-10:00',
'exit_fmt_time': '2016-08-04T16:38:38.348000-10:00',
starting_trip: ObjectId('6434f12ba6c4a675cc9eb77f')

But it still has 5 additions

>>> len(orig_last_place['data']['additions'])
5

New last confirmed place does not have exit_ts, does not have additions

>>> last_confirmed_place = esdp.get_last_place_entry("analysis/confirmed_place", test_uuid)
Entry({'_id': ObjectId('6434f12ba6c4a675cc9eb784'),
'metadata': {'key': 'analysis/confirmed_place',
'enter_fmt_time': '2016-08-05T17:25:52.895000-07:00',
'additions': []

The starting place for that original last trip was some untracked time, and it matches one of the inputs. Note that this input has not been removed from the previous last confirmed place.

{'_id': ObjectId('6434f12ba6c4a675cc9eb77f')
'metadata': {'key': 'analysis/confirmed_untracked',
'start_fmt_time': '2016-08-04T16:38:38.348000-10:00',
'end_fmt_time': '2016-08-05T04:53:24.886000-07:00',
'additions': [{'metadata': {'key': 'manual/place_addition_input'}, ...}]

The end place for the untracked time is

'_id': ObjectId('6434f12ba6c4a675cc9eb780'),
'metadata': {'key': 'analysis/confirmed_place',
'enter_fmt_time': '2016-08-05T04:53:24.886000-07:00'
'exit_fmt_time': '2016-08-05T05:11:41.493000-07:00'
'duration': 1096.6070001125336
'starting_trip': ObjectId('6434f12ba6c4a675cc9eb781'),
'additions': []

The next trip is below. Seems like it should have matched at least the breakfast "personal care"; probably didn't match because it was a place input.

{'_id': ObjectId('6434f12ba6c4a675cc9eb781'),
'end_fmt_time': '2016-08-05T08:48:09-07:00'
'start_fmt_time': '2016-08-05T05:11:41.493000-07:00'
'end_place': ObjectId('6434f12ba6c4a675cc9eb782'),
'additions': []

The next place also doesn't match any additions because the place addition starts an hour before the place starts.

{'_id': ObjectId('6434f12ba6c4a675cc9eb782'),
'metadata': {'key': 'analysis/confirmed_place'
'enter_fmt_time': '2016-08-05T08:48:09-07:00'
'exit_fmt_time': '2016-08-05T17:21:35.725313-07:00'
'additions': []

Final UI screenshots:

[Screenshots: All entries are still here, except that the display time on the last two entries has changed due to the timezone change | Duplicate entries in a later place | Untracked time overlaps with the next place]

Pending issues/bugs:

Matching-related questions:

shankari commented 1 year ago

Thanks to a suggestion from @JGreenlee, changed the expected key for untracked time to fix the overlap. Also added the enketo-notes-list to the untracked item directive, and it automagically worked after changing all the variables passed in:

            <enketo-notes-list ng-if="triplike.additionsList.length" timeline-entry="triplike" addition-entries="triplike.additionsList"></enketo-notes-list>
[Screenshot]

Gives me hope for the more modular future!

shankari commented 1 year ago

Next, we need to fix https://github.com/e-mission/e-mission-docs/issues/880#issuecomment-1500799602. Not sure why that didn't show up in the previous reproduction; when we reload from the UI, the composite trip should have lost the matching places.

Let's see why that didn't happen.

shankari commented 1 year ago

We have three inputs

>>> all_inputs = list(ts.find_entries(["manual/place_addition_input"]))
>>> len(all_inputs)
3

They are matched to the places

>>> all_places = list(ts.find_entries(["analysis/confirmed_place"]))
>>> pd.json_normalize(all_places)["data.additions"]
0                                                   []
1                                                   []
2                                                   []
3                                                   []
4                                                   []
5                                                   []
6    [{'_id': 64358bf0f5622167bcba53c8, 'user_id': ...
7    [{'_id': 64358bf0f5622167bcba533e, 'user_id': ...
8    [{'_id': 64358bf0f5622167bcba527c, 'user_id': ...
9                                                   []

But they don't show up in the composite trips

>>> all_ct = list(ts.find_entries(["analysis/composite_trip"]))
>>> pd.json_normalize(all_ct)["data.end_confirmed_place.data.additions"]
0    []
1    []
2    []
3    []
4    []
5    []
6    []
7    []
8    []

But they do still show up on reload in the UI. Let's see why.

It's because the manual result map has three entries. And that is because we are retrieving them remotely.

[Log] DEBUG:About to dedup localResult = 0remoteResult = 3 (cordova.js, line 1413)
[Log] DEBUG:Deduped list = 3 (cordova.js, line 1413)

Why are we still retrieving them remotely although they have been processed? Ah, I think it is because of the mismatch between the time the data was collected and the time that it was labeled. We collected the data in 2016, so the pipeline range is

{'user_id': UUID('3b5121f7-32b0-41c1-ae63-c7e2fd5e3e43'), '$or': [{'metadata.key': 'manual/place_addition_input'}], 'metadata.write_ts': {'$lte': 1681231955, '$gte': 1470364708.348}}

which is from 2016 to 2023

>>> arrow.get(1470364708.348).to("America/Los_Angeles")
<Arrow [2016-08-04T19:38:28.348000-07:00]>

>>> arrow.get(1681231955).to("America/Los_Angeles")
<Arrow [2023-04-11T09:52:35-07:00]>

In a normal pipeline, the pipeline would move ahead, so we would not keep getting the entries. That is when they will disappear.

shankari commented 1 year ago

To fix https://github.com/e-mission/e-mission-docs/issues/880#issuecomment-1500799602, we need to ensure that whenever we update the confirmed_place object, we also update the corresponding composite trip (if any). This should happen for any update, not just the user input or addition; when we update the timestamps of the last place, for example, that should also be reflected in the composite trip.
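
A hedged sketch of that propagation as a raw pymongo update, assuming the composite trip embeds the place under data.end_confirmed_place as in the dumps above:

import emission.core.get_database as edb

def propagate_place_update_to_composite(confirmed_place_doc):
    # push the fresh copy into any composite trip that embeds this place,
    # whether the change was a matched addition or an updated exit timestamp
    edb.get_analysis_timeseries_db().update_one(
        {"metadata.key": "analysis/composite_trip",
         "data.end_confirmed_place._id": confirmed_place_doc["_id"]},
        {"$set": {"data.end_confirmed_place": confirmed_place_doc}})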

Before running the pipeline a second time

>>> pd.json_normalize(all_ct)["data.end_confirmed_place.data.exit_fmt_time"]
0    2016-08-04T10:41:32.136385-10:00
1    2016-08-04T13:10:38.739684-10:00
2    2016-08-04T13:40:36.959000-10:00
3    2016-08-04T13:46:26.561801-10:00
4    2016-08-04T14:06:52.592000-10:00
5    2016-08-04T14:18:35.840464-10:00
6    2016-08-04T14:39:38.288795-10:00
7    2016-08-04T16:34:45.744782-10:00
8                                 NaN

Load data from the 5th, re-run pipeline

>>> all_places = list(ts.find_entries(["analysis/confirmed_place"]))
>>> pd.json_normalize(all_places)["data.exit_fmt_time"]
0     2016-08-04T10:03:51.235000-10:00
1     2016-08-04T10:41:32.136385-10:00
2     2016-08-04T13:10:38.739684-10:00
3     2016-08-04T13:40:36.959000-10:00
4     2016-08-04T13:46:26.561801-10:00
5     2016-08-04T14:06:52.592000-10:00
6     2016-08-04T14:18:35.840464-10:00
7     2016-08-04T14:39:38.288795-10:00
8     2016-08-04T16:34:45.744782-10:00
9     2016-08-04T16:38:38.348000-10:00
10    2016-08-05T05:11:41.493000-07:00
11    2016-08-05T17:21:35.725313-07:00
12                                 NaN

>>> pd.json_normalize(all_ct)["data.end_confirmed_place.data.exit_fmt_time"]
0     2016-08-04T10:41:32.136385-10:00
1     2016-08-04T13:10:38.739684-10:00
2     2016-08-04T13:40:36.959000-10:00
3     2016-08-04T13:46:26.561801-10:00
4     2016-08-04T14:06:52.592000-10:00
5     2016-08-04T14:18:35.840464-10:00
6     2016-08-04T14:39:38.288795-10:00
7     2016-08-04T16:34:45.744782-10:00
8                                  NaN
9                                  NaN
10    2016-08-05T17:21:35.725313-07:00
11                                 NaN
shankari commented 1 year ago

While writing the automated tests, I ran into the issue that the trips don't seem to show inputs (either user_input or additions).

I first thought that this was a server issue, but everything seemed to be working fine on the server - the confirmed trips were updated, and then the composite trips were updated. But then I noticed that the timestamps on the server were different from the timestamps on the phone, aka I couldn't match up the entries saved on the server with the values displayed on the phone.

So I retrieved the data and added a breakpoint and they are still not matching up!

Here is the set of trips that have matching inputs

[Screenshots]

Here's the visualization around that time frame on the phone. Note that none of the timestamps match up! There are no trips that start between 13:46 and 14:18, and in fact, there is a gap in the timeline right there.

shankari commented 1 year ago

Ok so I looked through the retrieved list Very Carefully, and it is clear that trips with inputs are not displayed.

ctList: Array (9)
0 {end_loc: {type: "Point", coordinates: [-155.0397394, 19.6218661]}, source: "DwellSegmentationTimeFilter", start_loc: {type: "Point", coordinates: [-154.9029399, 19.5461465]}, user_input: {}, duration: 1880.7650001049042, …}
1 {end_loc: {type: "Point", coordinates: [-155.9108361, 19.4228153]}, source: "DwellSegmentationTimeFilter", start_loc: {type: "Point", coordinates: [-155.0397394, 19.6218661]}, user_input: {}, duration: 8056.584614753723, …}
2 {end_loc: {type: "Point", coordinates: [-155.9109609, 19.4213896]}, source: "DwellSegmentationTimeFilter", start_loc: {type: "Point", coordinates: [-155.9108361, 19.4228153]}, user_input: {}, duration: 605.260315656662, …}
3 Object <----- displayed
_id: "64373344796eba348c2149dc"
additions: [] (0)
end_fmt_time: "2016-08-04T13:42:52.709000-10:00"
start_fmt_time: "2016-08-04T13:40:36.959000-10:00"
user_input: {}

4 Object <--------- not displayed
_id: "64373344796eba348c2149dd"
additions: [Object] (1)
end_fmt_time: "2016-08-04T13:58:52-10:00"
start_fmt_time: "2016-08-04T13:46:26.561801-10:00"
user_input: {trip_user_input: Object}

Trips before and after the set that I labeled are displayed. This is probably a UI fix...

shankari commented 1 year ago

Doh! This is the "To Label" screen, so trips with labels have moved to "All Trips". Maybe we should change this functionality for ENKETO instead of MULTILABEL.

[Screenshot]
shankari commented 1 year ago

Unit test results

[Screenshots: No inputs | All inputs matched to last place]
[Screenshots: Spread out inputs 1 | Spread out inputs 2]
[Screenshots: Trip inputs matching 1 | Trip inputs matching 2]
shankari commented 1 year ago

Final test is for the hack to fill in the confirmed places.

So the checks are:

shankari commented 1 year ago

Checked out the first commit and ran the pipeline

>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_trip", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_place", "user_id": self.testUUID})
0
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/composite_trip", "user_id": self.testUUID})
0

Checked out the most recent commit and re-ran the pipeline

>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_trip", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_place", "user_id": self.testUUID})
18
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/composite_trip", "user_id": self.testUUID})
9

Why do we have 18 places? We should have 10.

Ah, it's because we create start and end places for each trip, but the end place of one trip is the start place of the next. We should handle that properly in the hack.

shankari commented 1 year ago

Fixed by checking to see if there was a related confirmed trip before creating one
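
The actual fix checks for the related confirmed trip; an equivalent hedged sketch, reusing an existing confirmed place instead of creating a duplicate, might look like:

import emission.core.get_database as edb

def get_or_create_confirmed_place(user_id, cleaned_place_id):
    # the end place of one trip is the start place of the next, so reuse a
    # confirmed place that already wraps this cleaned place if one exists
    existing = edb.get_analysis_timeseries_db().find_one(
        {"user_id": user_id,
         "metadata.key": "analysis/confirmed_place",
         "data.cleaned_place": cleaned_place_id})
    if existing is not None:
        return existing
    return create_confirmed_place(cleaned_place_id)  # hypothetical helper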

Now we have

>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_trip", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_place", "user_id": self.testUUID})
10
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/composite_trip", "user_id": self.testUUID})
9

And the output matches our test case (which runs only the new code)

>>> with open(dataFile_1+".before-user-inputs.expected_composite_trips") as expectation:
...     expected_trips = json.load(expectation, object_hook = bju.object_hook)
...     print(len(composite_trips), len(expected_trips))
...     for i in range(len(composite_trips)):
...             print(composite_trips[i]["data"]["end_confirmed_place"]["data"]["enter_fmt_time"], expected_trips[i]["data"]["end_confirmed_place"]["data"]["enter_fmt_time"])
...
9 9
2016-08-04T10:35:12-10:00 2016-08-04T10:35:12-10:00
2016-08-04T12:55:48.721000-10:00 2016-08-04T12:55:48.721000-10:00
2016-08-04T13:20:44-10:00 2016-08-04T13:20:44-10:00
2016-08-04T13:42:52.709000-10:00 2016-08-04T13:42:52.709000-10:00
2016-08-04T13:58:52-10:00 2016-08-04T13:58:52-10:00
2016-08-04T14:12:04.251000-10:00 2016-08-04T14:12:04.251000-10:00
2016-08-04T14:34:12.571000-10:00 2016-08-04T14:34:12.571000-10:00
2016-08-04T16:18:54.709000-10:00 2016-08-04T16:18:54.709000-10:00
2016-08-04T16:38:38.348000-10:00 2016-08-04T16:38:38.348000-10:00
shankari commented 1 year ago

Checked out the second commit and ran the pipeline

>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_trip", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_place", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/composite_trip", "user_id": self.testUUID})
9

re-running ends up with the same values

>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_trip", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_place", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/composite_trip", "user_id": self.testUUID})
9
shankari commented 1 year ago

That's because the entries have addition instead of trip_addition. I checked, though, and the production servers do have trip_addition. Let's double-check the commit by launching a container with the image.

It is:

$ docker run -it shankari/e-mission-server:trip_place_additions_master_2023_03_10_1 /bin/bash
# git log
commit f64f5be9d31acdd0cb5454eb08057b6b8b7c6a6b (HEAD -> add_trip_place_additions_new_master)
Merge: c806bcb4 1a374276
Author: Shankari <shankari@eecs.berkeley.edu>
Date:   Fri Mar 10 20:37:00 2023 -0800

    Merge branch 'master' of https://github.com/e-mission/e-mission-server into add_trip_place_additions_new_master

commit 1a374276d35acc1e07ad50915cc7cb670afb65a3 (upstream/master)
Merge: 5b839e11 4492edff
Author: shankari <shankari@eecs.berkeley.edu>
Date:   Fri Mar 10 20:35:26 2023 -0800

    Merge pull request #902 from swastis10/server_upgrade

    Configuring PYTHON_LEGACY UUID representation in cfc_webapp.py

...

commit 9aed65d5948e61ba284e5a3ee29b4f9bc2b81290 (origin/add_trip_place_additions, add_trip_place_additions)
Author: Shankari <shankari@eecs.berkeley.edu>
Date:   Thu Mar 9 18:08:29 2023 -0800

    :bug: Read the match ID from 'data' instead of directly from the entry

    To be consistent with what the phone is actually sending
    + also fix all the test cases

    Testing done:
$ ./e-mission-py.bash emission//tests/analysisTests/userInputTests/TestUserInputFakeData.py
----------------------------------------------------------------------
Ran 6 tests in 0.196s

OK

This fixes
https://github.com/e-mission/e-mission-docs/issues/861
shankari commented 1 year ago

Ok so using commit 9aed65d5948e61ba284e5a3ee29b4f9bc2b81290 instead, we get

>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_trip", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_place", "user_id": self.testUUID})
0
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/composite_trip", "user_id": self.testUUID})
0
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_place", "user_id": self.testUUID})
10
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/composite_trip", "user_id": self.testUUID})
9

And the resulting values are consistent

>>> with open(dataFile_1+".before-user-inputs.expected_composite_trips") as expectation:
...     expected_trips = json.load(expectation, object_hook = bju.object_hook)
...     print(len(composite_trips), len(expected_trips))
...     for i in range(len(composite_trips)):
...             print(composite_trips[i]["data"]["end_confirmed_place"]["data"]["enter_fmt_time"], expected_trips[i]["data"]["end_confirmed_place"]["data"]["enter_fmt_time"])
...
9 9
2016-08-04T10:35:12-10:00 2016-08-04T10:35:12-10:00
2016-08-04T12:55:48.721000-10:00 2016-08-04T12:55:48.721000-10:00
2016-08-04T13:20:44-10:00 2016-08-04T13:20:44-10:00
2016-08-04T13:42:52.709000-10:00 2016-08-04T13:42:52.709000-10:00
2016-08-04T13:58:52-10:00 2016-08-04T13:58:52-10:00
2016-08-04T14:12:04.251000-10:00 2016-08-04T14:12:04.251000-10:00
2016-08-04T14:34:12.571000-10:00 2016-08-04T14:34:12.571000-10:00
2016-08-04T16:18:54.709000-10:00 2016-08-04T16:18:54.709000-10:00
2016-08-04T16:38:38.348000-10:00 2016-08-04T16:38:38.348000-10:00

It's a wrap!

shankari commented 1 year ago

Checked display on both android and iOS

[Screenshot]
shankari commented 1 year ago

Created child issues:

shankari commented 1 year ago

Closed by: https://github.com/shankari/e-mission-server/pull/13 and https://github.com/e-mission/e-mission-phone/pull/950

shankari commented 1 year ago

I spent all day filling in time entries and they all disappeared when I took a trip in the evening.

[Screenshots: Missing additions | Place additions work | Trip additions work]
shankari commented 1 year ago

This is in the last 5 trips.

>>> last_5_list = list(edb.get_analysis_timeseries_db().find({"metadata.key": "analysis/composite_trip"}).sort("data.start_ts", -1).limit(5))
>>> import pandas as pd
>>> last_5_df = pd.json_normalize(last_5_list)
>>> last_5_df["data.start_fmt_time"]
0    2023-04-14T20:16:35.992728-07:00
1    2023-04-14T19:47:31.480693-07:00
2    2023-04-14T18:48:22.422325-07:00
3    2023-04-13T20:10:58.404540-07:00
4    2023-04-13T19:31:59.983000-07:00

And it does not have any additions

>>> last_5_df.iloc[3]["_id"]
ObjectId('6438d316e2fd7ac823955632')
>>> last_5_df.iloc[3]["data.start_fmt_time"]
'2023-04-13T20:10:58.404540-07:00'
>>> last_5_df.iloc[3]["data.end_fmt_time"]
'2023-04-13T20:22:52.978000-07:00'
>>> last_5_df.iloc[3]["data.end_confirmed_place.data.additions"]
[]

The related confirmed place object is as below, and it doesn't have additions either. Why didn't the additions match?

>>> edb.get_analysis_timeseries_db().find_one({"_id": cpeid})
{'_id': ObjectId('6438d313e2fd7ac82395562b'),
 'metadata': {'key': 'analysis/confirmed_place',
'write_fmt_time': '2023-04-13T21:14:11.429961-07:00'},
'enter_fmt_time': '2023-04-13T20:22:52.978000-07:00',
'enter_ts': 1681442572.978
'user_input': {}, 
'additions': [],
'exit_fmt_time': '2023-04-14T18:48:22.422325-07:00',
'exit_ts': 1681523302.4223254,
}}

Hm. There are three entries with the same enter_ts

>>> edb.get_analysis_timeseries_db().count_documents({"data.enter_ts": 1681442572.978})
3

And they are the places in the three timelines

>>> edb.get_analysis_timeseries_db().find({"data.enter_ts": 1681442572.978}).distinct("metadata.key")
['analysis/cleaned_place', 'analysis/confirmed_place', 'segmentation/raw_place']
shankari commented 1 year ago

We do find the last confirmed place and set it into the composite trip, but it has zero additions

2023-04-14T21:14:24.689-07:00 | 2023-04-15  04:14:24,689:DEBUG:140635200833344:last confirmed_place  6438d313e2fd7ac82395562b was already in database, updating with linked  trip info... and 0 additions

looking further upstream...

2023-04-15 04:14:24,338:DEBUG:140635200833344:Found existing last confirmed place, setting exit information to 2023-04-14T18:48:22.422325-07:00, and trimming additions to 0

We need to find the "match incoming" step for it. It looks like we do save an entry with additions earlier:

2023-04-15 04:12:28,649:DEBUG:140635200833344:Saving entry Entry({'_id': ObjectId('6438d313e2fd7ac82395562b'), 'user_id': UUID('9c084ef4-2f97-4196-bd37-950c17938ec6'), 'metadata': {'key': 'analysis/confirmed_place', 'platform': 'server', 'write_ts': 1681445651.4299612, 'time_zone': 'America/Los_Angeles', 'write_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 21, 'minute': 14, 'second': 11, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'write_fmt_time': '2023-04-13T21:14:11.429961-07:00'}, 'data': {'source': 'DwellSegmentationTimeFilter', 'enter_ts': 1681442572.978, 'enter_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 20, 'minute': 22, 'second': 52, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'enter_fmt_time': '2023-04-13T20:22:52.978000-07:00', 'location': {'type': 'Point', 'coordinates': [-122.0863953, 37.391031]}, 'raw_places': [ObjectId('6438d2efe2fd7ac823955574')], 'ending_trip': ObjectId('6438d313e2fd7ac82395562a'), 'cleaned_place': ObjectId('6438d30be2fd7ac823955607'), 'user_input': {}, 'additions': [Entry({'_id': ObjectId('643a1a4080ea0c46334769f1'), 'user_id': UUID('9c084ef4-2f97-4196-bd37-950c17938ec6'), 'metadata': {'key': 'manual/place_addition_input', 'platform': 'android', 'read_ts': 0, 'time_zone': 'America/Los_Angeles', 'type': 'message', 'write_ts': 1681446697.777, 'write_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 21, 'minute': 31, 'second': 37, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'write_fmt_time': '2023-04-13T21:31:37.777000-07:00'}, 'data': {'label': '1 Domestic, ', 'name': 'TimeUseSurvey', 'version': 9, 'xmlResponse': '<a88RxBtE3jwSar3cwiZTdn xmlns:jr="http://openrosa.org/javarosa" xmlns:orx="http://openrosa.org/xforms" id="a88RxBtE3jwSar3cwiZTdn">\n          <start>2023-04-13T21:31:04.570-07:00</start>\n          <end>2023-04-13T21:31:04.570-07:00</end>\n          <group_hg4zz25>\n            <Date>2023-04-13</Date>\n            <Start_time>20:22:52.978-07:00</Start_time>\n            <End_time>21:00:00.000-07:00</End_time>\n            <Activity_Type>domestic_activities</Activity_Type>\n            <Personal_Care_activities/>\n            <Employment_related_a_Education_activities/>\n            <Domestic_activities>preparing_meals_or_snacks</Domestic_activities>\n            <Recreation_and_leisure/>\n            <Voluntary_work_and_care_activities/>\n            <Other/>\n          </group_hg4zz25>\n          <meta>\n            <instanceID>uuid:2c4df962-617d-424a-839f-75f4f3147226</instanceID>\n          </meta>\n        </a88RxBtE3jwSar3cwiZTdn>', 'jsonDocResponse': {'a88RxBtE3jwSar3cwiZTdn': {'attr': {'xmlns:jr': 'http://openrosa.org/javarosa', 'xmlns:orx': 'http://openrosa.org/xforms', 'id': 'a88RxBtE3jwSar3cwiZTdn'}, 'start': '2023-04-13T21:31:04.570-07:00', 'end': '2023-04-13T21:31:04.570-07:00', 'group_hg4zz25': {'attr': {}, 'Date': '2023-04-13', 'Start_time': '20:22:52.978-07:00', 'End_time': '21:00:00.000-07:00', 'Activity_Type': 'domestic_activities', 'Personal_Care_activities': '', 'Employment_related_a_Education_activities': '', 'Domestic_activities': 'preparing_meals_or_snacks', 'Recreation_and_leisure': '', 'Voluntary_work_and_care_activities': '', 'Other': ''}, 'meta': {'attr': {}, 'instanceID': 'uuid:2c4df962-617d-424a-839f-75f4f3147226'}}}, 'start_ts': 1681442572.978, 'end_ts': 1681444800, 'match_id': '30427f9d-8faf-4448-bdbb-50378daaf644', 'start_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 20, 'minute': 22, 'second': 52, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 
'start_fmt_time': '2023-04-13T20:22:52.978000-07:00', 'end_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 21, 'minute': 0, 'second': 0, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'end_fmt_time': '2023-04-13T21:00:00-07:00'}})]}}) into timeseries

After the user input matching, we end up with

2023-04-15 04:12:34,053:DEBUG:140635200833344:Saving entry Entry({'_id': ObjectId('6438d313e2fd7ac82395562b'), 'user_id': UUID('9c084ef4-2f97-4196-bd37-950c17938ec6'), 'metadata': {'key': 'analysis/confirmed_place', 'platform': 'server', 'write_ts': 1681445651.4299612, 'time_zone': 'America/Los_Angeles', 'write_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 21, 'minute': 14, 'second': 11, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'write_fmt_time': '2023-04-13T21:14:11.429961-07:00'}, 'data': {'source': 'DwellSegmentationTimeFilter', 'enter_ts': 1681442572.978, 'enter_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 20, 'minute': 22, 'second': 52, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'enter_fmt_time': '2023-04-13T20:22:52.978000-07:00', 'location': {'type': 'Point', 'coordinates': [-122.0863953, 37.391031]}, 'raw_places': [ObjectId('6438d2efe2fd7ac823955574')], 'ending_trip': ObjectId('6438d313e2fd7ac82395562a'), 'cleaned_place': ObjectId('6438d30be2fd7ac823955607'), 'user_input': {}, 'additions': [{}, {}, {}, {}, {}, {}, {}, {}, {}, Entry()]}}) into timeseries

So why do all those additions get deleted later?

shankari commented 1 year ago

In CREATE_CONFIRMED_OBJECTS, when we read the last place doc, it has all the additions, so it is not our read-after-write inconsistency

2023-04-15 04:14:24,088:DEBUG:140635200833344:last place doc = {'_id': ObjectId('6438d313e2fd7ac82395562b'), 'user_id': UUID('9c084ef4-2f97-4196-bd37-950c17938ec6'), 'metadata': {'key': 'analysis/confirmed_place', 'platform': 'server', 'write_ts': 1681445651.4299612, 'time_zone': 'America/Los_Angeles', 'write_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 21, 'minute': 14, 'second': 11, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'write_fmt_time': '2023-04-13T21:14:11.429961-07:00'}, 'data': {'source': 'DwellSegmentationTimeFilter', 'enter_ts': 1681442572.978, 'enter_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 20, 'minute': 22, 'second': 52, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'enter_fmt_time': '2023-04-13T20:22:52.978000-07:00', 'location': {'type': 'Point', 'coordinates': [-122.0863953, 37.391031]}, 'raw_places': [ObjectId('6438d2efe2fd7ac823955574')], 'ending_trip': ObjectId('6438d313e2fd7ac82395562a'), 'cleaned_place': ObjectId('6438d30be2fd7ac823955607'), 'user_input': {}, 'additions': [, , , , , , , , , , , , , , ]}}

We then try to find matches but there are none, so we trim the additions to zero.

2023-04-15 04:14:24,310:DEBUG:140635200833344:curr_query = {'user_id': UUID(...'), '$or': [{'metadata.key': 'manual/trip_addition_input'}, {'metadata.key': 'manual/place_addition_input'}], 'data.enter_ts': {'$lte': 1681523302.4223254, '$gte': 1681442572.978}}, sort_key = data.enter_ts
2023-04-15 04:14:24,326:DEBUG:140635200833344:finished querying values for ['manual/trip_addition_input', 'manual/place_addition_input'], count = 0
2023-04-15 04:14:24,327:DEBUG:140635200833344:orig_ts_db_matches = 0, analysis_ts_db_matches = 0
2023-04-15 04:14:24,337:DEBUG:140635200833344:in get_not_deleted_candidates, no candidates, returning []
2023-04-15 04:14:24,338:DEBUG:140635200833344:Found existing last confirmed place, setting exit information to 2023-04-14T18:48:22.422325-07:00, and trimming additions to 0

Why are there no matches? What do the place additions look like?

{'_id': ObjectId('643a1a4080ea0c46334769f1'), 'user_id': UUID('...'),
'metadata': {'key': 'manual/place_addition_input',
'data': 'meta': {'attr': {}, 'instanceID': 'uuid:2c4df962-617d-424a-839f-75f4f3147226'}}},
'start_ts': 1681442572.978, 'end_ts': 1681444800,
'match_id': '30427f9d-8faf-4448-bdbb-50378daaf644',
'start_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 20, 'minute': 22, 'second': 52, 'weekday': 3, 'timezone': 'America/Los_Angeles'},
'start_fmt_time': '2023-04-13T20:22:52.978000-07:00',
'end_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 21, 'minute': 0, 'second': 0, 'weekday': 3, 'timezone': 'America/Los_Angeles'},
'end_fmt_time': '2023-04-13T21:00:00-07:00'}}

It has start_ts and end_ts, not enter_ts and exit_ts.

The composite place creation code calls get_additions_for_timeline_entry_object which in turn calls

def get_time_query_for_timeline_entry(timeline_entry):
    begin_of_entry = begin_of(timeline_entry)
    end_of_entry = end_of(timeline_entry)
    timeType = "data.start_ts" if "start_ts" in timeline_entry.data else "data.enter_ts"
    if end_of_entry is None:
        # the last place (user's current place) will not have an exit_ts, so
        # every input from its enter_ts onward is fair game
        end_of_entry = EPOCH_MAXIMUM
    return estt.TimeQuery(timeType, begin_of_entry, end_of_entry)

So since this is matching to a place, we search for enter/exit matches, while the phone uses start/end for all additions.

So how did this ever work (e.g. in https://github.com/e-mission/e-mission-docs/issues/880#issuecomment-1507272460)? Should re-run the test case and see if we can figure it out. It was a zero duration place anyway, so maybe we weren't expecting it to match a lot

Tomorrow:

shankari commented 1 year ago

Ok I think I understood what happened.

get_additions_for_timeline_entry_object is only called when we have a confirmed object and need to match additions to it. So it is typically not called when we enter values after the confirmed trip/place has already been created. The only times it is called are:

We didn't find this before since:

Basically, the inputs always have start/end ts and don't match the timeline object, so we should remove the timeline-object-specific functionality and convert everything to just search for start_ts and end_ts, as sketched below.

@JGreenlee @MaliheTabasi for visibility
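
A hedged sketch of that simplification, reusing the names from the function quoted above:

def get_time_query_for_timeline_entry(timeline_entry):
    begin_of_entry = begin_of(timeline_entry)
    end_of_entry = end_of(timeline_entry)
    if end_of_entry is None:
        # the last place (user's current place) will not have an exit_ts, so
        # every input from its enter_ts onward is fair game
        end_of_entry = EPOCH_MAXIMUM
    # inputs from the phone always carry start_ts/end_ts (even place additions),
    # so always query on those instead of switching to enter_ts for places
    return estt.TimeQuery("data.start_ts", begin_of_entry, end_of_entry)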

shankari commented 1 year ago

Confirmed that the addition inputs always have only the start_ts filled in and never the enter_ts

>>> edb.get_timeseries_db().count_documents({"metadata.key": "manual/trip_addition_input", "data.start_ts": {"$exists": True}})
111
>>> edb.get_timeseries_db().count_documents({"metadata.key": "manual/trip_addition_input", "data.enter_ts": {"$exists": True}})
0
>>> edb.get_timeseries_db().count_documents({"metadata.key": "manual/place_addition_input", "data.start_ts": {"$exists": True}})
90
>>> edb.get_timeseries_db().count_documents({"metadata.key": "manual/place_addition_input", "data.enter_ts": {"$exists": True}})
0
shankari commented 1 year ago

Reproducing the issue, we have all the inputs match the final place, but they are all deleted after we get more inputs and re-run the pipeline

[Screenshots: All the matches with the last place | All entries gone]
shankari commented 1 year ago

Couple of quick things to investigate before going on:

There should be matches for the trip (18:45 to 20:22)

             metadata.write_fmt_time  ... data.jsonDocResponse.a88RxBtE3jwSar3cwiZTdn.group_hg4zz25.Start_time
25  2023-04-13T15:18:44.595000-07:00  ...                                 15:29:17.206-07:00
24  2023-04-13T15:19:11.299000-07:00  ...                                 16:30:00.000-07:00
23  2023-04-13T15:21:20.298000-07:00  ...                                 17:30:00.000-07:00
22  2023-04-13T15:21:41.249000-07:00  ...                                 18:29:00.000-07:00
21  2023-04-13T15:22:28.623000-07:00  ...                                 19:29:00.000-07:00
20  2023-04-13T18:11:42.594000-07:00  ...                                                NaN
19  2023-04-13T18:12:24.638000-07:00  ...                                 18:03:29.356-07:00
18  2023-04-13T18:13:10.923000-07:00  ...                                 18:36:59.350-07:00
17  2023-04-13T18:13:49.804000-07:00  ...                                 17:14:04.360-07:00
16  2023-04-13T18:14:06.490000-07:00  ...                                 18:00:29.356-07:00
15  2023-04-13T18:14:32.465000-07:00  ...                                                NaN
14  2023-04-13T18:15:21.962000-07:00  ...                                                NaN
13  2023-04-13T18:15:39.699000-07:00  ...                                                NaN
12  2023-04-13T21:25:48.619000-07:00  ...                                                NaN
11  2023-04-13T21:26:01.901000-07:00  ...                                 18:45:43.402-07:00
10  2023-04-13T21:26:24.069000-07:00  ...                                                NaN
9   2023-04-13T21:26:42.917000-07:00  ...                                 18:58:47.497-07:00
8   2023-04-13T21:27:53.347000-07:00  ...                                 19:37:31.972-07:00
7   2023-04-13T21:28:22.735000-07:00  ...                                                NaN
6   2023-04-13T21:29:39.628000-07:00  ...                                                NaN
5   2023-04-13T21:30:03.779000-07:00  ...                                 19:21:58.659-07:00
4   2023-04-13T21:30:30.951000-07:00  ...                                 19:28:59.983-07:00
3   2023-04-13T21:30:50.064000-07:00  ...                                                NaN
2   2023-04-13T21:31:02.171000-07:00  ...                                 20:10:58.404-07:00
1   2023-04-13T21:31:37.777000-07:00  ...                                 20:22:52.978-07:00
0   2023-04-13T21:32:00.367000-07:00  ...                                 21:00:00.000-07:00
shankari commented 1 year ago

are the earlier entries getting matched properly (e.g. trip details and the non-last-place time use)?

Yes, they are. I just needed to load the entries properly in the test case: load entries, setup; load entries, setup. Not load, load, setup.
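
A sketch of the corrected ordering (method names are illustrative):

# interleave loading and pipeline runs, instead of loading everything up front
self.load_entries(first_batch)   # hypothetical test helper
self.run_pipeline()              # "setup" processes what was loaded so far
self.load_entries(second_batch)
self.run_pipeline()
# not: load_entries(first_batch); load_entries(second_batch); run_pipeline()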