e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

P0: first displayed place seems to collect additions from all other places #880

Closed shankari closed 1 year ago

shankari commented 1 year ago

@JGreenlee, please see attached screenshots

[Screenshots: First place has all the matches | Second place | Third place | Fourth place]
JGreenlee commented 1 year ago

I can't help but notice that the first place in question displays an end time of 11:59. I don't think that's a coincidence.

Is it possible that this place does not have an exit time? If so, then it would skip endChecks and match to any addition after its enter time.
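
In other words, the matching would degenerate to something like this (an illustrative sketch with made-up names, not the actual matcher code):

# With no exit time, the end check is effectively skipped, so any addition
# that starts after the place's enter_ts will match.
def addition_matches_place(addition, place):
    start_checks = addition["start_ts"] >= place["enter_ts"]
    if place.get("exit_ts") is None:  # open-ended last place
        end_checks = True             # end check is skipped
    else:
        end_checks = addition["end_ts"] <= place["exit_ts"]
    return start_checks and end_checks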

JGreenlee commented 1 year ago

Confirmed the above suspicion. Trying to investigate why that is the case.

JGreenlee commented 1 year ago

@shankari I think I need you to investigate because I don't have access to the staging DB to run analysis.

It seems that this confirmed_place has no exit_ts, despite not being the last place. Why? Does the cleaned_place have an exit_ts?

JGreenlee commented 1 year ago

At the time that we run the pipeline, in the CREATE_CONFIRMED_OBJECTS stage, the most recent place has no exit_ts. We mark the stage as completed with the last place's enter_ts.

On the next pipeline run, we query using last_processed_ts as the start of the next query.

Do we ever go back to fill in the exit_ts of the previous place?

shankari commented 1 year ago

so for the CLEAN_AND_RESAMPLE stage, which is the other stage where we process a mixture of places and trips, we also mark the last_processed_ts with the enter_ts of the last place. So at least that is consistent.

def mark_clean_resampling_done(user_id, last_section_done):
    if last_section_done is None:
        mark_stage_done(user_id, ps.PipelineStages.CLEAN_RESAMPLING, None)
    else:
        mark_stage_done(user_id, ps.PipelineStages.CLEAN_RESAMPLING,
                        last_section_done.data.enter_ts + END_FUZZ_AVOID_LTE)
shankari commented 1 year ago

Do we ever go back to fill in the exit_ts of the previous place?

For the cleaned place/trip, we do (in link_trip_start)

Not sure if you do so for the confirmed places and/or the composite trips

JGreenlee commented 1 year ago

After a pipeline stage is marked completed, the last_processed_ts is recorded.

On the next pipeline run, the last_processed_ts is 5 seconds later than was recorded. Why?

(In other words, why do we need END_FUZZ_AVOID_LTE?)

shankari commented 1 year ago

Double-checking this against the actual data from the database. The first entry that is retrieved is:

_id: "642e4556ac2aec7db6dcce56"
cleaned_trip: $oid: "642e445cac2aec7db6dcc367"
confirmed_place: $oid: "642e4515ac2aec7db6dcccf6"
end_confirmed_place: _id: {$oid: "642e4515ac2aec7db6dcccf6"}
         cleaned_place: {$oid: "642e447bac2aec7db6dcc539"}
         ending_trip: {$oid: "642e445cac2aec7db6dcc367"}
         enter_fmt_time: "2023-04-04T18:44:26.137000-07:00"
         enter_ts: 1680659066.137
         key: "analysis/confirmed_place"
         raw_places: [{$oid: "642e3cebac2aec7db6dc7dd9"}, , …] (22)
end_place: $oid: "642e447bac2aec7db6dcc539"
start_place: $oid: "642e447bac2aec7db6dcc538"

The related confirmed place now (in the database)

{'_id': ObjectId('642e4515ac2aec7db6dcccf6'),
'metadata': {'key': 'analysis/confirmed_place',
'write_fmt_time': '2023-04-05T21:03:07.080342-07:00'},
'enter_fmt_time': '2023-04-04T18:44:26.137000-07:00', 
'raw_places': [ObjectId('642e3cebac2aec7db6dc7dd9'), ... ObjectId('642e3cf0ac2aec7db6dc7e01')],
'ending_trip': ObjectId('642e445cac2aec7db6dcc367'),
'cleaned_place': ObjectId('642e447bac2aec7db6dcc539'),
'user_input': {}, 'additions': []}}

The related cleaned place

{'_id': ObjectId('642e447bac2aec7db6dcc539'),
'metadata': {'key': 'analysis/cleaned_place',
'write_fmt_time': '2023-04-05T21:03:07.080342-07:00'},
'enter_fmt_time': '2023-04-04T18:44:26.137000-07:00', 
'raw_places': [ObjectId('642e3cebac2aec7db6dc7dd9'), ...ObjectId('642e3cf0ac2aec7db6dc7e03')],
'ending_trip': ObjectId('642e445cac2aec7db6dcc367'),
'starting_trip': ObjectId('642f6f0f071330001746036a'),
'exit_fmt_time': '2023-04-06T16:54:09.469119-07:00',
'duration': 166183.33211874962}}
JGreenlee commented 1 year ago

Do we ever go back to fill in the exit_ts of the previous place?

For the cleaned place/trip, we do (in link_trip_start)

Not sure if you do so for the confirmed places and/or the composite trips

I think to implement this properly, I need confirmed trips to have start_confirmed_place. We should flesh out the linking between confirmed objects to the same extent that cleaned objects are linked.

Since Sebastian is already working on that task, I can collaborate with him on it to expedite the resolution of this critical issue.

shankari commented 1 year ago

(In other words, why do we need END_FUZZ_AVOID_LTE?)

I don't remember the details now, but I know that I wrote a really long commit message when I added it. We can look at the blame to figure it out. But I don't think this is the underlying issue.

I think to implement this properly, I need confirmed trips to have start_confirmed_place. We should flesh out the linking between confirmed objects to the same extent that cleaned objects are linked.

Yes, while processing a trip, you would need to find the place before it and "complete it" with the trip information.

shankari commented 1 year ago

@JGreenlee I don't think we actually need confirmed trips to have start_confirmed_place. The change is even simpler: while processing confirmed objects, if the first cleaned place object has the fields filled in, copy them over.
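
A minimal sketch of that copy-over (the helper and the exact field list are illustrative; the exit-side field names come from the cleaned place dump above):

def fill_confirmed_place_from_cleaned(confirmed_place, cleaned_place):
    # copy the exit-side fields that did not exist when the confirmed place
    # was created as the open "last place" on the previous run
    for field in ["exit_ts", "exit_fmt_time", "exit_local_dt",
                  "starting_trip", "duration"]:
        if field in cleaned_place["data"]:
            confirmed_place["data"][field] = cleaned_place["data"][field]
    return confirmed_place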

I can then also fix the other fields (e.g. write_fmt_time and ending_trip, which should be different between cleaned and confirmed trips).

I will fix right now and reset the pipelines before heading out for the weekend.

JGreenlee commented 1 year ago

@shankari If I understand correctly, that approach won't work with the FUZZ (5 seconds added) when marking the confirmed object creation stage as completed, which is why I was asking about it.

Without the fuzz, the last place of pipeline run #1 should be the first place of run #2.

shankari commented 1 year ago

With the fuzz, you use timeline.fill_start_end_places to fill in the first and last place of the timeline as needed. We use it in emission/analysis/intake/cleaning/clean_and_resample.py (save_cleaned_segments_for_ts), for example.

shankari commented 1 year ago

There are a couple of options for dealing with this.

Let's see what we do in the CLEAN_AND_RESAMPLE code, since we know that works.

shankari commented 1 year ago

In CLEAN_AND_RESAMPLE (save_cleaned_segments_for_timeline):

What we do in create_and_link_timeline is (2): iterate over trips, update the cleaned start place, and create a new cleaned end place.

I am pretty sure I can get this to work with approach (1) as well, and it even seems cleaner, but given the current timeframe, going with the tried and true here.

So high level pseudocode is:
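
A hedged reconstruction of that flow, with illustrative helper names (not the actual implementation):

def create_confirmed_objects(user_id, timeline):
    # the last place from the previous run was left open, without exit info
    last_confirmed_place = get_last_confirmed_place(user_id)  # hypothetical helper
    for trip in timeline.trips:
        confirmed_trip = create_confirmed_trip(trip)          # hypothetical helper
        # "complete" the place before this trip with the trip information
        link_confirmed_place_exit(last_confirmed_place, confirmed_trip)
        # the trip's end place becomes the new open last place
        last_confirmed_place = create_confirmed_place(trip.end_place)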

shankari commented 1 year ago

one challenge with following the approach above is that we have cleaned_untracked objects, but not confirmed_untracked objects

shankari commented 1 year ago

I am not even sure how we can avoid confirmed_untracked objects. What will the confirmed start and end places of the untracked time link to? It seems cleaner to create a new confirmed_untracked and link it to the confirmed timeline, which will also support labels for the untracked time down the road.

Until now the place links were not updated, so it was moot. But having the links not be updated was incorrect.

Let's add confirmed_untracked and fix all the links properly.

shankari commented 1 year ago

Ok I think I am basically done EXCEPT for thinking about the user input matching for the first trip in the timeline for each run of the pipeline. Currently, the user input matching happens in create_confirmed_entry. But the first place in the timeline was created on the last run (and was the last place then).

Let's see how the place matching works.

shankari commented 1 year ago

ok, this is even worse. I took a trip with the test phone today and of course all the inputs got pushed up to the server, and now they have all disappeared. There is something seriously broken wrt server side trip addition processing. @sebastianbarry have you noticed this as well?

[Screenshots: Place still ends at midnight | No matching additions | No matching additions]
shankari commented 1 year ago

investigating briefly tonight

we received a bunch of data from the phone

2023-04-08 01:11:07,619:DEBUG:139892232456000:Returning multi_result.inserted_ids = [ObjectId('6430bcef80ea0c19db61646b'), ObjectId('6430bcef80ea0c19db61646c'), ObjectId('6430bcef80ea0c57f112c3b4'), ObjectId('6430bcef80ea0c57f112c3b5'), ObjectId('6430bcef80ea0c19db61646d'), ObjectId('6430bcef80ea0c19db61646e'), ObjectId('6430bcef80ea0c57f112c3b6'), ObjectId('6430bcef80ea0c57f112c3b7'), ObjectId('6430bcef80ea0c57f112c3b8'), ObjectId('6430bcef80ea0c57f112c3b9')]... of length 292

we tried to process them and got 5 user inputs

2023-04-08 01:11:07,662:DEBUG:139892232456000:finished querying values for ['manual/mode_confirm', 'manual/purpose_confirm', 'manual/replaced_mode', 'manual/trip_user_input', 'manual/trip_addition_input', 'manual/place_addition_input'], count = 5

there was a match for at least the first user input

2023-04-08 01:11:07,981:DEBUG:139892232456000:Comparing user input 1 Voluntary Work, : 2023-04-06T17:46:33.983000-07:00 -> 2023-04-06T17:47:46.695139-07:00, trip 2023-04-06T17:46:33.983000-07:00 -> 2023-04-06T17:47:46.695139-07:00, start checks are (True && True) and end checks are (True || True)

2023-04-08 01:11:07,984:DEBUG:139892232456000:sorted candidates are [{'write_fmt_time': '2023-04-06T18:17:11.788968-07:00', 'detail': '2023-04-06T17:46:33.983000-07:00'}]

2023-04-08 01:11:07,984:DEBUG:139892232456000:most recent entry is 2023-04-06T18:17:11.788968-07:00, 2023-04-06T17:46:33.983000-07:00

2023-04-08 01:11:07,985:DEBUG:139892232456000:Saving entry Entry({'_id': ObjectId('642f6f230713300017460414'), 
'metadata': {'key': 'analysis/confirmed_place', 'write_fmt_time': '2023-04-06T18:17:11.788968-07:00'}, '
data': {'source': 'DwellSegmentationTimeFilter',
'enter_fmt_time': '2023-04-06T17:46:33.983000-07:00',
'exit_fmt_time': '2023-04-06T17:47:46.695139-07:00', 
'additions': [Entry({'_id': ObjectId('6430bcef80ea0c57f112c3c9'),
     'metadata': {'key': 'manual/place_addition_input', ...},
     ...})]}}) into timeseries

Similarly, we save

Saving entry Entry({'_id': ObjectId('642f6f230713300017460416'),
'metadata': {'key': 'analysis/confirmed_place',
'enter_fmt_time': '2023-04-06T17:54:03.967000-07:00'
'exit_fmt_time': '2023-04-06T17:56:44.991424-07:00',
'additions': [Entry({'_id': ObjectId('6430bcef80ea0c57f112c3ca'), 

and

Saving entry Entry({'_id': ObjectId('642f6f230713300017460418'), 
'metadata': {'key': 'analysis/confirmed_place', 
'write_fmt_time': '2023-04-06T18:17:11.792570-07:00'
'enter_fmt_time': '2023-04-06T18:07:08.985000-07:00',
'additions': [Entry({'_id': ObjectId('6430bcef80ea0c57f112c3cb'), 
shankari commented 1 year ago

Ah of course; we copy over the confirmed place into the composite trip, but we don't recreate it when the confirmed place changes with the addition of matches.

>>> edb.get_analysis_timeseries_db().find_one({"_id": boi.ObjectId("642f6f230713300017460416")})
{'_id': ObjectId('642f6f230713300017460416'),
'additions': [{'_id': ObjectId('6430bcef80ea0c57f112c3ca'),

But

>>> edb.get_analysis_timeseries_db().find_one({"data.end_confirmed_place._id": boi.ObjectId("642f6f230713300017460416")})

{'_id': ObjectId('642f6f25071330001746041c'),
'metadata': {'key': 'analysis/composite_trip',
'origin_key': 'analysis/confirmed_trip'},
'end_confirmed_place': {'_id': ObjectId('642f6f230713300017460416'),
    'metadata': {'key': 'analysis/confirmed_place', 
    'data': {'enter_fmt_time': '2023-04-06T17:54:03.967000-07:00',
                'exit_fmt_time': '2023-04-06T17:56:44.991424-07:00',
                 'duration': 161.02442359924316,
                'cleaned_place': ObjectId('642f6f170713300017460402'),
                'user_input': {}, 'additions': []}}}

So we will need to update the related composite trip as well when we update the confirmed place

shankari commented 1 year ago

wrt

will all the trip addition inputs match that last place on the server as well?

Certainly seems like it from the code.

    if start_checks and not end_checks:
        logging.debug("Handling corner case where start check matches, but end check does not")
        next_entry_obj = _get_next_cleaned_timeline_entry(ts, tl_entry)
        if next_entry_obj is not None:
            next_entry_end = end_of(next_entry_obj)
            if next_entry_end is None: # the last place will not have an exit_ts
                end_checks = True # so we will just skip the end check
            else:
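                # (remainder of the end check is elided in this excerpt)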

Let's test it, and then finish up that fix

shankari commented 1 year ago

While testing, found another issue. In https://github.com/shankari/e-mission-server/blob/add_trip_place_additions/emission/analysis/plotting/composite_trip_creation.py#L17, we added a hack to "fill in" the confirmed place for confirmed trips.

The hack was arguably incorrect to begin with and is now even more incorrect.

However, there was also a weirdness in confirmed_trip from the previous implementation: we created confirmed_trips by copying over from the corresponding cleaned_trip, so the start_place and the end_place are filled in, but point to the corresponding cleaned places instead.

>>> edb.get_analysis_timeseries_db().find_one({"metadata.key": "analysis/confirmed_trip"})
{'_id': ObjectId('64341826ebf368c478d5482e'),
'metadata': {'key': 'analysis/confirmed_trip',
'data': {'start_place': ObjectId('64337da27423ef6528092f28'), 'end_place': ObjectId('64341826ebf368c478d54825'),
'user_input': {}, 'trip_addition': []}}

>>> edb.get_analysis_timeseries_db().find_one({"_id": boi.ObjectId("64341826ebf368c478d54825")})
{'_id': ObjectId('64341826ebf368c478d54825'),
'metadata': {'key': 'analysis/cleaned_place'}}

So we can change the hack to check whether the start place/end place are cleaned places and replace them with the corresponding confirmed place instead.

That will require a DB call to determine whether the place is of the correct type, though, which I would like to avoid. Is there a simple check in the format of the confirmed_trip object that we can use instead?

Here's a potential hack:
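
One hedged possibility, reusing only calls seen elsewhere in this thread (note that it still incurs the DB call mentioned above):

import emission.core.get_database as edb

def swap_cleaned_for_confirmed_place(confirmed_trip_doc):
    # if end_place still points at a cleaned place, find the confirmed place
    # that wraps it (via data.cleaned_place) and point to that one instead
    wrapper = edb.get_analysis_timeseries_db().find_one(
        {"metadata.key": "analysis/confirmed_place",
         "data.cleaned_place": confirmed_trip_doc["data"]["end_place"]})
    if wrapper is not None:
        confirmed_trip_doc["data"]["end_place"] = wrapper["_id"]
    return confirmed_trip_doc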

shankari commented 1 year ago

Testing done:

[Screenshot]
END 2023-04-10 22:24:27.100300 POST /usercache/put 2f012dd4-7b47-43aa-b38f-3d0c6d6e8f3c 4.096529960632324
>>> last_confirmed_place = esdp.get_last_place_entry("analysis/confirmed_place", test_uuid)
>>> last_confirmed_place
Entry({'_id': ObjectId('643427314fcc9197e202a0ad'),
'metadata': {'key': 'analysis/confirmed_place',
'enter_fmt_time': '2016-08-04T16:38:38.348000-10:00',
'additions': [...]

>>> len(last_confirmed_place['data']['additions'])
5
[Screenshot]

The original last place now has an exit_ts

{'_id': ObjectId('643427314fcc9197e202a0ad'),
'metadata': {'key': 'analysis/confirmed_place',
'enter_fmt_time': '2016-08-04T16:38:38.348000-10:00',
'exit_fmt_time': '2016-08-04T16:38:38.348000-10:00',
starting_trip: ObjectId('6434f12ba6c4a675cc9eb77f')

But it still has 5 additions

>>> len(orig_last_place['data']['additions'])
5

New last confirmed place does not have exit_ts, does not have additions

>>> last_confirmed_place = esdp.get_last_place_entry("analysis/confirmed_place", test_uuid)
Entry({'_id': ObjectId('6434f12ba6c4a675cc9eb784'),
'metadata': {'key': 'analysis/confirmed_place',
'enter_fmt_time': '2016-08-05T17:25:52.895000-07:00',
'additions': []

The starting place for that original last trip was some untracked time, and it matches one of the inputs. Note that this input has not been removed from the previous last confirmed place.

{'_id': ObjectId('6434f12ba6c4a675cc9eb77f')
'metadata': {'key': 'analysis/confirmed_untracked',
'start_fmt_time': '2016-08-04T16:38:38.348000-10:00',
'end_fmt_time': '2016-08-05T04:53:24.886000-07:00',
'additions': [{'metadata': {'key': 'manual/place_addition_input'}, ...}]

The end place for the untracked time is

'_id': ObjectId('6434f12ba6c4a675cc9eb780'),
'metadata': {'key': 'analysis/confirmed_place',
'enter_fmt_time': '2016-08-05T04:53:24.886000-07:00'
'exit_fmt_time': '2016-08-05T05:11:41.493000-07:00'
'duration': 1096.6070001125336
'starting_trip': ObjectId('6434f12ba6c4a675cc9eb781'),
'additions': []

The next trip is below. Seems like it should have matched at least the breakfast "personal care"; probably didn't match because it was a place input.

{'_id': ObjectId('6434f12ba6c4a675cc9eb781'),
'end_fmt_time': '2016-08-05T08:48:09-07:00'
'start_fmt_time': '2016-08-05T05:11:41.493000-07:00'
'end_place': ObjectId('6434f12ba6c4a675cc9eb782'),
'additions': []

The next place also doesn't match any additions because the place addition starts an hour before the place starts.

{'_id': ObjectId('6434f12ba6c4a675cc9eb782'),
'metadata': {'key': 'analysis/confirmed_place'
'enter_fmt_time': '2016-08-05T08:48:09-07:00'
'exit_fmt_time': '2016-08-05T17:21:35.725313-07:00'
'additions': []

Final UI screenshots:

[Screenshots: All entries are still here, except that the display time on the last two entries has changed due to the timezone change | Duplicate entries in a later place | Untracked time overlaps with the next place]

Pending issues/bugs:

Matching-related questions:

shankari commented 1 year ago

Thanks to a suggestion from @JGreenlee, changed the expected key for untracked time to fix the overlap. Also added the enketo-notes-list to the untracked item directive, and it automagically worked after changing all the variables passed in:

            <enketo-notes-list ng-if="triplike.additionsList.length" timeline-entry="triplike" addition-entries="triplike.additionsList"></enketo-notes-list>
[Screenshot]

Gives me hope for the more modular future!

shankari commented 1 year ago

Next, we need to fix https://github.com/e-mission/e-mission-docs/issues/880#issuecomment-1500799602. Not sure why that didn't show up in the previous reproduction; when we reload from the UI, the composite trip should have lost the matching places.

Let's see why that didn't happen.

shankari commented 1 year ago

We have three inputs

>>> all_inputs = list(ts.find_entries(["manual/place_addition_input"]))
>>> len(all_inputs)
3

They are matched to the places

>>> all_places = list(ts.find_entries(["analysis/confirmed_place"]))
>>> pd.json_normalize(all_places)["data.additions"]
0                                                   []
1                                                   []
2                                                   []
3                                                   []
4                                                   []
5                                                   []
6    [{'_id': 64358bf0f5622167bcba53c8, 'user_id': ...
7    [{'_id': 64358bf0f5622167bcba533e, 'user_id': ...
8    [{'_id': 64358bf0f5622167bcba527c, 'user_id': ...
9                                                   []

But they don't show up in the composite trips

>>> all_ct = list(ts.find_entries(["analysis/composite_trip"]))
>>> pd.json_normalize(all_ct)["data.end_confirmed_place.data.additions"]
0    []
1    []
2    []
3    []
4    []
5    []
6    []
7    []
8    []

But they do still show up on reload in the UI. Let's see why.

It's because the manual result map has three entries. And that is because we are retrieving them remotely.

[Log] DEBUG:About to dedup localResult = 0remoteResult = 3 (cordova.js, line 1413)
[Log] DEBUG:Deduped list = 3 (cordova.js, line 1413)

Why are we still retrieving them remotely although they have been processed? Ah, I think it is because of the mismatch between the time the data was collected and the time that it was labeled. We collected the data in 2016, so the pipeline range is

{'user_id': UUID('3b5121f7-32b0-41c1-ae63-c7e2fd5e3e43'), '$or': [{'metadata.key': 'manual/place_addition_input'}], 'metadata.write_ts': {'$lte': 1681231955, '$gte': 1470364708.348}}

which is from 2016 to 2023

>>> arrow.get(1470364708.348).to("America/Los_Angeles")
<Arrow [2016-08-04T19:38:28.348000-07:00]>

>>> arrow.get(1681231955).to("America/Los_Angeles")
<Arrow [2023-04-11T09:52:35-07:00]>

In a normal pipeline, the pipeline would move ahead, so we would not keep getting the entries. That is when they will disappear.

shankari commented 1 year ago

To fix https://github.com/e-mission/e-mission-docs/issues/880#issuecomment-1500799602, we need to ensure that whenever we update the confirmed_place object, we also update the corresponding composite trip (if any). This should happen for any update, not just the user input or addition; when we update the timestamps of the last place, for example, that should also be reflected in the composite trip.
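
A hedged sketch of that propagation as a raw pymongo update, assuming the composite trip embeds the place under data.end_confirmed_place as in the dumps above:

import emission.core.get_database as edb

def propagate_place_update_to_composite(confirmed_place_doc):
    # push the fresh copy into any composite trip that embeds this place,
    # whether the change was a matched addition or an updated exit timestamp
    edb.get_analysis_timeseries_db().update_one(
        {"metadata.key": "analysis/composite_trip",
         "data.end_confirmed_place._id": confirmed_place_doc["_id"]},
        {"$set": {"data.end_confirmed_place": confirmed_place_doc}})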

Before running the pipeline a second time

>>> pd.json_normalize(all_ct)["data.end_confirmed_place.data.exit_fmt_time"]
0    2016-08-04T10:41:32.136385-10:00
1    2016-08-04T13:10:38.739684-10:00
2    2016-08-04T13:40:36.959000-10:00
3    2016-08-04T13:46:26.561801-10:00
4    2016-08-04T14:06:52.592000-10:00
5    2016-08-04T14:18:35.840464-10:00
6    2016-08-04T14:39:38.288795-10:00
7    2016-08-04T16:34:45.744782-10:00
8                                 NaN

Load data from the 5th, re-run pipeline

>>> all_places = list(ts.find_entries(["analysis/confirmed_place"]))
>>> pd.json_normalize(all_places)["data.exit_fmt_time"]
0     2016-08-04T10:03:51.235000-10:00
1     2016-08-04T10:41:32.136385-10:00
2     2016-08-04T13:10:38.739684-10:00
3     2016-08-04T13:40:36.959000-10:00
4     2016-08-04T13:46:26.561801-10:00
5     2016-08-04T14:06:52.592000-10:00
6     2016-08-04T14:18:35.840464-10:00
7     2016-08-04T14:39:38.288795-10:00
8     2016-08-04T16:34:45.744782-10:00
9     2016-08-04T16:38:38.348000-10:00
10    2016-08-05T05:11:41.493000-07:00
11    2016-08-05T17:21:35.725313-07:00
12                                 NaN

>>> pd.json_normalize(all_ct)["data.end_confirmed_place.data.exit_fmt_time"]
0     2016-08-04T10:41:32.136385-10:00
1     2016-08-04T13:10:38.739684-10:00
2     2016-08-04T13:40:36.959000-10:00
3     2016-08-04T13:46:26.561801-10:00
4     2016-08-04T14:06:52.592000-10:00
5     2016-08-04T14:18:35.840464-10:00
6     2016-08-04T14:39:38.288795-10:00
7     2016-08-04T16:34:45.744782-10:00
8                                  NaN
9                                  NaN
10    2016-08-05T17:21:35.725313-07:00
11                                 NaN
shankari commented 1 year ago

While writing the automated tests, I ran into the issue that the trips don't seem to show inputs (either user_input or additions).

I first thought that this was a server issue, but everything seemed to be working fine on the server - the confirmed trips were updated, and then the composite trips were updated. But then I noticed that the timestamps on the server were different from the timestamps on the phone, aka I couldn't match up the entries saved on the server with the values displayed on the phone.

So I retrieved the data and added a breakpoint and they are still not matching up!

Here is the set of trips that have matching inputs

[Screenshots]

Here's the visualization around that time frame on the phone. Note that none of the timestamps match up! There are no trips that start between 13:46 and 14:18, and in fact, there is a gap in the timeline right there.

shankari commented 1 year ago

Ok so I looked through the retrieved list Very Carefully, and it is clear that trips with inputs are not displayed.

ctList: Array (9)
0 {end_loc: {type: "Point", coordinates: [-155.0397394, 19.6218661]}, source: "DwellSegmentationTimeFilter", start_loc: {type: "Point", coordinates: [-154.9029399, 19.5461465]}, user_input: {}, duration: 1880.7650001049042, …}
1 {end_loc: {type: "Point", coordinates: [-155.9108361, 19.4228153]}, source: "DwellSegmentationTimeFilter", start_loc: {type: "Point", coordinates: [-155.0397394, 19.6218661]}, user_input: {}, duration: 8056.584614753723, …}
2 {end_loc: {type: "Point", coordinates: [-155.9109609, 19.4213896]}, source: "DwellSegmentationTimeFilter", start_loc: {type: "Point", coordinates: [-155.9108361, 19.4228153]}, user_input: {}, duration: 605.260315656662, …}
3 Object <----- displayed
_id: "64373344796eba348c2149dc"
additions: [] (0)
end_fmt_time: "2016-08-04T13:42:52.709000-10:00"
start_fmt_time: "2016-08-04T13:40:36.959000-10:00"
user_input: {}

4 Object <--------- not displayed
_id: "64373344796eba348c2149dd"
additions: [Object] (1)
end_fmt_time: "2016-08-04T13:58:52-10:00"
start_fmt_time: "2016-08-04T13:46:26.561801-10:00"
user_input: {trip_user_input: Object}

Trips before and after the set that I labeled are displayed. This is probably a UI fix...

shankari commented 1 year ago

Doh! This is the "To Label" screen, so trips with labels have moved to "All Trips". Maybe we should change this functionality for ENKETO instead of MULTILABEL.

[Screenshot]
shankari commented 1 year ago

Unit test results

[Screenshots: No inputs | All inputs matched to last place]
[Screenshots: Spread out inputs 1 | Spread out inputs 2]
[Screenshots: Trip inputs matching 1 | Trip inputs matching 2]
shankari commented 1 year ago

Final test is for the hack to fill in the confirmed places.

So the checks are:

shankari commented 1 year ago

Checked out the first commit and ran the pipeline

>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_trip", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_place", "user_id": self.testUUID})
0
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/composite_trip", "user_id": self.testUUID})
0

Checked out the most recent commit and re-ran the pipeline

>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_trip", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_place", "user_id": self.testUUID})
18
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/composite_trip", "user_id": self.testUUID})
9

Why do we have 18 places? We should have 10.

Ah, it's because we create start and end places for each trip, but the end place of one trip is the start place of the next. We should handle that properly in the hack.

shankari commented 1 year ago

Fixed by checking to see if there was a related confirmed trip before creating one
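
The actual fix checks for the related confirmed trip; an equivalent hedged sketch, reusing an existing confirmed place instead of creating a duplicate, might look like:

import emission.core.get_database as edb

def get_or_create_confirmed_place(user_id, cleaned_place_id):
    # the end place of one trip is the start place of the next, so reuse a
    # confirmed place that already wraps this cleaned place if one exists
    existing = edb.get_analysis_timeseries_db().find_one(
        {"user_id": user_id,
         "metadata.key": "analysis/confirmed_place",
         "data.cleaned_place": cleaned_place_id})
    if existing is not None:
        return existing
    return create_confirmed_place(cleaned_place_id)  # hypothetical helper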

Now we have

>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_trip", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_place", "user_id": self.testUUID})
10
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/composite_trip", "user_id": self.testUUID})
9

And the output matches our test case (which runs only the new code)

>>> with open(dataFile_1+".before-user-inputs.expected_composite_trips") as expectation:
...     expected_trips = json.load(expectation, object_hook = bju.object_hook)
...     print(len(composite_trips), len(expected_trips))
...     for i in range(len(composite_trips)):
...             print(composite_trips[i]["data"]["end_confirmed_place"]["data"]["enter_fmt_time"], expected_trips[i]["data"]["end_confirmed_place"]["data"]["enter_fmt_time"])
...
9 9
2016-08-04T10:35:12-10:00 2016-08-04T10:35:12-10:00
2016-08-04T12:55:48.721000-10:00 2016-08-04T12:55:48.721000-10:00
2016-08-04T13:20:44-10:00 2016-08-04T13:20:44-10:00
2016-08-04T13:42:52.709000-10:00 2016-08-04T13:42:52.709000-10:00
2016-08-04T13:58:52-10:00 2016-08-04T13:58:52-10:00
2016-08-04T14:12:04.251000-10:00 2016-08-04T14:12:04.251000-10:00
2016-08-04T14:34:12.571000-10:00 2016-08-04T14:34:12.571000-10:00
2016-08-04T16:18:54.709000-10:00 2016-08-04T16:18:54.709000-10:00
2016-08-04T16:38:38.348000-10:00 2016-08-04T16:38:38.348000-10:00
shankari commented 1 year ago

Checked out the second commit and ran the pipeline

>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_trip", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_place", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/composite_trip", "user_id": self.testUUID})
9

re-running ends up with the same values

>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_trip", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_place", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/composite_trip", "user_id": self.testUUID})
9
shankari commented 1 year ago

That's because the entries have addition instead of trip_addition. I checked, though, and the production servers do have trip_addition. Let's double-check the commit by launching a container with the image.

It is:

$ docker run -it shankari/e-mission-server:trip_place_additions_master_2023_03_10_1 /bin/bash
# git log
commit f64f5be9d31acdd0cb5454eb08057b6b8b7c6a6b (HEAD -> add_trip_place_additions_new_master)
Merge: c806bcb4 1a374276
Author: Shankari <shankari@eecs.berkeley.edu>
Date:   Fri Mar 10 20:37:00 2023 -0800

    Merge branch 'master' of https://github.com/e-mission/e-mission-server into add_trip_place_additions_new_master

commit 1a374276d35acc1e07ad50915cc7cb670afb65a3 (upstream/master)
Merge: 5b839e11 4492edff
Author: shankari <shankari@eecs.berkeley.edu>
Date:   Fri Mar 10 20:35:26 2023 -0800

    Merge pull request #902 from swastis10/server_upgrade

    Configuring PYTHON_LEGACY UUID representation in cfc_webapp.py

...

commit 9aed65d5948e61ba284e5a3ee29b4f9bc2b81290 (origin/add_trip_place_additions, add_trip_place_additions)
Author: Shankari <shankari@eecs.berkeley.edu>
Date:   Thu Mar 9 18:08:29 2023 -0800

    :bug: Read the match ID from 'data' instead of directly from the entry

    To be consistent with what the phone is actually sending
    + also fix all the test cases

    Testing done:
$ ./e-mission-py.bash emission//tests/analysisTests/userInputTests/TestUserInputFakeData.py
----------------------------------------------------------------------
Ran 6 tests in 0.196s

OK

This fixes
https://github.com/e-mission/e-mission-docs/issues/861
shankari commented 1 year ago

Ok so using commit 9aed65d5948e61ba284e5a3ee29b4f9bc2b81290 instead, we get

>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_trip", "user_id": self.testUUID})
9
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_place", "user_id": self.testUUID})
0
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/composite_trip", "user_id": self.testUUID})
0
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/confirmed_place", "user_id": self.testUUID})
10
>>> edb.get_analysis_timeseries_db().count_documents({"metadata.key": "analysis/composite_trip", "user_id": self.testUUID})
9

And the resulting values are consistent

>>> with open(dataFile_1+".before-user-inputs.expected_composite_trips") as expectation:
...     expected_trips = json.load(expectation, object_hook = bju.object_hook)
...     print(len(composite_trips), len(expected_trips))
...     for i in range(len(composite_trips)):
...             print(composite_trips[i]["data"]["end_confirmed_place"]["data"]["enter_fmt_time"], expected_trips[i]["data"]["end_confirmed_place"]["data"]["enter_fmt_time"])
...
9 9
2016-08-04T10:35:12-10:00 2016-08-04T10:35:12-10:00
2016-08-04T12:55:48.721000-10:00 2016-08-04T12:55:48.721000-10:00
2016-08-04T13:20:44-10:00 2016-08-04T13:20:44-10:00
2016-08-04T13:42:52.709000-10:00 2016-08-04T13:42:52.709000-10:00
2016-08-04T13:58:52-10:00 2016-08-04T13:58:52-10:00
2016-08-04T14:12:04.251000-10:00 2016-08-04T14:12:04.251000-10:00
2016-08-04T14:34:12.571000-10:00 2016-08-04T14:34:12.571000-10:00
2016-08-04T16:18:54.709000-10:00 2016-08-04T16:18:54.709000-10:00
2016-08-04T16:38:38.348000-10:00 2016-08-04T16:38:38.348000-10:00

It's a wrap!

shankari commented 1 year ago

Checked display on both android and iOS

[Screenshot]
shankari commented 1 year ago

Created child issues:

shankari commented 1 year ago

Closed by: https://github.com/shankari/e-mission-server/pull/13 and https://github.com/e-mission/e-mission-phone/pull/950

shankari commented 1 year ago

I spent all day filling in time entries and they all disappeared when I took a trip in the evening.

[Screenshots: Missing additions | Place additions work | Trip additions work]
shankari commented 1 year ago

This is in the last 5 trips.

>>> last_5_list = list(edb.get_analysis_timeseries_db().find({"metadata.key": "analysis/composite_trip"}).sort("data.start_ts", -1).limit(5))
>>> import pandas as pd
>>> last_5_df = pd.json_normalize(last_5_list)
>>> last_5_df["data.start_fmt_time"]
0    2023-04-14T20:16:35.992728-07:00
1    2023-04-14T19:47:31.480693-07:00
2    2023-04-14T18:48:22.422325-07:00
3    2023-04-13T20:10:58.404540-07:00
4    2023-04-13T19:31:59.983000-07:00

And it does not have any additions

>>> last_5_df.iloc[3]["_id"]
ObjectId('6438d316e2fd7ac823955632')
>>> last_5_df.iloc[3]["data.start_fmt_time"]
'2023-04-13T20:10:58.404540-07:00'
>>> last_5_df.iloc[3]["data.end_fmt_time"]
'2023-04-13T20:22:52.978000-07:00'
>>> last_5_df.iloc[3]["data.end_confirmed_place.data.additions"]
[]

The related confirmed place object is as below, and it doesn't have additions either. Why didn't the additions match?

>>> edb.get_analysis_timeseries_db().find_one({"_id": cpeid})
{'_id': ObjectId('6438d313e2fd7ac82395562b'),
 'metadata': {'key': 'analysis/confirmed_place',
'write_fmt_time': '2023-04-13T21:14:11.429961-07:00'},
'enter_fmt_time': '2023-04-13T20:22:52.978000-07:00',
'enter_ts': 1681442572.978
'user_input': {}, 
'additions': [],
'exit_fmt_time': '2023-04-14T18:48:22.422325-07:00',
'exit_ts': 1681523302.4223254,
}}

Hm. There are three entries with the same enter_ts

>>> edb.get_analysis_timeseries_db().count_documents({"data.enter_ts": 1681442572.978})
3

And they are the places in the three timelines

>>> edb.get_analysis_timeseries_db().find({"data.enter_ts": 1681442572.978}).distinct("metadata.key")
['analysis/cleaned_place', 'analysis/confirmed_place', 'segmentation/raw_place']
shankari commented 1 year ago

We do find the last confirmed place and set it into the composite trip, but it has zero additions

2023-04-14T21:14:24.689-07:00 | 2023-04-15  04:14:24,689:DEBUG:140635200833344:last confirmed_place  6438d313e2fd7ac82395562b was already in database, updating with linked  trip info... and 0 additions

looking further upstream...

2023-04-15 04:14:24,338:DEBUG:140635200833344:Found existing last confirmed place, setting exit information to 2023-04-14T18:48:22.422325-07:00, and trimming additions to 0

We need to find the "match incoming" step for it. It looks like we do save an entry with additions earlier:

2023-04-15 04:12:28,649:DEBUG:140635200833344:Saving entry Entry({'_id': ObjectId('6438d313e2fd7ac82395562b'), 'user_id': UUID('9c084ef4-2f97-4196-bd37-950c17938ec6'), 'metadata': {'key': 'analysis/confirmed_place', 'platform': 'server', 'write_ts': 1681445651.4299612, 'time_zone': 'America/Los_Angeles', 'write_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 21, 'minute': 14, 'second': 11, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'write_fmt_time': '2023-04-13T21:14:11.429961-07:00'}, 'data': {'source': 'DwellSegmentationTimeFilter', 'enter_ts': 1681442572.978, 'enter_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 20, 'minute': 22, 'second': 52, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'enter_fmt_time': '2023-04-13T20:22:52.978000-07:00', 'location': {'type': 'Point', 'coordinates': [-122.0863953, 37.391031]}, 'raw_places': [ObjectId('6438d2efe2fd7ac823955574')], 'ending_trip': ObjectId('6438d313e2fd7ac82395562a'), 'cleaned_place': ObjectId('6438d30be2fd7ac823955607'), 'user_input': {}, 'additions': [Entry({'_id': ObjectId('643a1a4080ea0c46334769f1'), 'user_id': UUID('9c084ef4-2f97-4196-bd37-950c17938ec6'), 'metadata': {'key': 'manual/place_addition_input', 'platform': 'android', 'read_ts': 0, 'time_zone': 'America/Los_Angeles', 'type': 'message', 'write_ts': 1681446697.777, 'write_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 21, 'minute': 31, 'second': 37, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'write_fmt_time': '2023-04-13T21:31:37.777000-07:00'}, 'data': {'label': '1 Domestic, ', 'name': 'TimeUseSurvey', 'version': 9, 'xmlResponse': '<a88RxBtE3jwSar3cwiZTdn xmlns:jr="http://openrosa.org/javarosa" xmlns:orx="http://openrosa.org/xforms" id="a88RxBtE3jwSar3cwiZTdn">\n          <start>2023-04-13T21:31:04.570-07:00</start>\n          <end>2023-04-13T21:31:04.570-07:00</end>\n          <group_hg4zz25>\n            <Date>2023-04-13</Date>\n            <Start_time>20:22:52.978-07:00</Start_time>\n            <End_time>21:00:00.000-07:00</End_time>\n            <Activity_Type>domestic_activities</Activity_Type>\n            <Personal_Care_activities/>\n            <Employment_related_a_Education_activities/>\n            <Domestic_activities>preparing_meals_or_snacks</Domestic_activities>\n            <Recreation_and_leisure/>\n            <Voluntary_work_and_care_activities/>\n            <Other/>\n          </group_hg4zz25>\n          <meta>\n            <instanceID>uuid:2c4df962-617d-424a-839f-75f4f3147226</instanceID>\n          </meta>\n        </a88RxBtE3jwSar3cwiZTdn>', 'jsonDocResponse': {'a88RxBtE3jwSar3cwiZTdn': {'attr': {'xmlns:jr': 'http://openrosa.org/javarosa', 'xmlns:orx': 'http://openrosa.org/xforms', 'id': 'a88RxBtE3jwSar3cwiZTdn'}, 'start': '2023-04-13T21:31:04.570-07:00', 'end': '2023-04-13T21:31:04.570-07:00', 'group_hg4zz25': {'attr': {}, 'Date': '2023-04-13', 'Start_time': '20:22:52.978-07:00', 'End_time': '21:00:00.000-07:00', 'Activity_Type': 'domestic_activities', 'Personal_Care_activities': '', 'Employment_related_a_Education_activities': '', 'Domestic_activities': 'preparing_meals_or_snacks', 'Recreation_and_leisure': '', 'Voluntary_work_and_care_activities': '', 'Other': ''}, 'meta': {'attr': {}, 'instanceID': 'uuid:2c4df962-617d-424a-839f-75f4f3147226'}}}, 'start_ts': 1681442572.978, 'end_ts': 1681444800, 'match_id': '30427f9d-8faf-4448-bdbb-50378daaf644', 'start_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 20, 'minute': 22, 'second': 52, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 
'start_fmt_time': '2023-04-13T20:22:52.978000-07:00', 'end_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 21, 'minute': 0, 'second': 0, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'end_fmt_time': '2023-04-13T21:00:00-07:00'}})]}}) into timeseries

After the user input matching, we end up with

2023-04-15 04:12:34,053:DEBUG:140635200833344:Saving entry Entry({'_id': ObjectId('6438d313e2fd7ac82395562b'), 'user_id': UUID('9c084ef4-2f97-4196-bd37-950c17938ec6'), 'metadata': {'key': 'analysis/confirmed_place', 'platform': 'server', 'write_ts': 1681445651.4299612, 'time_zone': 'America/Los_Angeles', 'write_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 21, 'minute': 14, 'second': 11, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'write_fmt_time': '2023-04-13T21:14:11.429961-07:00'}, 'data': {'source': 'DwellSegmentationTimeFilter', 'enter_ts': 1681442572.978, 'enter_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 20, 'minute': 22, 'second': 52, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'enter_fmt_time': '2023-04-13T20:22:52.978000-07:00', 'location': {'type': 'Point', 'coordinates': [-122.0863953, 37.391031]}, 'raw_places': [ObjectId('6438d2efe2fd7ac823955574')], 'ending_trip': ObjectId('6438d313e2fd7ac82395562a'), 'cleaned_place': ObjectId('6438d30be2fd7ac823955607'), 'user_input': {}, 'additions': [{}, {}, {}, {}, {}, {}, {}, {}, {}, Entry()]}}) into timeseries

So why do all those additions get deleted later?

shankari commented 1 year ago

In CREATE_CONFIRMED_OBJECTS, when we read the last place doc, it has all the additions, so it is not our read-after-write inconsistency

2023-04-15 04:14:24,088:DEBUG:140635200833344:last place doc = {'_id': ObjectId('6438d313e2fd7ac82395562b'), 'user_id': UUID('9c084ef4-2f97-4196-bd37-950c17938ec6'), 'metadata': {'key': 'analysis/confirmed_place', 'platform': 'server', 'write_ts': 1681445651.4299612, 'time_zone': 'America/Los_Angeles', 'write_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 21, 'minute': 14, 'second': 11, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'write_fmt_time': '2023-04-13T21:14:11.429961-07:00'}, 'data': {'source': 'DwellSegmentationTimeFilter', 'enter_ts': 1681442572.978, 'enter_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 20, 'minute': 22, 'second': 52, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'enter_fmt_time': '2023-04-13T20:22:52.978000-07:00', 'location': {'type': 'Point', 'coordinates': [-122.0863953, 37.391031]}, 'raw_places': [ObjectId('6438d2efe2fd7ac823955574')], 'ending_trip': ObjectId('6438d313e2fd7ac82395562a'), 'cleaned_place': ObjectId('6438d30be2fd7ac823955607'), 'user_input': {}, 'additions': [, , , , , , , , , , , , , , ]}}

We then try to find matches but there are none, so we trim the additions to zero.

2023-04-15 04:14:24,310:DEBUG:140635200833344:curr_query = {'user_id': UUID(...'), '$or': [{'metadata.key': 'manual/trip_addition_input'}, {'metadata.key': 'manual/place_addition_input'}], 'data.enter_ts': {'$lte': 1681523302.4223254, '$gte': 1681442572.978}}, sort_key = data.enter_ts
2023-04-15 04:14:24,326:DEBUG:140635200833344:finished querying values for ['manual/trip_addition_input', 'manual/place_addition_input'], count = 0
2023-04-15 04:14:24,327:DEBUG:140635200833344:orig_ts_db_matches = 0, analysis_ts_db_matches = 0
2023-04-15 04:14:24,337:DEBUG:140635200833344:in get_not_deleted_candidates, no candidates, returning []
2023-04-15 04:14:24,338:DEBUG:140635200833344:Found existing last confirmed place, setting exit information to 2023-04-14T18:48:22.422325-07:00, and trimming additions to 0

Why are there no matches? What do the place additions look like?

{'_id': ObjectId('643a1a4080ea0c46334769f1'), 'user_id': UUID('...'),
'metadata': {'key': 'manual/place_addition_input',
'data': 'meta': {'attr': {}, 'instanceID': 'uuid:2c4df962-617d-424a-839f-75f4f3147226'}}},
'start_ts': 1681442572.978, 'end_ts': 1681444800,
'match_id': '30427f9d-8faf-4448-bdbb-50378daaf644',
'start_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 20, 'minute': 22, 'second': 52, 'weekday': 3, 'timezone': 'America/Los_Angeles'},
'start_fmt_time': '2023-04-13T20:22:52.978000-07:00',
'end_local_dt': {'year': 2023, 'month': 4, 'day': 13, 'hour': 21, 'minute': 0, 'second': 0, 'weekday': 3, 'timezone': 'America/Los_Angeles'},
'end_fmt_time': '2023-04-13T21:00:00-07:00'}}

It has start_ts and end_ts, not enter_ts and exit_ts.

The composite place creation code calls get_additions_for_timeline_entry_object which in turn calls

def get_time_query_for_timeline_entry(timeline_entry):
    begin_of_entry = begin_of(timeline_entry)
    end_of_entry = end_of(timeline_entry)
    timeType = "data.start_ts" if "start_ts" in timeline_entry.data else "data.enter_ts"
    if end_of_entry is None:
        # the last place (user's current place) will not have an exit_ts, so
        # every input from its enter_ts onward is fair game
        end_of_entry = EPOCH_MAXIMUM
    return estt.TimeQuery(timeType, begin_of_entry, end_of_entry)

So since this is matching to a place, we search for enter/exit matches, while the phone uses start/end for all additions.

So how did this ever work (e.g. in https://github.com/e-mission/e-mission-docs/issues/880#issuecomment-1507272460)? Should re-run the test case and see if we can figure it out. It was a zero duration place anyway, so maybe we weren't expecting it to match a lot

Tomorrow:

shankari commented 1 year ago

Ok I think I understood what happened.

get_additions_for_timeline_entry_object is only called when we have a confirmed object and need to match additions to it. So it is typically not called when we enter values after the confirmed trip/place has already been created. The only times it is called are:

We didn't find this before since:

Basically, the inputs always have start/end ts and don't match the timeline object, so we should remove the timeline-object-specific functionality and convert everything to just search for start_ts and end_ts, as sketched below.

@JGreenlee @MaliheTabasi for visibility
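
A hedged sketch of that simplification, reusing the names from the function quoted above:

def get_time_query_for_timeline_entry(timeline_entry):
    begin_of_entry = begin_of(timeline_entry)
    end_of_entry = end_of(timeline_entry)
    if end_of_entry is None:
        # the last place (user's current place) will not have an exit_ts, so
        # every input from its enter_ts onward is fair game
        end_of_entry = EPOCH_MAXIMUM
    # inputs from the phone always carry start_ts/end_ts (even place additions),
    # so always query on those instead of switching to enter_ts for places
    return estt.TimeQuery("data.start_ts", begin_of_entry, end_of_entry)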

shankari commented 1 year ago

Confirmed that the addition inputs always have only the start_ts filled in and never the enter_ts

>>> edb.get_timeseries_db().count_documents({"metadata.key": "manual/trip_addition_input", "data.start_ts": {"$exists": True}})
111
>>> edb.get_timeseries_db().count_documents({"metadata.key": "manual/trip_addition_input", "data.enter_ts": {"$exists": True}})
0
>>> edb.get_timeseries_db().count_documents({"metadata.key": "manual/place_addition_input", "data.start_ts": {"$exists": True}})
90
>>> edb.get_timeseries_db().count_documents({"metadata.key": "manual/place_addition_input", "data.enter_ts": {"$exists": True}})
0
shankari commented 1 year ago

Reproducing the issue, we have all the inputs match the final place, but they are all deleted after we get more inputs and re-run the pipeline

[Screenshots: All the matches with the last place | All entries gone]
shankari commented 1 year ago

Couple of quick things to investigate before going on:

There should be matches for the trip (18:45 to 20:22)

             metadata.write_fmt_time  ... data.jsonDocResponse.a88RxBtE3jwSar3cwiZTdn.group_hg4zz25.Start_time
25  2023-04-13T15:18:44.595000-07:00  ...                                 15:29:17.206-07:00
24  2023-04-13T15:19:11.299000-07:00  ...                                 16:30:00.000-07:00
23  2023-04-13T15:21:20.298000-07:00  ...                                 17:30:00.000-07:00
22  2023-04-13T15:21:41.249000-07:00  ...                                 18:29:00.000-07:00
21  2023-04-13T15:22:28.623000-07:00  ...                                 19:29:00.000-07:00
20  2023-04-13T18:11:42.594000-07:00  ...                                                NaN
19  2023-04-13T18:12:24.638000-07:00  ...                                 18:03:29.356-07:00
18  2023-04-13T18:13:10.923000-07:00  ...                                 18:36:59.350-07:00
17  2023-04-13T18:13:49.804000-07:00  ...                                 17:14:04.360-07:00
16  2023-04-13T18:14:06.490000-07:00  ...                                 18:00:29.356-07:00
15  2023-04-13T18:14:32.465000-07:00  ...                                                NaN
14  2023-04-13T18:15:21.962000-07:00  ...                                                NaN
13  2023-04-13T18:15:39.699000-07:00  ...                                                NaN
12  2023-04-13T21:25:48.619000-07:00  ...                                                NaN
11  2023-04-13T21:26:01.901000-07:00  ...                                 18:45:43.402-07:00
10  2023-04-13T21:26:24.069000-07:00  ...                                                NaN
9   2023-04-13T21:26:42.917000-07:00  ...                                 18:58:47.497-07:00
8   2023-04-13T21:27:53.347000-07:00  ...                                 19:37:31.972-07:00
7   2023-04-13T21:28:22.735000-07:00  ...                                                NaN
6   2023-04-13T21:29:39.628000-07:00  ...                                                NaN
5   2023-04-13T21:30:03.779000-07:00  ...                                 19:21:58.659-07:00
4   2023-04-13T21:30:30.951000-07:00  ...                                 19:28:59.983-07:00
3   2023-04-13T21:30:50.064000-07:00  ...                                                NaN
2   2023-04-13T21:31:02.171000-07:00  ...                                 20:10:58.404-07:00
1   2023-04-13T21:31:37.777000-07:00  ...                                 20:22:52.978-07:00
0   2023-04-13T21:32:00.367000-07:00  ...                                 21:00:00.000-07:00
shankari commented 1 year ago

are the earlier entries getting matched properly (e.g. trip details and the non-last-place time use)?

Yes, they are. I just needed to load the entries properly in the test case: load entries, setup; load entries, setup. Not load, load, setup.
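
A sketch of the corrected ordering (method names are illustrative):

# interleave loading and pipeline runs, instead of loading everything up front
self.load_entries(first_batch)   # hypothetical test helper
self.run_pipeline()              # "setup" processes what was loaded so far
self.load_entries(second_batch)
self.run_pipeline()
# not: load_entries(first_batch); load_entries(second_batch); run_pipeline()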