asiripanich / emdash

An e-mission deployer's dashboard. See https://github.com/e-mission/e-mission-docs.
https://emdash.amarin.dev

Determine which trips have associated user input #37

Open shankari opened 3 years ago

shankari commented 3 years ago

Right now, there are pipeline stages that combine the user input with the cleaned trip objects to create confirmed trips. A sample confirmed trip object from the master/ceo_ebike_program branch looks like this:

{'_id': ObjectId('606b8bf0c77a1ff9e630f422'),
 'user_id': UUID('d4376620-fbcd-4aab-95bf-8c2e0ecf9adf'),
 'metadata': {'key': 'analysis/confirmed_trip',
              'platform': 'server',
              'write_ts': 1617660912.6729634,
              'time_zone': 'America/Los_Angeles',
              'write_local_dt': {'year': 2021, 'month': 4, 'day': 5, 'hour': 15, 'minute': 15, 'second': 12, 'weekday': 0, 'timezone': 'America/Los_Angeles'},
              'write_fmt_time': '2021-04-05T15:15:12.672963-07:00'},
 'data': {'source': 'DwellSegmentationTimeFilter',
          'end_ts': 1617659216.0,
          'end_local_dt': {'year': 2021, 'month': 4, 'day': 5, 'hour': 14, 'minute': 46, 'second': 56, 'weekday': 0, 'timezone': 'America/Los_Angeles'},
          'end_fmt_time': '2021-04-05T14:46:56-07:00',
          'end_loc': {'type': 'Point', 'coordinates': [-122.0867274, 37.3911479]},
          'raw_trip': ObjectId('606b8bf0c77a1ff9e630f3e7'),
          'start_ts': 1617658219.0,
          'start_local_dt': {'year': 2021, 'month': 4, 'day': 5, 'hour': 14, 'minute': 30, 'second': 19, 'weekday': 0, 'timezone': 'America/Los_Angeles'},
          'start_fmt_time': '2021-04-05T14:30:19-07:00',
          'start_loc': {'type': 'Point', 'coordinates': [-122.0870928, 37.390054]},
          'duration': 997.0,
          'distance': 2458.7832149780197,
          'start_place': ObjectId('606b8bf0c77a1ff9e630f41a'),
          'end_place': ObjectId('606b8bf0c77a1ff9e630f41b'),
          'cleaned_trip': ObjectId('606b8bf0c77a1ff9e630f3f1'),
          'user_input': {'mode_confirm': 'bike', 'purpose_confirm': 'pick_drop', 'replaced_mode': 'drove_alone'}}}

After https://github.com/asiripanich/emdash/pull/23, which includes setnames(gsub("user_input", "", names(.))) %>%, the mode_confirm, purpose_confirm and replaced_mode entries automagically show up in the trip table.

But now I want to have a column in the participant table with the number of unlabeled trips. For now, I have implemented this (in my fork alone) as:

.[is.na(mode_confirm), .(unconfirmed = .N), by = user_id]

https://github.com/shankari/emdash/commit/735a93b12a325e491ede3eaba63adaecd59031d7#diff-4e87ef70cdab2756b9d4aa419fd97ffea60f2ec8d9592573ffee2e4ab33dcf53R162-R165

But not everybody is going to use the mode_confirm object; in particular, the travel survey folks will potentially want a more complex object.

In Python, I do ct_df_confirmed = ct_df[ct_df.user_input != {}], which does not use any field names, but I don't know how to implement a similar check in R.
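To make the goal concrete, here is a minimal sketch of the field-name-free check in plain Python, without pandas. It treats a trip as unlabeled when its user_input is missing or empty, whatever keys a deployment puts inside it, and counts such trips per user. The sample records, the helper name count_unlabeled_trips, and the trip_survey key are hypothetical and only illustrate the idea; this is not the e-mission server code.

```python
from collections import Counter

# Hypothetical confirmed trips, trimmed to the fields relevant to this check.
# The 'trip_survey' entry stands in for a more complex, survey-style user input.
confirmed_trips = [
    {"user_id": "user-a", "data": {"user_input": {"mode_confirm": "bike", "purpose_confirm": "pick_drop"}}},
    {"user_id": "user-a", "data": {"user_input": {}}},
    {"user_id": "user-b", "data": {"user_input": {}}},
    {"user_id": "user-b", "data": {"user_input": {"trip_survey": {"answers": [1, 2]}}}},
    {"user_id": "user-b", "data": {}},
]


def count_unlabeled_trips(trips):
    """Count, per user, the trips whose user_input is missing or empty."""
    counts = Counter()
    for trip in trips:
        user_input = trip.get("data", {}).get("user_input")
        # No field names are inspected: None and {} both count as unlabeled.
        if not user_input:
            counts[trip["user_id"]] += 1
    return dict(counts)


unlabeled = count_unlabeled_trips(confirmed_trips)
print(unlabeled)
```

Users with no unlabeled trips do not appear in the result at all, so a participant table built from it would need to fill those rows with 0.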

@asiripanich you asked me to file an issue with the details, and I did 😄