Closed deepalics0044 closed 5 years ago
@deepalics0044 a couple of notes:
The aggregate timeseries still returns the uuid along with each entry. It's just that get_data_df doesn't map the uuid to a column. So if you use the entries instead, you should be able to see the uuid as well (e.g. something like)
mc_all = esta.TimeSeries.get_aggregate_time_series().find_entries("manual/mode_confirm")
[(e["user_id"], e["data"]["start_fmt_time"], e["data"]["end_fmt_time"], e["data"]["label"]) for e in mc_all]
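A minimal sketch of the same flattening step, runnable without a database. It assumes each entry is a dict with a top-level user_id and a nested data dict, which is the shape of the entry printed later in this thread. The sample_entries values and the entry_to_row helper are made up for illustration; with a real database you would pass the result of find_entries instead.

```python
import uuid


def entry_to_row(entry):
    """Flatten one timeseries entry into a (user_id, start, end, label) tuple."""
    data = entry["data"]
    return (entry["user_id"], data["start_fmt_time"], data["end_fmt_time"], data["label"])


# Made-up entries that mimic the structure of manual/mode_confirm objects.
sample_entries = [
    {
        "user_id": uuid.UUID(int=1),
        "data": {
            "start_fmt_time": "2019-06-19T11:21:39+05:30",
            "end_fmt_time": "2019-06-19T11:33:33+05:30",
            "label": "walk",
        },
    },
    {
        "user_id": uuid.UUID(int=2),
        "data": {
            "start_fmt_time": "2019-06-19T12:32:53+05:30",
            "end_fmt_time": "2019-06-19T12:45:46+05:30",
            "label": "taxi",
        },
    },
]

rows = [entry_to_row(e) for e in sample_entries]
for row in rows:
    print(row)
```

Each tuple keeps the user_id next to the label, so confirmations from different users stay distinguishable.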
But presumably what you really want to do is to combine a particular user's location, analysed trips, and mode confirmation. As you can see from https://github.com/e-mission/e-mission-server/blob/master/Timeseries_Sample.ipynb you can get the timeseries for a particular user and then retrieve dataframes for each of the keys that you want.
I am able to see uuid simply by adding
esta.TimeSeries.get_aggregate_time_series().get_data_df("manual/purpose_confirm")[["_id",
"start_fmt_time", "end_fmt_time", "label"]]
But presumably what you really want to do is to combine a particular user's location, analysed trips, and mode confirmation.
But yes you're right I want to combine mode and purpose with location and time.
For some reasons I am not able to access the link. Is it the same Timeseries_Sample.ipynb we have in e-mission-server root directory?
I am able to see uuid simply by adding
That is _id aka objectid, not user_id. You will see that the values are different even if you just look at the confirmations from @ipsita0012
For some reasons I am not able to access the link. Is it the same Timeseries_Sample.ipynb we have in e-mission-server root directory?
Yes. As you can see from the path, it is a link to the file in the root of the e-mission-server repository on github.
That is _id aka objectid, not user_id. You will see that the values are different even if you just look at the confirmations from @ipsita0012
I see, the values were different. I changed the column from _id to user_id.
@deepalics0044 I thought we didn't put the user_id into the dataframe, but you're right that we do!
Please close this issue if there is nothing left to do.
Actually I am proactively closing the issue because otherwise issues just linger on forever.
But presumably what you really want to do is to combine a particular user's location, analysed trips, and mode confirmation. As you can see from https://github.com/e-mission/e-mission-server/blob/master/Timeseries_Sample.ipynb you can get the timeseries for a particular user and then retrieve dataframes for each of the keys that you want.
I can't really see the mode as a column in the dataframe:
ct_df.columns
Index(['_id', 'distance', 'duration', 'end_fmt_time', 'end_loc',
'end_local_dt_day', 'end_local_dt_hour', 'end_local_dt_minute',
'end_local_dt_month', 'end_local_dt_second', 'end_local_dt_timezone',
'end_local_dt_weekday', 'end_local_dt_year', 'end_place', 'end_ts',
'metadata_write_ts', 'raw_trip', 'source', 'start_fmt_time',
'start_loc', 'start_local_dt_day', 'start_local_dt_hour',
'start_local_dt_minute', 'start_local_dt_month',
'start_local_dt_second', 'start_local_dt_timezone',
'start_local_dt_weekday', 'start_local_dt_year', 'start_place',
'start_ts', 'user_id'],
dtype='object')
mode_confirm is not stored in a trip (analysis/cleaned_trip), it is a separate object (manual/mode_confirm). You would get a separate dataframe for it by retrieving objects with that key.
The documentation on manual objects (which can be found by searching for mode_confirm in the docs) has additional details on how to match the confirmation to cleaned trip-like objects.
https://github.com/e-mission/e-mission-docs/blob/master/docs/e-mission-both/supporting_user_inputs.md
Isn't it possible to put the 'label' column from the mode and purpose objects into the cleaned_trip data frame?
The cleaned_trip data frame columns:-
Index(['_id', 'distance', 'duration', 'end_fmt_time', 'end_loc',
'end_local_dt_day', 'end_local_dt_hour', 'end_local_dt_minute',
'end_local_dt_month', 'end_local_dt_second', 'end_local_dt_timezone',
'end_local_dt_weekday', 'end_local_dt_year', 'end_place', 'end_ts',
'metadata_write_ts', 'raw_trip', 'source', 'start_fmt_time',
'start_loc', 'start_local_dt_day', 'start_local_dt_hour',
'start_local_dt_minute', 'start_local_dt_month',
'start_local_dt_second', 'start_local_dt_timezone',
'start_local_dt_weekday', 'start_local_dt_year', 'start_place',
'start_ts', 'user_id'],
dtype='object')
The mode data frame columns:-
Index(['_id', 'end_fmt_time', 'end_local_dt_day', 'end_local_dt_hour',
'end_local_dt_minute', 'end_local_dt_month', 'end_local_dt_second',
'end_local_dt_timezone', 'end_local_dt_weekday', 'end_local_dt_year',
'end_ts', 'label', 'metadata_write_ts', 'start_fmt_time',
'start_local_dt_day', 'start_local_dt_hour', 'start_local_dt_minute',
'start_local_dt_month', 'start_local_dt_second',
'start_local_dt_timezone', 'start_local_dt_weekday',
'start_local_dt_year', 'start_ts', 'user_id'],
dtype='object')
I want to see start_loc | end_loc | start_fmt_time | end_fmt_time | label(mode) | label(purpose) together?
@deepalics0044 that is a pandas question. Feel free to look at the pandas documentation on how to merge two dataframes.
Please note that you cannot just merge the dataframes naively because there may not be a 1:1 correspondence between trip and mode because the user may not have confirmed every trip. Or they may have confirmed the mode and not the purpose. Or they may have confirmed the mode twice. That's why I recommend using the pre-written function get_user_input_for_trip_object
Feel free to look at the pandas documentation on how to merge two dataframes.
I went through some of the pandas codes. Looks like we can merge data frames. Will outer join help?
@deepalics0044 I guess, if you do it right. Have you tried it? It's not like I know the answer to this question and I am making you figure it out as part of a class. I have not merged these two dataframes before; if I did, it would be in the code or the documentation.
Try it out and once you figure it out, contribute it here in case others want to re-use it. Think of it as writing a stackoverflow answer :)
If you have tried a bunch of things and nothing works, you can put in what you tried and why it didn't work and I might be able to give you some pointers.
One thing I tried doing is
frames1=pd.merge(ct_df, ct_dfm,on="start_fmt_time", how="inner")
frames1[["start_loc","start_fmt_time","end_loc","label"]]
I see the correct start_fmt_time, but the results are not accurate: there are 32 mode entries overall and I see only 6.
start_loc | start_fmt_time | end_loc | label
-- | -- | -- | --
{'type': 'Point', 'coordinates': [77.6264306, ... | 2018-08-15T09:00:27+05:30 | {'type': 'Point', 'coordinates': [77.5634483, ... | taxi
{'type': 'Point', 'coordinates': [77.6264306, ... | 2018-08-15T09:00:27+05:30 | {'type': 'Point', 'coordinates': [77.5634483, ... | taxi
{'type': 'Point', 'coordinates': [77.5683932, ... | 2018-08-15T20:42:14.393609+05:30 | {'type': 'Point', 'coordinates': [77.5714665, ... | bike
{'type': 'Point', 'coordinates': [77.5688895, ... | 2018-10-30T21:11:20.081000+05:30 | {'type': 'Point', 'coordinates': [77.5687387, ... | Namma Metro
{'type': 'Point', 'coordinates': [77.569044, 1... | 2018-10-31T11:28:44+05:30 | {'type': 'Point', 'coordinates': [77.5640412, ... | taxi
{'type': 'Point', 'coordinates': [77.7084547, ... | 2018-10-31T13:30:57.896000+05:30 | {'type': 'Point', 'coordinates': [77.7021772, ... | Flight
{'type': 'Point', 'coordinates': [77.5641222, ... | 2018-12-22T18:06:32+05:30 | {'type': 'Point', 'coordinates': [77.563883, 1... | walk
Also, I am getting a KeyError for end_fmt_time. Maybe because the join is inner.
frames1[["start_loc","start_fmt_time","end_loc","end_fmt_time","label"]]
KeyError: "['end_fmt_time'] not in index"
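The KeyError is most likely not caused by the join type. Both dataframes have an end_fmt_time column (see the column lists above), and when pd.merge finds the same non-key column name on both sides it renames them with the default suffixes _x and _y. A small sketch with made-up frames; the names trips and modes and all values are illustrative:

```python
import pandas as pd

# Made-up stand-ins for the cleaned_trip and mode_confirm dataframes.
trips = pd.DataFrame({
    "start_fmt_time": ["t1", "t2"],
    "end_fmt_time": ["t1_end", "t2_end"],
    "start_loc": ["A", "B"],
})
modes = pd.DataFrame({
    "start_fmt_time": ["t1"],
    "end_fmt_time": ["t1_end"],
    "label": ["walk"],
})

# end_fmt_time exists on both sides and is not a join key, so the merged
# frame contains end_fmt_time_x and end_fmt_time_y instead of end_fmt_time.
merged = pd.merge(trips, modes, on="start_fmt_time", how="inner")
print(list(merged.columns))

# Explicit suffixes make the origin of each column obvious.
named = pd.merge(trips, modes, on="start_fmt_time", how="inner",
                 suffixes=("_trip", "_mode"))
print(list(named.columns))
```

Selecting end_fmt_time_x (or end_fmt_time_trip with explicit suffixes) avoids the KeyError, but it does not fix the matching problem itself.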
The best result I have got so far is using
frames1=pd.merge(ct_df, ct_dfm,on="start_fmt_time",how="right")
frames1[["start_loc","start_fmt_time","end_loc","label"]]
But it is also not accurate.
@deepalics0044 I already said
Please note that you cannot just merge the dataframes naively because there may not be a 1:1 correspondence between trip and mode because the user may not have confirmed every trip. Or they may have confirmed the mode and not the purpose. Or they may have confirmed the mode twice. That's why I recommend using the pre-written function get_user_input_for_trip_object
You cannot just try the naive merges. They will not work. You have to use get_user_input_for_trip_object.
If you must use pandas, I would recommend setting a column to the result of apply, similar to this
https://github.com/e-mission/e-mission-server/blob/3a5e2c921f41ea1bbeaec0d49f4fc722d418794d/bin/analysis/get_app_analytics.py#L28
but with get_user_input_for_trip_object as the function that you are applying.
Alternatively, if you are not familiar with pandas, you can use find_entries instead of get_data_df, get a list and iterate through the list of trips, finding the corresponding mode_confirm for each using get_user_input_for_trip_object
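A rough sketch of the apply approach, using made-up data. The function lookup_mode below is a hypothetical stand-in for a call to get_user_input_for_trip_object; it is not the library function and only looks labels up in a dict so that the example runs without a database.

```python
import pandas as pd

# Made-up confirmations keyed by trip id; a stand-in for the database.
confirmed_modes = {"trip1": "walk", "trip3": "taxi"}


def lookup_mode(trip_row):
    """Hypothetical stand-in: return the confirmed label for a trip, or None."""
    return confirmed_modes.get(trip_row["_id"])


trips = pd.DataFrame({
    "_id": ["trip1", "trip2", "trip3"],
    "distance": [500.0, 1200.0, 8000.0],
})

# axis=1 passes one row at a time, so every trip gets exactly one value and
# trips without a confirmation get None instead of being dropped.
trips["mode"] = trips.apply(lookup_mode, axis=1)
print(trips)
```

Unlike a merge, this keeps one row per trip regardless of how many confirmations exist, because the lookup function decides which single confirmation (if any) to return.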
@deepalics0044 were you able to resolve this? Did you use find_entries or DataFrame.apply?
If you could document your solution here, it would help other users with the same question.
I want to get the data in columns. Using pandas I still get some results (I am studying them), but using the pre-written functions is more of a challenge for me.
wrt:-
Alternatively, if you are not familiar with pandas, you can use find_entries instead of get_data_df, get a list and iterate through the list of trips, finding the corresponding mode_confirm for each using get_user_input_for_trip_object
Are these the changes that need to be applied?
entry_it = ts.find_entries(["analysis/cleaned_trip"], time_query=None)
for ct in entry_it:
    cte = ecwe.Entry(ct)
    print("=== Trip:", cte.data.start_loc, "->", cte.data.end_loc)  # Is this each trip?
    user_label = esdt.get_user_input_for_trip_object("manual/mode_confirm", test_user_id, cte.get_id())
    section_it = esdt.get_sections_for_trip("analysis/cleaned_section", test_user_id, cte.get_id())
    # section corresponding to each trip? If yes the code written above makes sense?
    for sec in section_it:
        print(" --- Section:", sec.data.start_loc, "->", sec.data.end_loc, " on ", sec.data.sensed_mode)
Largely, yes, but code review is not a substitute for testing. My only feedback is that, for performance reasons, since you already have the trip object, you can use get_user_input_for_trip_object (which the implementation of get_user_input_for_trip defers to) to avoid an additional lookup.
Did you have any errors when you ran it? There's no harm in running code that retrieves data; it is not going to modify it in any way.
@deepalics0044 I am going to close this tonight unless you have something specific you would like to ask
closing
for ct in entry_it:
    cte = ecwe.Entry(ct)
    print("=== Trip:", cte.data.start_loc, "->", cte.data.end_loc)
    user_label = esdt.get_user_input_for_trip_object("manual/mode_confirm", test_user_id, cte.get_id())
    section_it = esdt.get_sections_for_trip("analysis/cleaned_section", test_user_id, cte.get_id())
    for sec in section_it:
        print(" --- Section:", sec.data.start_loc, "->", sec.data.end_loc, " on ", sec.data.sensed_mode)
The error I get:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-581-5f4a698ed5ac> in <module>()
9
10
---> 11 user_label = esdt.get_user_input_for_trip("manual/mode_confirm", test_user_id, cte.get_id())
12
13
TypeError: get_user_input_for_trip() missing 1 required positional argument: 'user_input_key'
@deepalics0044 get_user_input_for_trip requires 4 arguments and you are passing in only 3. I did say that code review is not a substitute for testing :)
Concretely, you want to use something like get_user_input_for_trip("analysis/cleaned_trip", test_user_id, cte.get_id(), "manual/mode_confirm")
I am not able to see all the mode labels using the above function. I have 14 mode labels altogether and can see only 5-6 using the function.
=== Trip: {"coordinates": [77.5687779, 13.0146252], "type": "Point"} -> {"coordinates": [77.5633922, 13.030077], "type": "Point"}
Entry({'_id': ObjectId('5d038018317394de3a565127'), 'user_id': UUID('79d9df48-6b44-4a5b-8333-1211b33aedc8'), 'metadata': {'key': 'manual/mode_confirm', 'platform': 'android', 'read_ts': 0, 'time_zone': 'Asia/Kolkata', 'type': 'message', 'write_ts': 1560493331.145, 'write_local_dt': {'year': 2019, 'month': 6, 'day': 14, 'hour': 11, 'minute': 52, 'second': 11, 'weekday': 4, 'timezone': 'Asia/Kolkata'}, 'write_fmt_time': '2019-06-14T11:52:11.145000+05:30'}, 'data': {'start_ts': 1560492540.159, 'end_ts': 1560493136.119, 'label': 'Walk', 'start_local_dt': {'year': 2019, 'month': 6, 'day': 14, 'hour': 11, 'minute': 39, 'second': 0, 'weekday': 4, 'timezone': 'Asia/Kolkata'}, 'start_fmt_time': '2019-06-14T11:39:00.159000+05:30', 'end_local_dt': {'year': 2019, 'month': 6, 'day': 14, 'hour': 11, 'minute': 48, 'second': 56, 'weekday': 4, 'timezone': 'Asia/Kolkata'}, 'end_fmt_time': '2019-06-14T11:48:56.119000+05:30'}})
=== Trip: {"coordinates": [77.5633922, 13.030077], "type": "Point"} -> {"coordinates": [77.5641214, 13.0163405], "type": "Point"}
None
Likewise I see only five.
@deepalics0044 hm, I wonder if there is an underlying issue with the matching algorithm that is causing both this issue and the earlier one that you reported where you couldn't see the trip-end prompt results in the diary. The matching algorithm is pretty simple. Can you look at the trip details and the confirmation object details and figure out why it doesn't work?
Alternatively, you can send me the dump for the day with the mismatch and I can take a look.
I am not able to reproduce the problem. Here's the list of trips.
In [12]: for ct in entry_it:
    ...:     cte = ecwe.Entry(ct)
    ...:     print("=== Trip:", cte.data.start_fmt_time, "->", cte.data.end_fmt_time)
    ...:
=== Trip: 2019-06-18T15:54:46.179493+05:30 -> 2019-06-18T16:23:14.659000+05:30
=== Trip: 2019-06-18T16:41:27.850233+05:30 -> 2019-06-18T16:50:00.584000+05:30
=== Trip: 2019-06-18T18:32:16.203440+05:30 -> 2019-06-18T18:42:36.349000+05:30
=== Trip: 2019-06-18T19:19:03.393391+05:30 -> 2019-06-18T19:22:58.869000+05:30
=== Trip: 2019-06-19T11:08:24.697000+05:30 -> 2019-06-19T11:10:59.636000+05:30
=== Trip: 2019-06-19T11:21:39.129006+05:30 -> 2019-06-19T11:33:33.966000+05:30
=== Trip: 2019-06-19T11:37:20.094655+05:30 -> 2019-06-19T11:44:37.301000+05:30
=== Trip: 2019-06-19T12:32:53.399599+05:30 -> 2019-06-19T12:45:46.527000+05:30
and here's the list of confirm objects
=== Confirm: 2019-06-19T11:08:34.738000+05:30 -> 2019-06-19T11:14:47.839000+05:30
=== Confirm: 2019-06-19T11:08:24.697000+05:30 -> 2019-06-19T11:10:59.636000+05:30
=== Confirm: 2019-06-19T11:21:39.129006+05:30 -> 2019-06-19T11:33:33.966000+05:30
=== Confirm: 2019-06-19T11:37:20.094655+05:30 -> 2019-06-19T11:44:37.301000+05:30
=== Confirm: 2019-06-19T12:32:53.399599+05:30 -> 2019-06-19T12:45:46.527000+05:30
Note that the first two entries are essentially for the same trip.
And when I match them up, I get
=== Trip: 2019-06-18T16:41:27.850233+05:30 -> 2019-06-18T16:50:00.584000+05:30
=== Trip: 2019-06-18T18:32:16.203440+05:30 -> 2019-06-18T18:42:36.349000+05:30
=== Trip: 2019-06-18T19:19:03.393391+05:30 -> 2019-06-18T19:22:58.869000+05:30
=== Trip: 2019-06-19T11:08:24.697000+05:30 -> 2019-06-19T11:10:59.636000+05:30
~~~ Confirm: 2019-06-19T11:08:34.738000+05:30 -> 2019-06-19T11:14:47.839000+05:30
=== Trip: 2019-06-19T11:21:39.129006+05:30 -> 2019-06-19T11:33:33.966000+05:30
~~~ Confirm: 2019-06-19T11:21:39.129006+05:30 -> 2019-06-19T11:33:33.966000+05:30
=== Trip: 2019-06-19T11:37:20.094655+05:30 -> 2019-06-19T11:44:37.301000+05:30
~~~ Confirm: 2019-06-19T11:37:20.094655+05:30 -> 2019-06-19T11:44:37.301000+05:30
=== Trip: 2019-06-19T12:32:53.399599+05:30 -> 2019-06-19T12:45:46.527000+05:30
~~~ Confirm: 2019-06-19T12:32:53.399599+05:30 -> 2019-06-19T12:45:46.527000+05:30
which seems to be fine. I suspect that the reason you have additional confirm objects that are not matched is because they are actually duplicates of the ones that do match and when we find duplicates, we pick the last one.
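The behaviour described above can be sketched as follows. This is not the library code: it assumes that a confirmation is a candidate for a trip when its start timestamp falls inside the trip's time range, and that the candidate written last wins. The function name match_confirmation, the flat write_ts field, and all values are illustrative.

```python
def match_confirmation(trip, confirmations):
    """Return the last-written confirmation that starts within the trip, or None."""
    candidates = [
        c for c in confirmations
        if trip["start_ts"] <= c["start_ts"] <= trip["end_ts"]
    ]
    if not candidates:
        return None
    # Assumption: when several inputs match, the most recently written one wins.
    return max(candidates, key=lambda c: c["write_ts"])


# Made-up trips and confirmations with plain numeric timestamps.
trips = [
    {"id": "a", "start_ts": 100, "end_ts": 200},
    {"id": "b", "start_ts": 300, "end_ts": 400},
]
confirmations = [
    {"label": "bike", "start_ts": 100, "write_ts": 210},
    {"label": "walk", "start_ts": 110, "write_ts": 250},  # later input for the same trip
]

matches = {t["id"]: match_confirmation(t, confirmations) for t in trips}
for trip_id, conf in matches.items():
    print(trip_id, conf["label"] if conf else None)
```

Here two confirmations exist but only one label is matched, which is how the number of matched labels can end up lower than the number of confirm objects.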
Please reopen the issue and send me the logs of the days with the mismatch if this is not true.
Well, after testing, I agree that duplicates, of which the last one is picked, are the reason I had additional confirm objects. The function works perfectly.
I used the following code:
countLabel = []
countLabel1 = []
for ct in entry_it:
    cte = ecwe.Entry(ct)
    print("=== Trip:", cte.data.start_loc, "->", cte.data.start_fmt_time, "->", cte.data.end_loc, "->", cte.data.end_fmt_time)
    user_label = esdt.get_user_input_for_trip("analysis/cleaned_trip", test_user_id, cte.get_id(), "manual/mode_confirm")
    user_label1 = esdt.get_user_input_for_trip("analysis/cleaned_trip", test_user_id, cte.get_id(), "manual/purpose_confirm")
    print("=== Mode:", user_label.data.label if user_label is not None else None)
    print("=== Purpose:", user_label1.data.label if user_label1 is not None else None)
    countLabel.append(user_label.data.label if user_label is not None else None)
    countLabel1.append(user_label1.data.label if user_label1 is not None else None)
    section_it = esdt.get_sections_for_trip("analysis/cleaned_section", test_user_id, cte.get_id())
# Drop the trips without a confirmation before counting
countLabel = [i for i in countLabel if i]
print('Number of modes entered:', len(countLabel))
countLabel1 = [j for j in countLabel1 if j]
print('Number of purpose entered:', len(countLabel1))
@deepalics0044 thanks for contributing! If you have time, you could submit this (either as a notebook or as a standalone script) with a pull request...
The code below helped me extract the data in tabular form:
countLabel = []
countLabel1 = []
result = []
for index, user in all_users.iterrows():
    #if(index not in [4,5]):
    #    continue
    print('USER ID: ', user.uuid)
    print('INDEX: ', index)
    print('-------------------------------------')
    ts = esta.TimeSeries.get_time_series(user.uuid)
    entry_it = ts.find_entries(["analysis/cleaned_trip"], time_query=None)
    userTrips = []
    for ct in entry_it:
        cte = ecwe.Entry(ct)
        #print("=== Trip:", cte.data.start_loc,"->",cte.data.start_fmt_time, "->", cte.data.end_loc,"->", cte.data.end_fmt_time)
        user_label = esdt.get_user_input_for_trip("analysis/cleaned_trip", user.uuid, cte.get_id(), "manual/mode_confirm")
        user_label1 = esdt.get_user_input_for_trip("analysis/cleaned_trip", user.uuid, cte.get_id(), "manual/purpose_confirm")
        #print("=== Mode:", user_label.data.label if user_label != None else None)
        #print("=== Purpose:", user_label1.data.label if user_label1 != None else None)
        #countLabel.append(user_label.data.label if user_label != None else None)
        #countLabel1.append(user_label1.data.label if user_label1 != None else None)
        section_it = esdt.get_sections_for_trip("analysis/cleaned_section", user.uuid, cte.get_id())
        # One single-row frame per trip, with the matched labels added as columns
        testFrame = pd.DataFrame.from_dict({user.uuid: cte.data}, orient='index')
        testFrame['mode'] = user_label.data.label if user_label is not None else None
        testFrame['purpose'] = user_label1.data.label if user_label1 is not None else None
        userTrips.append(testFrame)
    if userTrips:
        userTrips = pd.concat(userTrips, ignore_index=True)
        userTrips['uuid'] = user.uuid
        result.append(userTrips)
    #countLabel = [i for i in countLabel if i]
    #print('Number of modes entered:',len(countLabel))
    #countLabel1 = [j for j in countLabel1 if j]
    #print('Number of purpose entered:',len(countLabel1))
    print('-------------------------------------')
# Combine the per-user frames and write a single CSV
result = pd.concat(result, ignore_index=True)
#print(result)
result.to_csv('./trips.csv', index=False)
I ran the pipeline and got the recent mode and purposes made by Ipsita ma'am. But I see the aggregate mode and purpose the code I run for -
What code do I need to apply to see the mode and purpose for each uuid?
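One possible way to look at mode and purpose per uuid, assuming rows shaped like the ones written to trips.csv above (with uuid, mode and purpose columns). The rows here are made up and the helper name labels_by_user is illustrative; only the standard library is used.

```python
from collections import Counter, defaultdict


def labels_by_user(rows, column):
    """Count the non-empty values of `column` separately for each uuid."""
    counts = defaultdict(Counter)
    for row in rows:
        label = row.get(column)
        if label:  # skips None and the empty strings a CSV export produces
            counts[row["uuid"]][label] += 1
    return counts


# Made-up rows; with real data they could come from csv.DictReader on trips.csv.
rows = [
    {"uuid": "user-1", "mode": "walk", "purpose": "work"},
    {"uuid": "user-1", "mode": "walk", "purpose": ""},
    {"uuid": "user-1", "mode": None, "purpose": "shopping"},
    {"uuid": "user-2", "mode": "taxi", "purpose": "home"},
]

mode_counts = labels_by_user(rows, "mode")
purpose_counts = labels_by_user(rows, "purpose")
for user_id in sorted(mode_counts):
    print(user_id, dict(mode_counts[user_id]), dict(purpose_counts[user_id]))
```

Grouping on the uuid column keeps each user's labels separate instead of producing one aggregate count.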