BIDMCDigitalPsychiatry / LAMP-platform

The LAMP Platform (issues and documentation).
https://docs.lamp.digital/

SensorKit visits events, and report duration across many sensors. #703

Closed: carlan1 closed this issue 1 year ago.

carlan1 commented 1 year ago

Here I am reporting two observations about the new SensorKit sensors that may or may not need to be changed.

  1. Visits feature redundancy

Some of the visits features seem to repeat events or visits. Some are exact duplicate events; others are not quite duplicates, because they differ in timestamp but are identical in their values. Example of a duplicate event:

[{'data': {'distanceFromHome': 0,
           'locationCategory': 1,
           'departureDateInterval': {'start': 686874600, 'duration': 900},
           'arrivalDateInterval': {'start': 686863800, 'duration': 900}},
  'timestamp': 1665319585056,
  'sensor': 'com.apple.sensorkit.visits'},
 {'timestamp': 1665319585056,
  'data': {'locationCategory': 1,
           'distanceFromHome': 0,
           'departureDateInterval': {'start': 686874600, 'duration': 900},
           'arrivalDateInterval': {'start': 686863800, 'duration': 900}},
  'sensor': 'com.apple.sensorkit.visits'}]
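Exact duplicates like the pair above can be filtered out on the consumer side. A minimal sketch, assuming records arrive as Python dicts shaped like the example; `drop_exact_duplicates` is a hypothetical helper, not part of the LAMP API. Because the key order inside `data` differs between the two copies, the payload is serialized with `sort_keys=True` before comparing.

```python
import json
from typing import Any, Dict, List


def drop_exact_duplicates(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first occurrence of each (sensor, timestamp, data) triple.

    json.dumps(..., sort_keys=True) sorts keys at every nesting level,
    so two payloads that differ only in key order produce the same key.
    """
    seen = set()
    unique = []
    for rec in records:
        key = (rec["sensor"], rec["timestamp"],
               json.dumps(rec["data"], sort_keys=True))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Applied to the two records above, this keeps only the first one.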

Another example of data points with different timestamps that describe the same event:

[{'sensor': 'com.apple.sensorkit.visits',
  'timestamp': 1665455352316,
  'data': {'distanceFromHome': 589.678911984311,
           'arrivalDateInterval': {'start': 687116700, 'duration': 900},
           'departureDateInterval': {'start': 687117600, 'duration': 900},
           'locationCategory': 0}},
 {'sensor': 'com.apple.sensorkit.visits',
  'timestamp': 1665455352314,
  'data': {'departureDateInterval': {'start': 687117600, 'duration': 900},
           'distanceFromHome': 589.678911984311,
           'arrivalDateInterval': {'start': 687116700, 'duration': 900},
           'locationCategory': 0}}]

Unless I am misunderstanding the schema of this data, we would prefer that visits ActivityEvents do not overlap in time. That is, for an event that starts at t1 and ends at t2 and a second event that starts at t3 and ends at t4, t3 should not fall between t1 and t2.
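The no-overlap condition can be checked by sorting visits by start time and comparing each start against the spans of the visits before it. A minimal sketch, assuming a visit spans from the start of its arrival interval to the start of its departure interval; that choice of t1/t2 is an assumption (using the end of the departure interval would be equally defensible), and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def visit_span(rec: Dict[str, Any]) -> Tuple[int, int]:
    """Return (t_start, t_end) for a visit record.

    Assumption: a visit runs from arrivalDateInterval.start to
    departureDateInterval.start, in whatever units SensorKit reports.
    """
    data = rec["data"]
    return (data["arrivalDateInterval"]["start"],
            data["departureDateInterval"]["start"])


def overlapping_pairs(records: List[Dict[str, Any]]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where visit j starts inside visit i.

    With spans [t1, t2] and [t3, t4] and t1 <= t3, the visits overlap
    when t3 < t2. Two copies of the same positive-length visit are
    therefore reported as overlapping.
    """
    order = sorted(range(len(records)), key=lambda k: visit_span(records[k]))
    pairs = []
    for pos, i in enumerate(order):
        _, t2 = visit_span(records[i])
        for j in order[pos + 1:]:
            t3, _ = visit_span(records[j])
            if t3 >= t2:
                break  # sorted by start, so no later visit can start inside i
            pairs.append((i, j))
    return pairs
```

Run on the two records of the second example, this reports them as one overlapping pair, which is the situation the request above asks to rule out.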

  2. A second issue: we would like to know whether it is possible to modify the report duration indicated by the "duration" key, specifically for the phone usage and messages usage sensors. We are interested in learning the number of unique contacts who are called or texted within a time frame. However, if the report duration is too short, the same contact may be counted many times, because the totalUniqueContacts property reports unique contacts per report: a person who is texted or called more than once over a span longer than the report duration is captured in several reports. Would it be possible to modify the report duration, preferably through an argument, or at minimum to extend it?
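Because totalUniqueContacts is computed per report, the per-report values can only bracket the number of distinct contacts over a longer window: the true count is at least the largest single-report value and at most the sum across reports. A minimal sketch, assuming each report is a dict carrying an integer at `data['totalUniqueContacts']` (that record layout is an assumption); `unique_contact_bounds` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, Tuple


def unique_contact_bounds(reports: Iterable[Dict[str, Any]]) -> Tuple[int, int]:
    """Bound the number of distinct contacts across several reports.

    Each report only counts contacts unique within that report, so:
      lower bound = max over reports (those contacts are all distinct),
      upper bound = sum over reports (assumes no contact repeats).
    Without contact identifiers the true value cannot be narrowed further.
    """
    counts = [int(r["data"]["totalUniqueContacts"]) for r in reports]
    if not counts:
        return (0, 0)
    return (max(counts), sum(counts))
```

If the same two people are contacted in each of three reports, the bounds are (2, 6) while the true answer is 2; the gap between the bounds is the over-counting described above, and it shrinks only as reports get longer.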
jijopulikkottil commented 1 year ago

1.1 Yes, we are not expecting duplicate events. This is being investigated in #699.
1.2 We can expect the same data with different timestamps. However, the example above should not happen, because both timestamps represent the same time in seconds. We will check this. [The investigation has been delayed by a lack of sufficient collected data.]
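Since the two timestamps in the second example fall in the same second, a consumer-side guard could truncate the millisecond `timestamp` to whole seconds before comparing records. A sketch under that assumption; `collapse_same_second` is a hypothetical name, and one-second granularity is a choice, not something the platform specifies.

```python
import json
from typing import Any, Dict, List


def collapse_same_second(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop records whose sensor and data match an earlier record and whose
    millisecond timestamps fall in the same whole second.

    Records are processed in timestamp order, so the earliest copy is kept.
    """
    seen = set()
    kept = []
    for rec in sorted(records, key=lambda r: r["timestamp"]):
        key = (rec["sensor"], rec["timestamp"] // 1000,
               json.dumps(rec["data"], sort_keys=True))
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept
```

Truncation means two copies only 1 ms apart can still land in different seconds (for example ...999 and ...000); a tolerance window would catch those at the cost of more bookkeeping.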

  2. We can't modify the duration. To fetch the data, we can pass only a start date and an end date. As far as we know, we are fetching data that has already been generated by the system; we are just querying it by start and end date. [We will verify this by changing the start and end dates and cross-checking the data collected over different time spans. We will update here once done.]
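The verification described above (changing the start and end dates and cross-checking the results) could be scripted roughly as follows. A minimal sketch, assuming each fetch returns a list of dicts with a millisecond `timestamp` and a `data` payload; `crosscheck_fetches` and its parameters are hypothetical names, and matching reports by timestamp is an assumption about how reports can be identified.

```python
from typing import Any, Dict, List


def crosscheck_fetches(a: List[Dict[str, Any]], b: List[Dict[str, Any]],
                       overlap_start: int, overlap_end: int) -> Dict[str, List[int]]:
    """Compare two fetches of one sensor inside their shared time window.

    Reports are matched by 'timestamp' (ms) within [overlap_start, overlap_end).
    If the system generates reports independently of the query window,
    all three lists in the result should be empty.
    Note: duplicate timestamps within one fetch collapse to a single entry.
    """
    def window(reports: List[Dict[str, Any]]) -> Dict[int, Any]:
        return {r["timestamp"]: r["data"] for r in reports
                if overlap_start <= r["timestamp"] < overlap_end}

    in_a, in_b = window(a), window(b)
    shared = in_a.keys() & in_b.keys()
    return {
        "only_in_a": sorted(in_a.keys() - in_b.keys()),
        "only_in_b": sorted(in_b.keys() - in_a.keys()),
        "data_differs": sorted(t for t in shared if in_a[t] != in_b[t]),
    }
```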
carlan1 commented 1 year ago

Today, in a call with @michaelmenon, we discussed the possibility of rejecting samples whose timestamps are very similar. We do not need this change implemented for now; let's stick with fixing duplicate events (#699) and changing the units from seconds to milliseconds (#701). Because both of these requests are covered by other issues, I am closing this issue, as it adds nothing new.
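Issue #701 tracks changing units from seconds to milliseconds. Until that lands, a consumer can convert the interval fields itself. The interval `start` values in this issue look like seconds since Apple's reference date (2001-01-01 UTC) rather than since the Unix epoch: adding the 978,307,200-second offset between the two epochs puts 686874600 in October 2022, alongside the millisecond `timestamp` values. That reading is an inference from the examples, not something stated in the thread, and the sketch below assumes it; `interval_to_unix_ms` is a hypothetical helper.

```python
from typing import Any, Dict

# Seconds between the Unix epoch (1970-01-01 UTC) and Apple's
# reference date (2001-01-01 UTC).
APPLE_REFERENCE_OFFSET_S = 978_307_200


def interval_to_unix_ms(interval: Dict[str, Any]) -> Dict[str, int]:
    """Convert {'start': s, 'duration': s} to Unix-epoch milliseconds.

    Assumes 'start' is seconds since 2001-01-01 UTC and 'duration' is
    seconds; both assumptions are inferred from the example records.
    """
    return {
        "start": int(round((interval["start"] + APPLE_REFERENCE_OFFSET_S) * 1000)),
        "duration": int(round(interval["duration"] * 1000)),
    }
```

Under this assumption, the departure interval of the first example, `{'start': 686874600, 'duration': 900}`, becomes a start of 1665181800000 ms and a duration of 900000 ms, about 38 hours before that record's `timestamp` of 1665319585056, so both values can be compared on the same clock.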