BIDMCDigitalPsychiatry / LAMP-platform

The LAMP Platform (issues and documentation).
https://docs.lamp.digital/
Other

Step Count Contains Duplicates and Inaccuracies #730

Closed: lugray1 closed this issue 1 year ago.

lugray1 commented 1 year ago

Recently we have noticed that for multiple participants, some of their daily step count values calculated using cortex.secondary.step_count.step_count have been incredibly high (>200,000 on some days). We investigated this issue by looking at data from cortex.raw.steps.steps.

iPhone example: Output from this cortex function shows some duplicate rows. In addition, the 'source' column contains two kinds of values: com.apple.health and 'null'. The values in rows with com.apple.health as the source seem reasonable, although some are duplicated. However, values in rows with 'null' as the source appear to be a running total of all the steps taken so far that day.

For example, the screenshot below shows cortex.raw.steps.steps for one participant as a pd.DataFrame, filtered by 'type' = step_count. Rows 0 and 1, and rows 22 and 23, are exact duplicates of each other. Rows 6-21 are all from the same day, but each value appears to build on the previous one. The same happens in rows 28-63.

We would like duplicates not to appear in the raw data, and values not to accumulate.

[Screenshot 2023-02-06 at 10:16:26 AM]
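Exact duplicates like rows 0 and 1 can be detected before any daily totals are computed. The following is a minimal standard-library sketch, not part of cortex: it assumes each event is a dict with a millisecond `timestamp` and a `data` object holding `value`, `source`, and `type`, and the sample events are hypothetical.

```python
def dedupe_events(events):
    """Drop events that exactly repeat an earlier (timestamp, value, source, type) tuple.

    The first occurrence of each event is kept, in its original order.
    """
    seen = set()
    unique = []
    for event in events:
        data = event["data"]
        key = (event["timestamp"], data.get("value"), data.get("source"), data.get("type"))
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


# Hypothetical sample: the first two events are exact duplicates.
events = [
    {"timestamp": 1675692000000, "data": {"value": 120, "source": "com.apple.health", "type": "step_count"}},
    {"timestamp": 1675692000000, "data": {"value": 120, "source": "com.apple.health", "type": "step_count"}},
    {"timestamp": 1675695600000, "data": {"value": 310, "source": "com.apple.health", "type": "step_count"}},
]

print(len(dedupe_events(events)))  # 2
```

Keying on the full tuple removes only true repeats, so the accumulating 'null'-source rows are left untouched and need separate handling.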

Android example: Android devices do not seem to have the same problems with duplicates and accumulating 'null'-source values that iPhones do. However, some Android step counts still seem unbelievably high. Below is output from cortex.raw.steps.steps for a participant with an Android phone (as a DataFrame, filtered by day and by 'type' = step_count). Their step count for the day shown in the image was over 55,000. As the image shows, the step counts seem inaccurate: for example, in rows 3250 and 3251, 86 steps were recorded in 0.005 seconds.

We would like to know if there is anything that can be done to make the Android steps more accurate.

[Screenshot 2023-02-06 at 10:50:22 AM]

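A simple rate check makes readings like rows 3250 and 3251 stand out: 86 steps in 0.005 seconds is 86 / 0.005 = 17,200 steps per second. The sketch below flags any reading whose implied rate exceeds a cutoff. It assumes each value is the number of steps counted since the previous reading, and the 5 steps/second cutoff is a hypothetical choice, not a mindLAMP or cortex setting.

```python
MAX_STEPS_PER_SECOND = 5.0  # hypothetical cutoff, well above a typical walking cadence (~2 steps/s)


def flag_implausible(readings, max_rate=MAX_STEPS_PER_SECOND):
    """Return (timestamp_ms, steps, steps_per_second) for readings faster than max_rate.

    readings: (timestamp_ms, steps) pairs sorted by time, where each value is assumed
    to be the steps counted since the previous reading. The first reading has no
    predecessor, so it is not checked.
    """
    flagged = []
    for (prev_ts, _), (ts, steps) in zip(readings, readings[1:]):
        elapsed_s = (ts - prev_ts) / 1000
        rate = steps / elapsed_s if elapsed_s > 0 else float("inf")
        if rate > max_rate:
            flagged.append((ts, steps, rate))
    return flagged


# Hypothetical readings: the last one arrives 5 ms (0.005 s) after the previous one.
readings = [(1675698600000, 40), (1675698660000, 95), (1675698660005, 86)]

for ts, steps, rate in flag_implausible(readings):
    print(ts, steps, round(rate))  # 1675698660005 86 17200
```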
ZCOEngineer commented 1 year ago

@lugray1 We shall analyze this and get back to you.

jijopulikkottil commented 1 year ago

iPhone:

> The values in rows with com.apple.health as the source seem reasonable, although there are some duplicates

Yes. Could you share more rows with source com.apple.health so we can check whether all of that data has duplicates? We can investigate this issue.

> However, values in rows with 'null' as the source appear to be an accumulation of all the steps taken so far that day.

Yes, this is data from the pedometer. We query it from the start of the day (12:00 AM), because the pedometer returns the step count together with other data such as Cadence, Distance, Pace, AvgActivePace, and Floors. If we queried it over short intervals, the data would be inaccurate and would lead to duplicates.
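Because each pedometer reading is a running total since 12:00 AM, the steps for each interval are the differences between consecutive readings on the same day, and the day's total is simply the last reading; summing the raw readings is what inflates daily totals. A minimal sketch, assuming readings are (timestamp_ms, cumulative_steps) pairs from a single day and source, sorted by time; the sample values are hypothetical:

```python
def cumulative_to_increments(readings):
    """Turn (timestamp_ms, cumulative_steps) pairs into (timestamp_ms, steps_in_interval).

    Assumes the readings come from a single day and source, sorted by timestamp.
    If the running total ever drops (e.g. a counter reset), the new total is
    treated as counted from zero.
    """
    increments = []
    previous_total = 0
    for ts, total in readings:
        delta = total - previous_total
        increments.append((ts, delta if delta >= 0 else total))
        previous_total = total
    return increments


# Hypothetical running totals for one day.
readings = [(1676959200000, 500), (1676962800000, 1200), (1676963827636, 1272)]

print([steps for _, steps in cumulative_to_increments(readings)])  # [500, 700, 72]
print(sum(total for _, total in readings))  # 2972: summing running totals overcounts
print(readings[-1][1])  # 1272: the day's actual total is the last reading
```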

sijitg commented 1 year ago

Android:

We will check and update on this issue.

jijopulikkottil commented 1 year ago

Hi @lugray1, could you check the mindLAMP iOS app version, so that we can confirm whether the step-count data you provided is from the latest app version? (The app version is recorded with the lamp.analytics event.)

lugray1 commented 1 year ago

> Hi @lugray1 , Could you check the mindLAMP iOS app version. So that we can confirm the step-count data you provided is from latest app version or not! [The app version is recorded with lamp.analytics event]

For the participant in the original screenshot I posted, the 'user_agent' value is 'NativeCore 2022.7.13; iOS 16.1.2;'.
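To check app versions across many participants, the user_agent string can be split into its app and OS parts, for example to see whether a participant's data predates a particular release (the version compared against below is only an example). A minimal sketch, assuming the 'NativeCore <app version>; <OS> <OS version>;' layout seen in the values quoted in this issue; other clients may format the string differently, and the helper names are hypothetical.

```python
import re

# Layout inferred from values quoted in this issue, e.g. 'NativeCore 2022.7.13; iOS 16.1.2;'
USER_AGENT_RE = re.compile(
    r"NativeCore\s+(?P<app>[\d.]+);\s*(?P<os>\w+)\s+(?P<os_version>[\d.]+)"
)


def parse_user_agent(user_agent):
    """Return (app_version, os_name, os_version), or None if the string doesn't match."""
    match = USER_AGENT_RE.search(user_agent)
    if match is None:
        return None
    return match.group("app"), match.group("os"), match.group("os_version")


def version_tuple(version):
    """'2022.7.13' -> (2022, 7, 13), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


app, os_name, os_version = parse_user_agent("NativeCore 2022.7.13; iOS 16.1.2;")
print(app, os_name, os_version)  # 2022.7.13 iOS 16.1.2
print(version_tuple(app) >= version_tuple("2023.2.23"))  # False: older than 2023.2.23
```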

> Yes, could you list more data with source com.apple.health to check whether all data having duplicates or not. We can investigate this issue.

For additional examples, I looked at five participants. Three of them had the duplicate issue; one did not have duplicates but did have very similar values recorded very close together; and one appeared to have neither duplicates nor near-identical values from source com.apple.health. I have listed their mindLAMP iOS app versions and some screenshots below.

Below is a screenshot of data from another participant who had the same duplicate issue with source com.apple.health. Their value for 'user_agent' is 'NativeCore 2022.7.13; iOS 16.1.2;'.

[Screenshot 2023-02-14 at 3:33:59 PM]

I found two other instances of participants with duplicates. One had a 'user_agent' of 'NativeCore 2022.7.13; iOS 16.1.1;', and the other had a 'user_agent' of 'NativeCore 2022.7.13; iOS 9.2;'.

Below is an example of a participant who does not appear to have duplicates but does have some questionable values. In rows 7 and 8, the values are similar (65 and 69) and the timestamps are only 2 seconds apart. Rows 9 and 10 show a similar issue (values of 197 and 106 recorded 1 second apart). Their 'user_agent' value is 'NativeCore 2022.7.13; iOS 16.1.2'.

[Screenshot 2023-02-14 at 3:21:50 PM]
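Close pairs like these can be listed programmatically, together with their sources, to check whether the two readings come from the same device. A minimal sketch, assuming readings are (timestamp_ms, value, source) tuples sorted by time and a hypothetical 2-second window; the timestamps and source identifiers in the sample are placeholders.

```python
WINDOW_MS = 2000  # hypothetical: readings at most 2 seconds apart count as "close"


def close_pairs(readings, window_ms=WINDOW_MS):
    """Yield consecutive (earlier, later) readings whose timestamps differ by <= window_ms.

    readings: list of (timestamp_ms, value, source), sorted by timestamp.
    """
    for earlier, later in zip(readings, readings[1:]):
        if later[0] - earlier[0] <= window_ms:
            yield earlier, later


# Values mirror rows 7-10 described above; timestamps and source IDs are made up.
readings = [
    (1676358000000, 65, "com.apple.health.AAAA"),
    (1676358002000, 69, "com.apple.health.BBBB"),
    (1676358600000, 197, "com.apple.health.AAAA"),
    (1676358601000, 106, "com.apple.health.BBBB"),
    (1676362200000, 250, "com.apple.health.AAAA"),
]

for earlier, later in close_pairs(readings):
    relation = "same source" if earlier[2] == later[2] else "different sources"
    print(earlier[1], later[1], relation)
# 65 69 different sources
# 197 106 different sources
```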

Here is an example of a participant without duplicates. Their 'user_agent' value is 'NativeCore 2022.7.13; iOS 16.2;'.

[Screenshot 2023-02-14 at 3:32:36 PM]

lugray1 commented 1 year ago

> Yes, It is the data from Pedometer, we are taking data from start time(12:00 am) of the day. Because from pedometer, we are getting Step-Count along with other data like Cadence, Distance, Pace, AvgActivePace, Floors. So if we query with small intervals, we won't get accurate data and will lead to duplicates.

Okay, what you're saying about the intervals with the pedometer makes sense! Would you recommend, then, that we filter these out in the cortex step_count function, so that the pedometer values are not included in the step count? Thanks!

jijopulikkottil commented 1 year ago

@lugray1 Thanks for the detailed response. As you said, there are duplicates in two screenshots:

  1. https://user-images.githubusercontent.com/89207083/217010091-10304fdc-21a9-484a-a2b4-bbe2497fda57.png
  2. https://user-images.githubusercontent.com/89207083/218855973-79193c74-3d3a-41b3-ab26-28f66ddce4a8.png

We will fix it ASAP.

> see in rows 7 and 8, the values are similar (65 and 69) but the timestamps are 2 seconds apart. A similar issue is shown in rows 9 and 10 (values are 197 and 106 recorded at 1 second apart).

This is expected behaviour with multiple sources: one reading comes from the iPhone, and the other is probably from a Watch or another source, as the differing source identifiers show. I think we need to rely on only one source at a time to get correct data.

> Okay, what you're saying about the intervals with the pedometer makes sense! Is this something you would recommend we filter for in the cortex step_count function, then, so we are not including the pedometer values in the step_count? Thanks!

Yes, you can ignore the pedometer values (source = null) if the daily accumulated value is not needed.
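Following that suggestion, a step aggregation could skip the pedometer rows (source null, later labelled "daily pedometer") and total each remaining source separately, so that one source can be chosen instead of adding them all together. A minimal sketch, assuming events are dicts shaped like lamp.steps events; `steps_by_source` is a hypothetical helper, not a cortex function, and the source identifiers are placeholders.

```python
from collections import defaultdict

# Labels used for the cumulative pedometer rows (null in older data).
PEDOMETER_SOURCES = {None, "null", "daily pedometer"}


def steps_by_source(events):
    """Sum step_count values per source, skipping pedometer rows and missing values."""
    totals = defaultdict(int)
    for event in events:
        data = event["data"]
        if data.get("type") != "step_count" or data.get("source") in PEDOMETER_SOURCES:
            continue
        if data.get("value") is None:
            continue
        totals[data["source"]] += data["value"]
    return dict(totals)


events = [
    {"timestamp": 1, "data": {"type": "step_count", "source": "com.apple.health.PHONE", "value": 248}},
    {"timestamp": 2, "data": {"type": "step_count", "source": "com.apple.health.WATCH", "value": 30}},
    {"timestamp": 3, "data": {"type": "step_count", "source": None, "value": 1272}},
]

print(steps_by_source(events))  # {'com.apple.health.PHONE': 248, 'com.apple.health.WATCH': 30}
```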

jijopulikkottil commented 1 year ago

@carlan1 Currently, the source can only be identified by a source identifier such as com.apple.health.ACFGDJ. As per today's discussion, we can, if possible, add an additional field indicating the source type (phone, watch, or other).

carlan1 commented 1 year ago

Please remove the duplicates if they are truly duplicates.

If it is possible to add the name of the source type, as discussed in the call, please do so.

Regarding the accumulation of daily steps, it would be helpful to change the name of the source from "null" to "daily pedometer" or something similar.

jijopulikkottil commented 1 year ago

@lugray1 To debug the issue, could you provide a few more rows with source = com.apple.health.05CD2DE7-3FBB-4E04-80BF.... after rows 50 and 51 in this screenshot: https://user-images.githubusercontent.com/89207083/218855973-79193c74-3d3a-41b3-ab26-28f66ddce4a8.png

jijopulikkottil commented 1 year ago

@lugray1 No need to provide more details; we have identified the issue and are working on it.

jijopulikkottil commented 1 year ago

The iOS code changes are complete, and QA testing is almost finished. We can push the code tomorrow once QA testing is over.

  1. Added new key device_model to identify the device type.
  2. Set the source to "daily pedometer" for data from the pedometer.
  3. Duplicate issue fixed.

Here is some sample data:

```json
[
    {
        "sensor": "lamp.steps",
        "data": {
            "device_model": "Watch",
            "value": 30,
            "source": "com.apple.health.3FC36D7A-7519-42EF-98F0-08A41C83A33B",
            "type": "step_count",
            "unit": "count"
        },
        "timestamp": 1676974618563
    },
    {
        "sensor": "lamp.steps",
        "data": {
            "device_model": "iPhone",
            "value": 248,
            "source": "com.apple.health.47587FF6-BAB0-4E12-A0BD-614952C35C82",
            "type": "step_count",
            "unit": "count"
        },
        "timestamp": 1676963839214
    },
    {
        "sensor": "lamp.steps",
        "data": {
            "value": 1272,
            "source": "daily pedometer",
            "type": "step_count",
            "unit": "count"
        },
        "timestamp": 1676963827636
    }
]
```

Note: we assume the aim is not to calculate a user's daily step count directly from these values. If a user has both an iPhone and a Watch, they may sometimes walk with only the Watch and sometimes with both the phone and the Watch. So to get the user's daily step count, we would need a statistical query that aggregates the results.
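On the device, HealthKit's statistics queries (for example `HKStatisticsQuery` with the `.cumulativeSum` option) are designed to combine samples from several sources without double counting. For data already on the server, one crude stand-in is to total each device separately per day and keep the largest total. The sketch below shows that heuristic; it assumes events carry the new `device_model` field, buckets days in UTC rather than the participant's local time, and is an illustration only, not how cortex computes step counts.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_max_device_total(events):
    """Per UTC day, sum step_count values per device_model and keep the largest device total.

    Heuristic only: it avoids adding phone and watch steps for the same walk, but it
    undercounts when the devices were carried at different times. Rows without a
    device_model (e.g. the cumulative "daily pedometer" rows) are skipped.
    """
    per_day_device = defaultdict(lambda: defaultdict(int))
    for event in events:
        data = event["data"]
        device = data.get("device_model")
        if data.get("type") != "step_count" or device is None or data.get("value") is None:
            continue
        day = datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc).date()
        per_day_device[day][device] += data["value"]
    return {day: max(devices.values()) for day, devices in per_day_device.items()}


# The three sample events from this comment, reduced to the fields used here.
events = [
    {"timestamp": 1676974618563, "data": {"device_model": "Watch", "value": 30, "type": "step_count"}},
    {"timestamp": 1676963839214, "data": {"device_model": "iPhone", "value": 248, "type": "step_count"}},
    {"timestamp": 1676963827636, "data": {"value": 1272, "source": "daily pedometer", "type": "step_count"}},
]

print(daily_max_device_total(events))  # {datetime.date(2023, 2, 21): 248}
```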

ZCOEngineer commented 1 year ago

On Android, we reproduced this issue yesterday and are working on a fix. Once developer testing is complete, we will hand it over for QA testing.

jijopulikkottil commented 1 year ago

The iOS code has been pushed. Version 2023.2.23 (pointed to staging) is available to App Store users.

sijitg commented 1 year ago

On Android, we identified and fixed a bug in how timestamps were sent. QA is in progress.

sijitg commented 1 year ago

We pushed the Android code. Version 2023.2.28 is in Play Store review for open testing.

sijitg commented 1 year ago

The Android step count fix is ready for testing in the Play Store beta.

lugray1 commented 1 year ago

This appears to be fixed in the iOS version. For Android devices, all step values are returned as 'None', even when data quality is high (see the output of cortex.secondary.step_count below):

[Screenshot 2023-03-13 at 12:05:27 PM]

ZCOEngineer commented 1 year ago

@lugray1 We shall review this on Android and get back to you.

sijitg commented 1 year ago

@lugray1 Is it possible to share the log file? LampLog.txt can be found in the data folder.

lugray1 commented 1 year ago

> @lugray1 Is it possible to share the log file ? LampLog.txt can be found in data folder.

I am not sure how to access LampLog.txt or the data folder. Can you provide additional details about how I would get this file?

sijitg commented 1 year ago

@lugray1 Due to file access restrictions in newer Android OS versions, file manager apps on the phone may not be able to access this file. However, you can retrieve it by connecting the phone to a computer and navigating to the path shown in the attached screenshot.

sijitg commented 1 year ago

It would also help us to see the API response for sensor_event.

We verify that sensor data is being uploaded to the server by calling the API below, using the website https://reqbin.com/:

https://api-staging.lamp.digital/participant/U4530111336/sensor_event?origin=lamp.steps

(Replace the participant ID in the URL above.)
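The same check can be scripted instead of using reqbin. The standard-library sketch below only builds the request and does not send it. It assumes the API accepts HTTP Basic authentication with the account's ID and password, which should be confirmed against the LAMP API documentation; `build_sensor_event_request` and the credentials shown are hypothetical.

```python
import base64
import urllib.parse
import urllib.request

API_BASE = "https://api-staging.lamp.digital"  # staging server used in this thread


def build_sensor_event_request(participant_id, username, password, origin="lamp.steps"):
    """Build (but do not send) a GET request for a participant's sensor events.

    Assumption: the server accepts HTTP Basic auth; confirm with the LAMP API docs.
    """
    path = f"/participant/{urllib.parse.quote(participant_id)}/sensor_event"
    query = urllib.parse.urlencode({"origin": origin})
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{API_BASE}{path}?{query}",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


request = build_sensor_event_request("U4530111336", "user@example.com", "hypothetical-password")
print(request.full_url)
# https://api-staging.lamp.digital/participant/U4530111336/sensor_event?origin=lamp.steps
# To actually send it: urllib.request.urlopen(request).read()
```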

Is it possible to share the API response or the user credentials you are using?


@lugray1

lugray1 commented 1 year ago

Thank you for providing those details! Unfortunately, I have tried to collect steps on several of our lab's test Android phones, and am having trouble with passive data collection (I am assuming this has more to do with the age and quality of the phones than anything else).

We currently have some participants using mindLAMP on Androids. My understanding is that if participants download mindLAMP on an Android, they are automatically using the Beta version of the app, and so I have been trying to access step data from their accounts to see if the issue is fixed. These accounts show that passive data is being collected, but all step values return as 'None'. However, I am unable to provide any of these user credentials over GitHub.

Would it be possible for you to make a test account on an Android phone to investigate this further? Apologies for this inconvenience!

sijitg commented 1 year ago

Thanks for the details. We are doing one more round of testing using older Android versions and low-end phones. Has anyone reported this issue in production?

ZCOEngineer commented 1 year ago

@lugray1 In your last comment, you mentioned checking participants using the Android beta to see whether the issue is present. Did you find this issue in the data samples collected from them?

carlan1 commented 1 year ago
  1. Check API response for SensorEvent for lamp.steps for faulty participants / None values
  2. Check whether data is coming in from the Android beta

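The first check can be scripted once the response body is saved (for example, from reqbin). A minimal sketch, assuming the body has the `{"data": [...]}` shape and that each event stores its step count under `data.value`; an empty list, as in the production response quoted in this thread, means no step events were uploaded at all. `summarize_steps_response` is a hypothetical helper.

```python
import json


def summarize_steps_response(body):
    """Count lamp.steps events in a sensor_event response and how many lack a step value."""
    events = json.loads(body).get("data", [])
    missing = sum(1 for event in events if event.get("data", {}).get("value") is None)
    return {"events": len(events), "missing_value": missing}


print(summarize_steps_response('{"data": []}'))  # {'events': 0, 'missing_value': 0}

sample = '{"data": [{"timestamp": 1, "data": {"value": 30}}, {"timestamp": 2, "data": {"value": null}}]}'
print(summarize_steps_response(sample))  # {'events': 2, 'missing_value': 1}
```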
carlan1 commented 1 year ago

Our Android devices running the beta app are unable to collect any step data at all. Please check on your devices whether you are able to collect step data on Android while using the beta app @ZCOEngineer @sijitg.

carlan1 commented 1 year ago

This is the API response for one of the participants receiving no data for steps (using Android app, production):

```json
{
    "data": []
}
```

ZCOEngineer commented 1 year ago

@carlan1 We shall check this today

sijitg commented 1 year ago

We checked using the beta app (version 2023.4.5), and there is step data. Please see the attached response.

user details:

U6076062230@lamp.com U6076062230

[Screenshot: api_response]

sijitg commented 1 year ago

Please make sure that the device is signed in to Google Fit and, when prompted, grant its permissions in the app.

carlan1 commented 1 year ago

Yes, we were able to get data after making sure the participant had Google Fit set up.

lugray1 commented 1 year ago

We are now able to get step values on an Android using the Beta app (after setting up Google Fit). The values look accurate; however, there do appear to be duplicates for some values. I am attaching a screenshot below of output from cortex.raw.steps. As you can see, values in rows 0-4 look correct, but from row 5 onward there are duplicate events.

[Screenshot 2023-04-11 at 9:55:13 AM]

ZCOEngineer commented 1 year ago

@lugray1 We shall check this.

sijitg commented 1 year ago

Every time the user logs in, the Android app sends all of the fitness data received from Google Fit, so after an uninstall or a re-login, previously uploaded data may be sent again. We can avoid this by sending only data recorded after login. We have implemented this change and will update the build after QA.
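The change described above amounts to an upload watermark: on login, only records newer than the last uploaded one (or newer than the login time) are sent. The Python below is only an illustration of that idea, not the mindLAMP Android code; `records_to_upload` and its watermark bookkeeping are hypothetical.

```python
def records_to_upload(records, watermark_ms):
    """Split off records newer than watermark_ms and return them with the new watermark.

    records: (timestamp_ms, value) pairs as returned by the fitness history query.
    watermark_ms: timestamp of the last record already uploaded (hypothetical bookkeeping).
    """
    pending = [record for record in records if record[0] > watermark_ms]
    new_watermark = max((record[0] for record in pending), default=watermark_ms)
    return pending, new_watermark


# Hypothetical history: the first two records were uploaded before the re-login.
history = [(1681200000000, 50), (1681203600000, 80), (1681207200000, 40)]

pending, watermark = records_to_upload(history, watermark_ms=1681203600000)
print(pending)  # [(1681207200000, 40)]

# Running again with the updated watermark sends nothing twice.
print(records_to_upload(history, watermark)[0])  # []
```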

sijitg commented 1 year ago

We have submitted this change to the Play Store. The staging build is in review.

sijitg commented 1 year ago

This is live for open testing.

lugray1 commented 1 year ago

Looks good, thank you! I will close this issue as resolved.