OxWearables / ssl-wearables

Self-supervised learning for wearables using the UK-Biobank (>700,000 person-days)
https://oxwearables.github.io/ssl-wearables/

Mismatch between provided labels in Capture-24 and published figures? #9

Closed: findalexli closed this issue 5 months ago

findalexli commented 1 year ago

Hello authors,

I saw that using your processed capture24_30hz_full Y.npy, the unique labels are {'moderate-vigorous', 'sleep', 'light', 'sedentary'}. However, I was confused when I saw the corresponding UMAP feature map in the paper.
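
For reference, this is essentially how I checked (the path below is a placeholder for a local copy of the processed data):

```python
import numpy as np

# Placeholder path to a local copy of the processed Capture-24 labels
y = np.load("capture24_30hz_full/Y.npy")
print(np.unique(y))
# -> ['light' 'moderate-vigorous' 'sedentary' 'sleep']
```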

[Image: UMAP feature map from the paper]

Can you please explain the difference?

As a side question, with all due respect, why was this subset of labels, dubbed Walmsley2020, chosen out of the annotation dictionary? These classes seem easy to distinguish by the magnitude of acceleration alone and do not carry any periodic information or fine upper-body activity.
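
To illustrate what I mean, a crude magnitude feature already goes a long way for these four classes. A minimal sketch (window length is illustrative, not from the paper):

```python
import numpy as np

def mean_enmo(xyz, fs=30):
    """Mean Euclidean norm minus one (ENMO) per 30 s window.

    xyz: (n_samples, 3) accelerometer signal in g.
    """
    norm = np.linalg.norm(xyz, axis=1)
    enmo = np.maximum(norm - 1.0, 0.0)  # remove gravity, clip negatives
    win = 30 * fs
    n = len(enmo) // win
    return enmo[: n * win].reshape(n, win).mean(axis=1)
```

Sleep and sedentary windows sit near zero while moderate-vigorous windows have much higher values, so magnitude alone separates them fairly well.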

angerhang commented 1 year ago

> I saw that using your processed capture24_30hz_full Y.npy, the unique labels are {'moderate-vigorous', 'sleep', 'light', 'sedentary'}. However, I was confused when I saw the corresponding UMAP feature map in the paper. Can you please explain the difference?

@findalexli thanks a lot for your interest in our work and for raising this issue. We probably uploaded the wrong data to the capture24_30hz_full Y.npy file, which we will fix. You are right that the labels are taken from the Walmsley2020 paper.

> As a side question, with all due respect, why was this subset of labels, dubbed Walmsley2020, chosen out of the annotation dictionary? These classes seem easy to distinguish by the magnitude of acceleration alone and do not carry any periodic information or fine upper-body activity.

That's an excellent question. The rationale behind Walmsley2020 for collapsing the annotations into {'moderate-vigorous', 'sleep', 'light', 'sedentary'} is that Capture-24 was originally annotated using more than 200 activity classes, as explained in Willetts2018. Mapping these into four broad classes gives a label set that is simpler and more directly meaningful for health research, as sketched below.
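
In code, the relabelling is just a many-to-one lookup. A minimal sketch with illustrative annotation strings (the real mapping is given by the dataset's annotation dictionary, not hard-coded like this):

```python
# Hypothetical excerpt of the many-to-one mapping from fine-grained
# Willetts2018-style annotations to the four Walmsley2020 classes.
WALMSLEY2020 = {
    "sleeping": "sleep",
    "sitting;desk work": "sedentary",
    "walking;dog walking": "light",
    "running": "moderate-vigorous",
}

def to_walmsley(annotation: str) -> str:
    return WALMSLEY2020[annotation]
```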

The UMAP example is just a sanity check for face validity in a zero-shot learning setting across different kinds of downstream datasets. Other examples we showed in the paper demonstrate periodic information better than Capture-24 does. The value that Capture-24 provides is that it is a much larger free-living dataset, which we believe makes it a good test of the generalisability of our pre-trained model.
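
For concreteness, that sanity check boils down to something like the sketch below, assuming you already have window-level embeddings from the pre-trained model and their labels saved (the file names are placeholders):

```python
import matplotlib.pyplot as plt
import numpy as np
import umap  # pip install umap-learn

# embeddings: (n_windows, d) features from the pre-trained model
# labels:     (n_windows,) strings such as 'sleep', 'sedentary', ...
embeddings = np.load("embeddings.npy")
labels = np.load("Y.npy")

# Project to 2-D and colour each point by its activity label
xy = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(embeddings)
for lab in np.unique(labels):
    m = labels == lab
    plt.scatter(xy[m, 0], xy[m, 1], s=2, label=lab)
plt.legend()
plt.show()
```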

Let us know if you have any other questions :D

angerhang commented 5 months ago

The dataset description paper has been published here: https://arxiv.org/abs/2402.19229

Hopefully that clarifies this issue, hence closing.