OxWearables / ssl-wearables

Self-supervised learning for wearables using the UK-Biobank (>700,000 person-days)
https://oxwearables.github.io/ssl-wearables/

Mismatch between provided labels in Capture-24 and published figures? #9

Closed: findalexli closed this issue 5 months ago

findalexli commented 1 year ago

Hello authors,

I saw that using your processed capture24_30hz_full Y.npy, the unique labels are {'moderate-vigorous', 'sleep', 'light', 'sedentary'}. However, I was confused when I saw the corresponding UMAP feature map in the paper.
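
For reference, this is essentially how I checked (the path below is a placeholder for a local copy of the processed data):

```python
import numpy as np

# Placeholder path to a local copy of the processed Capture-24 labels
y = np.load("capture24_30hz_full/Y.npy")
print(np.unique(y))
# -> ['light' 'moderate-vigorous' 'sedentary' 'sleep']
```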

[Image: UMAP feature map from the paper]

Can you please explain the difference?

As a side question, with all due respect, why was this subset of labels, dubbed Walmsley2020, chosen out of the annotation dictionary? These classes seem easy to distinguish by the magnitude of acceleration alone and do not carry any periodic information or fine upper-body activity.
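
To illustrate what I mean, a crude magnitude feature already goes a long way for these four classes. A minimal sketch (window length is illustrative, not from the paper):

```python
import numpy as np

def mean_enmo(xyz, fs=30):
    """Mean Euclidean norm minus one (ENMO) per 30 s window.

    xyz: (n_samples, 3) accelerometer signal in g.
    """
    norm = np.linalg.norm(xyz, axis=1)
    enmo = np.maximum(norm - 1.0, 0.0)  # remove gravity, clip negatives
    win = 30 * fs
    n = len(enmo) // win
    return enmo[: n * win].reshape(n, win).mean(axis=1)
```

Sleep and sedentary windows sit near zero while moderate-vigorous windows have much higher values, so magnitude alone separates them fairly well.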

angerhang commented 1 year ago

> I saw that using your processed capture24_30hz_full Y.npy, the unique labels are {'moderate-vigorous', 'sleep', 'light', 'sedentary'}. However, I was confused when I saw the corresponding UMAP feature map in the paper. Can you please explain the difference?

@findalexli thanks a lot for your interest in our work and for raising this issue. We probably uploaded the wrong data to the capture24_30hz_full Y.npy file, which we will fix. You are right that the labels are taken from the Walmsley2020 paper.

> As a side question, with all due respect, why was this subset of labels, dubbed Walmsley2020, chosen out of the annotation dictionary? These classes seem easy to distinguish by the magnitude of acceleration alone and do not carry any periodic information or fine upper-body activity.

That's an excellent question. The rationale behind Walmsley2020 for collapsing the annotations into {'moderate-vigorous', 'sleep', 'light', 'sedentary'} is that Capture-24 was originally annotated using more than 200 activity classes, as explained in Willetts2018. Mapping these into four broad classes gives a label set that is simpler and more directly meaningful for health research, as sketched below.
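
In code, the relabelling is just a many-to-one lookup. A minimal sketch with illustrative annotation strings (the real mapping is given by the dataset's annotation dictionary, not hard-coded like this):

```python
# Hypothetical excerpt of the many-to-one mapping from fine-grained
# Willetts2018-style annotations to the four Walmsley2020 classes.
WALMSLEY2020 = {
    "sleeping": "sleep",
    "sitting;desk work": "sedentary",
    "walking;dog walking": "light",
    "running": "moderate-vigorous",
}

def to_walmsley(annotation: str) -> str:
    return WALMSLEY2020[annotation]
```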

The UMAP example is just a sanity check for face validity in a zero-shot learning setting across different kinds of downstream datasets. Other examples we showed in the paper demonstrate periodic information better than Capture-24 does. The value that Capture-24 provides is that it is a much larger free-living dataset, which we believe makes it a good test of the generalisability of our pre-trained model.
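
For concreteness, that sanity check boils down to something like the sketch below, assuming you already have window-level embeddings from the pre-trained model and their labels saved (the file names are placeholders):

```python
import matplotlib.pyplot as plt
import numpy as np
import umap  # pip install umap-learn

# embeddings: (n_windows, d) features from the pre-trained model
# labels:     (n_windows,) strings such as 'sleep', 'sedentary', ...
embeddings = np.load("embeddings.npy")
labels = np.load("Y.npy")

# Project to 2-D and colour each point by its activity label
xy = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(embeddings)
for lab in np.unique(labels):
    m = labels == lab
    plt.scatter(xy[m, 0], xy[m, 1], s=2, label=lab)
plt.legend()
plt.show()
```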

Let us know if you have any other questions :D

angerhang commented 5 months ago

The dataset description paper has been published here: https://arxiv.org/abs/2402.19229

Hopefully that clarifies this issue, hence closing.