Hi! Thanks for the release of this dataset.
While working with this dataset we tried to reproduce the reported results, but with our own dataloader. We noticed that the provided loader mixes data from different users across the train/validation split. As a result, the validation reported in the paper is not fair: data from the same subjects leaks between the training and validation sets.
To demonstrate this, we exported the timestamp of each item and then looked up the corresponding subject ID. We found that 47% of the validation data has a duplicate in the training set. Here is an example:
| idx | Subject ID (processed file) | Subject ID (from timestamp) |           |
|-----|-----------------------------|-----------------------------|-----------|
| 400 | S02                         | S00                         | MISMATCH! |
| 401 | S02                         | S00                         | MISMATCH! |
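For reference, the duplicate check we ran can be sketched roughly as follows. This is a minimal illustration, not our exact script: the `(timestamp, subject_id)` pair layout and the `leakage_fraction` helper are assumptions, not the dataset's actual schema.

```python
def leakage_fraction(train_rows, val_rows):
    """Return the fraction of validation items whose timestamp also
    appears in the training set, plus the leaked items themselves.

    Each row is a (timestamp, subject_id) pair; these names are
    assumptions for illustration, not the dataset's real schema.
    """
    train_ts = {ts for ts, _ in train_rows}  # timestamps seen during training
    leaked = [(ts, sid) for ts, sid in val_rows if ts in train_ts]
    return len(leaked) / len(val_rows), leaked

# Example: one of two validation items shares a timestamp with training.
train = [("2021-01-01T00:00:00", "S00"), ("2021-01-01T00:00:30", "S01")]
val = [("2021-01-01T00:00:00", "S00"), ("2021-01-02T12:00:00", "S02")]
frac, leaked = leakage_fraction(train, val)  # frac == 0.5
```

Any validation item whose timestamp maps back to a subject already present in the training set at that timestamp is counted as leaked; running this over the full export is what produced the 47% figure above.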