EricGuo5513 / HumanML3D

HumanML3D: A large and diverse 3d human motion-language dataset.
MIT License

Critical Issue: Overlap of 3766 IDs between train_val.txt and test.txt #128

Closed Morris-Lucifer closed 6 months ago

Morris-Lucifer commented 6 months ago

Description

While conducting a routine check on our dataset, I discovered that there are 3766 overlapping IDs between train_val.txt and test.txt. This overlap might affect the integrity of our training and testing processes, as the same IDs are not supposed to appear in both sets.

Steps to Reproduce

  1. Extract IDs from train_val.txt and test.txt.
  2. Find the intersection of these two sets of IDs.
  3. Count the number of overlapping IDs.
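The steps above can be sketched in a few lines of Python. This is only an illustration, assuming each split file holds one ID per line; the helper names `read_ids` and `overlapping_ids` are hypothetical and not part of the repository.

```python
from pathlib import Path


def read_ids(path):
    """Return the set of non-empty, whitespace-stripped lines in a split file."""
    # Assumption: one ID per line, as in train_val.txt and test.txt.
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def overlapping_ids(path_a, path_b):
    """Return a sorted list of the IDs that appear in both split files."""
    return sorted(read_ids(path_a) & read_ids(path_b))
```

Calling `len(overlapping_ids("train_val.txt", "test.txt"))` from the directory that contains both files gives the overlap count; an empty list means the two splits are disjoint.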

Expected Behavior

train_val.txt and test.txt should have distinct sets of IDs with no overlap to ensure the separation of training/validation and testing data.

Actual Behavior

There are 3766 IDs that appear in both train_val.txt and test.txt, indicating a significant overlap.

Possible Solution

We need to investigate how these overlapping IDs were included in both files and ensure that the data splitting process segregates the IDs correctly without any overlaps.

EricGuo5513 commented 6 months ago

Hi, thanks for pointing out this VERY CRITICAL issue. I did a quick check, and you are absolutely correct: there is a huge number of overlapping IDs between train_val.txt and test.txt. I also checked train.txt, test.txt, and val.txt and found no overlap among them. There must have been an error when merging the train.txt and val.txt files.

EricGuo5513 commented 6 months ago

I think everything should still work correctly for people who use the train.txt, val.txt, and test.txt files separately. However, we apologize for this mistake and are now updating the train_val.txt file to the correct version.
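For anyone who wants to regenerate the merged file locally in the meantime, one possible approach is sketched below. It is not the script used to produce the official train_val.txt (the thread does not describe that script); it assumes one ID per line, and the function names `read_id_list` and `merge_splits` are hypothetical.

```python
from pathlib import Path


def read_id_list(path):
    """Return the non-empty, stripped lines of a split file, in file order."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def merge_splits(train_path, val_path, test_path, out_path):
    """Write train IDs followed by val IDs to out_path.

    Raises ValueError if any merged ID also appears in the test split.
    """
    train = read_id_list(train_path)
    val = read_id_list(val_path)
    test = read_id_list(test_path)

    # dict.fromkeys drops duplicate IDs while keeping first-seen order.
    merged = list(dict.fromkeys(train + val))

    leaked = set(merged) & set(test)
    if leaked:
        raise ValueError(f"{len(leaked)} IDs also appear in the test split")

    Path(out_path).write_text("\n".join(merged) + "\n")
    return merged
```

Because the disjointness check runs before anything is written, a leak between the merged list and the test split stops the merge instead of producing a contaminated file.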

Morris-Lucifer commented 6 months ago

Thank you very much for the swift response and resolution of this issue!