Morris-Lucifer closed this issue 6 months ago.
Hi, thanks for pointing out this very critical issue. I did a quick check, and you are absolutely correct: there is a large number of overlapping IDs between `train_val.txt` and `test.txt`. I also checked `train.txt`, `test.txt`, and `val.txt` and found no overlap among them, so the error must have been introduced while merging the `train.txt` and `val.txt` files.
I think everything should still work correctly for people who use the `train.txt`, `val.txt`, and `test.txt` files separately. However, we apologize for this mistake, and we are now updating `train_val.txt` to the correct version.
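One way to rebuild the merged file defensively is sketched below, assuming each split file lists one ID per line (blank lines ignored). `merge_splits` and `read_id_list` are hypothetical helpers, not the script actually used to produce `train_val.txt`, which is not shown in this thread. The sketch refuses to write the output if any train or val ID also appears in the test split.

```python
from pathlib import Path


def read_id_list(path):
    """Read one ID per line, skipping blank lines and preserving file order."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def merge_splits(train_path, val_path, test_path, out_path):
    """Write train IDs followed by val IDs to out_path, after checking that
    none of them appear in the test split.

    Hypothetical helper: duplicates between train and val are dropped,
    keeping the first occurrence.
    """
    train_ids = read_id_list(train_path)
    val_ids = read_id_list(val_path)
    test_ids = set(read_id_list(test_path))

    # dict.fromkeys keeps insertion order while removing duplicates.
    merged = list(dict.fromkeys(train_ids + val_ids))

    # Fail loudly instead of silently writing a leaky train_val file.
    leaked = test_ids.intersection(merged)
    if leaked:
        raise ValueError(f"{len(leaked)} IDs from train/val also appear in test")

    Path(out_path).write_text("\n".join(merged) + "\n", encoding="utf-8")
    return merged
```

For example, `merge_splits("train.txt", "val.txt", "test.txt", "train_val.txt")` would regenerate the merged file only if the three source splits are already disjoint, which matches the check described above.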
Thank you very much for the swift response and resolution of this issue!
Description
While conducting a routine check on our dataset, I discovered that there are 3766 overlapping IDs between `train_val.txt` and `test.txt`. This overlap might affect the integrity of our training and testing processes, as the same IDs are not supposed to appear in both sets.
Steps to Reproduce
Compare the IDs listed in `train_val.txt` and `test.txt`.
Expected Behavior
`train_val.txt` and `test.txt` should have distinct sets of IDs with no overlap, to ensure the separation of training/validation and testing data.
Actual Behavior
There are 3766 IDs that appear in both `train_val.txt` and `test.txt`, indicating a significant overlap.
Possible Solution
We need to investigate how these overlapping IDs were included in both files and ensure that the data splitting process segregates the IDs correctly without any overlaps.
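A minimal sketch for reproducing the overlap count, assuming each split file lists one ID per line (blank lines and surrounding whitespace ignored). The helper names `read_ids` and `overlapping_ids` are hypothetical and not part of the dataset's tooling.

```python
from pathlib import Path


def read_ids(path):
    """Return the set of non-empty, whitespace-stripped lines in a split file.

    Assumes one ID per line; adjust the parsing if the files use another layout.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def overlapping_ids(path_a, path_b):
    """Return the IDs present in both split files, sorted for stable output."""
    return sorted(read_ids(path_a) & read_ids(path_b))
```

Calling `len(overlapping_ids("train_val.txt", "test.txt"))` returns the number of shared IDs; for a correct split it should be 0, and the returned list shows which IDs leaked, which helps trace where the merge went wrong.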