Discrepancy in number of pairs in training set

The paper states: "There are approximately 5.7k and 23.4k pairs in the validation and training sets, respectively." However, when training is run, we get the following:

    Loading BABEL train: 100%|██████████| 6601/6601 [00:53<00:00, 124.08it/s]
    [teach.data.babel][INFO] - Processed 6601 sequences and found 3091 invalid cases based on the datatype.
    [teach.data.babel][INFO] - 15863 sequences -- datatype:separate_pairs.
    [teach.data.babel][INFO] - 14.13% of the sequences which are rejected by the sampler in total.
    [teach.data.babel][INFO] - 0.0% of the sequence which are rejected by the sampler, because of the excluded actions.
    [teach.data.babel][INFO] - 14.13% of the sequence which are rejected by the sampler, because they are too short(<0.5 secs) or too long(>25.0 secs).
    [teach.data.babel][INFO] - Discard from BML: 0
    [teach.data.babel][INFO] - Discard not KIT: 0
    Loading BABEL val: 100%|██████████| 2189/2189 [00:17<00:00, 124.54it/s]
    [teach.data.babel][INFO] - Processed 2189 sequences and found 983 invalid cases based on the datatype.
    [teach.data.babel][INFO] - 5672 sequences -- datatype:separate_pairs.
    [teach.data.babel][INFO] - 16.27% of the sequences which are rejected by the sampler in total.
    [teach.data.babel][INFO] - 0.0% of the sequence which are rejected by the sampler, because of the excluded actions.
    [teach.data.babel][INFO] - 16.27% of the sequence which are rejected by the sampler, because they are too short(<0.5 secs) or too long(>25.0 secs).
    [teach.data.babel][INFO] - Discard from BML: 0
    [teach.data.babel][INFO] - Discard not KIT: 0

This gives 15,863 (about 15.9k) training pairs, which does not match the 23.4k reported in the paper. The validation count (5,672, about 5.7k) does agree with the paper.
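A quick way to double-check the mismatch is to pull the pair counts straight out of the training log and divide them by the paper's figures. This is a minimal sketch with hypothetical helpers (`parse_pair_counts`, `compare_with_paper` are not part of TEACH); it assumes the `N sequences -- datatype:...` line format from the log in this issue and treats the paper's "approximately 23.4k / 5.7k" as exactly 23,400 and 5,700.

```python
import re

# Approximate pair counts quoted in the paper ("approximately 5.7k and 23.4k").
# Treating these as exact values is an assumption made only for this comparison.
PAPER_PAIRS = {"train": 23_400, "val": 5_700}

# One pattern for both line types, so it also works when the log is pasted flattened:
#   "Loading BABEL train: ..."                           -> names the current split
#   "... - 15863 sequences -- datatype:separate_pairs."  -> pair count for that split
TOKEN = re.compile(
    r"Loading BABEL (?P<split>\w+):"
    r"|(?P<count>\d+) sequences -- datatype:(?P<datatype>\w+)"
)


def parse_pair_counts(log_text: str) -> dict[str, int]:
    """Map each BABEL split to the pair count logged after its 'Loading' line (hypothetical helper)."""
    counts: dict[str, int] = {}
    current_split = None
    for match in TOKEN.finditer(log_text):
        if match.group("split"):
            current_split = match.group("split")
        elif current_split is not None:
            counts[current_split] = int(match.group("count"))
    return counts


def compare_with_paper(
    counts: dict[str, int], paper: dict[str, int]
) -> dict[str, tuple[int, int, float]]:
    """Return {split: (logged, paper, logged / paper)} for splits present in both."""
    return {
        split: (counts[split], expected, counts[split] / expected)
        for split, expected in paper.items()
        if split in counts
    }


if __name__ == "__main__":
    # Excerpt of the log from this issue (progress bars shortened).
    log = """
    Loading BABEL train: 100%|██████████| 6601/6601 [00:53<00:00, 124.08it/s]
    [teach.data.babel][INFO] - 15863 sequences -- datatype:separate_pairs.
    Loading BABEL val: 100%|██████████| 2189/2189 [00:17<00:00, 124.54it/s]
    [teach.data.babel][INFO] - 5672 sequences -- datatype:separate_pairs.
    """
    report = compare_with_paper(parse_pair_counts(log), PAPER_PAIRS)
    for split, (logged, expected, ratio) in report.items():
        print(f"{split}: {logged} logged vs ~{expected} in paper ({ratio:.1%})")
```

On the counts in this issue it prints `train: 15863 logged vs ~23400 in paper (67.8%)` and `val: 5672 logged vs ~5700 in paper (99.5%)`, so the training split is short by roughly 7.5k pairs while validation essentially matches.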

athn-nik / teach, issue #10