facebookresearch / clutrr

Diagnostic benchmark suite to explicitly test logical relational reasoning on natural language

--test_rows not always accurate #3

Closed NicolasAG closed 3 years ago

NicolasAG commented 4 years ago

Hello, I noticed something while generating clutrr data. This is the exact command I ran:

python main.py --train_tasks 4.2,4.3,4.4 --test_tasks 4.2,4.3,4.4,4.5,4.6,4.7,4.8,4.9,4.10 --train_rows 100000 --test_rows 10000 --equal --data_name 'r3-disco_l234'

and these are the line counts of my test CSV files:

$ wc -l data/data_r3-disco_l234_*/*_test.csv

10011 data/data_r3-disco_l234_1571563154.7491844/4.10_test.csv  ---> 10k : ok
 3100 data/data_r3-disco_l234_1571563154.7491844/4.2_test.csv   ---> much less than 10k... 
 2941 data/data_r3-disco_l234_1571563154.7491844/4.3_test.csv   ---> much less than 10k... 
 3007 data/data_r3-disco_l234_1571563154.7491844/4.4_test.csv   ---> much less than 10k... 
10025 data/data_r3-disco_l234_1571563154.7491844/4.5_test.csv  ---> 10k : ok
10014 data/data_r3-disco_l234_1571563154.7491844/4.6_test.csv  ---> 10k : ok
10038 data/data_r3-disco_l234_1571563154.7491844/4.7_test.csv  ---> 10k : ok
10044 data/data_r3-disco_l234_1571563154.7491844/4.8_test.csv  ---> 10k : ok
10023 data/data_r3-disco_l234_1571563154.7491844/4.9_test.csv  ---> 10k : ok
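For reference, a quick way to re-count the data rows while excluding each CSV header (a minimal sketch; the glob pattern simply matches the directory layout from my run above):

```python
import glob
import pandas as pd

# Count data rows (header excluded) in every *_test.csv produced by the run above.
for path in sorted(glob.glob("data/data_r3-disco_l234_*/*_test.csv")):
    n_rows = len(pd.read_csv(path))
    print(f"{path}: {n_rows} rows")
```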

I trained on tasks 4.2, 4.3, and 4.4, and it looks like the test sets for those three tasks were generated as if they were a single combined task: note that 3100 + 2941 + 3007 ≈ 10k, i.e. the --test_rows budget seems to be split across the trained tasks rather than applied per task (a rough illustration of this arithmetic is sketched below).
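To make the suspicion concrete, here is a hypothetical illustration only (not the actual generator code): if the 10k --test_rows budget were shared across the trained tasks instead of being applied to each task separately, each of the three files would get roughly a third of it, which is close to the counts above.

```python
# Hypothetical illustration of the suspected behaviour, not the CLUTRR generator logic:
# a shared 10k test budget split across the three trained tasks gives ~3.3k per file.
test_rows = 10000
trained_tasks = ["4.2", "4.3", "4.4"]
per_task = test_rows // len(trained_tasks)
print(per_task)  # 3333 -- close to the observed 3100 / 2941 / 3007
```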

koustuvsinha commented 3 years ago

Should be fixed with GLC integration #9.