Closed by wdm2 1 year ago
The completeset types files contain ALL of the data, so the completeset_train and completeset_test files are identical. Two files exist only for compatibility with some lab scripts that assume the existence of both a train and a test file.
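If you want to confirm locally that the two completeset files really are byte-for-byte identical, a minimal sketch is to compare their hashes. The helper names `file_sha256` and `files_identical` are made up for this example, and it assumes both files sit in your working directory.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large .types files are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """True if both files have the same SHA-256 digest."""
    return file_sha256(path_a) == file_sha256(path_b)
```

For example, `files_identical("it2_tt_v1.3_completeset_train0.types", "it2_tt_v1.3_completeset_test0.types")` should return `True` if the files are the same, as described above.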
If you want to split it up, you can use our 3-fold clustered cross-validation splits (instructions here). These are the it2_tt_v1.3_train[0-2].types files and the corresponding test .types files. Each train file contains 2/3 of the data, and the matching test file contains the remaining 1/3.
Or you could generate your own.
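As a rough sketch of what generating your own split could look like: the code below groups the lines of a types file by a key and assigns whole groups to folds, so no group appears in both train and test. This is a plain grouped k-fold, not the clustering used for the provided splits. It assumes each line is whitespace-separated and that the receptor path is in column index 3 with the pocket directory as its first path component. Check both assumptions against your files and adjust `column` if needed. The function names are hypothetical.

```python
import random
from collections import defaultdict


def group_key(line, column=3):
    """Grouping key for one line: first path component of the field at `column`.

    ASSUMPTION: `column` holds a path such as 'POCKET/receptor_file'.
    """
    fields = line.split()
    return fields[column].split("/")[0]


def grouped_kfold(lines, k=3, seed=0, column=3):
    """Split lines into k (train, test) pairs, keeping each group in one fold."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            groups[group_key(line, column)].append(line)

    # Shuffle the group keys reproducibly, then deal them out round-robin.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    folds = []
    for i in range(k):
        test_keys = set(keys[i::k])
        train = [l for key in keys if key not in test_keys for l in groups[key]]
        test = [l for key in keys if key in test_keys for l in groups[key]]
        folds.append((train, test))
    return folds
```

With k=3, each test fold holds roughly 1/3 of the groups and the matching train fold the other 2/3. The line counts are only approximately 2/3 and 1/3, because groups differ in size.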
Thank you so much for your prompt and detailed reply! How the dataset is handled is now clear to me, so I can proceed with my work confidently.
Thank you for sharing your excellent work. I have downloaded the crossdocked2020 v1.3 data. I would like to know how all data is divided into train and test. It seems that "it2_tt_v1.3_completeset_test0.types" and "it2_tt_v1.3_completeset_train0.types" are the same file. I thought it2_tt_v1.3_train[0-2].types was concatenated with it2_tt_v1.3_completeset_train0.types, is that correct?