Open llan-ml opened 2 years ago
Thanks for flagging. This is indeed a bug in the open source implementation, accidentally introduced when refactoring the code for open sourcing.
The correct code should be:
elif split == "train":
  indices = datasets.load_all_except_kth_fold_indices(
      data_root, k_fold_split_id, num_k_fold_splits)
  indices += datasets.load_splits()["train"]
I can confirm the version of the code used for the actual competition was not affected by this. Sorry for the inconvenience this may have caused, and thanks so much for flagging! We will push a fix soon.
In the code for the PCQM4M dataset, the function `_load_smiles` is used to load data when building the data iterator. However, when `split` is set to `train`, the function `datasets.load_all_except_kth_fold_indices` only loads the validation indices except the k-th validation fold, and the train split is not used.

@alvarosg @saran-t
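To make the reported behaviour concrete, here is a minimal sketch on toy data. Everything in it is an assumption for illustration: the helper below is a hypothetical stand-in for `datasets.load_all_except_kth_fold_indices` (the real function takes `data_root` and reads indices from disk), the round-robin fold assignment is invented, and the `splits` dictionary only mimics the shape of what `datasets.load_splits()` appears to return.

```python
def load_all_except_kth_fold_indices(valid_indices, k, num_folds):
    """Hypothetical stand-in: validation indices from every fold except fold k.

    Folds are assigned round-robin here purely for illustration; the real
    repository may partition the validation set differently.
    """
    folds = [valid_indices[i::num_folds] for i in range(num_folds)]
    return [idx for i, fold in enumerate(folds) if i != k for idx in fold]


def train_indices_buggy(splits, k, num_folds):
    # Behaviour described in the report: only the non-held-out validation
    # folds are returned, so the original train split is silently dropped.
    return load_all_except_kth_fold_indices(splits["valid"], k, num_folds)


def train_indices_fixed(splits, k, num_folds):
    # Behaviour of the corrected snippet: the non-held-out validation folds
    # plus the full train split.
    indices = load_all_except_kth_fold_indices(splits["valid"], k, num_folds)
    indices += list(splits["train"])
    return indices


# Toy split: six training molecules and four validation molecules.
splits = {"train": [0, 1, 2, 3, 4, 5], "valid": [6, 7, 8, 9]}

buggy = train_indices_buggy(splits, k=0, num_folds=2)
fixed = train_indices_fixed(splits, k=0, num_folds=2)

print("buggy:", buggy)
print("fixed:", fixed)
```

In this toy setup the buggy variant contains no index from the train split, while the fixed variant contains all of them in addition to the validation indices outside fold `k`.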