google-deepmind / deepmind-research

This repository contains implementations and illustrative code to accompany DeepMind publications
Apache License 2.0

[ogb_lsc] the code for pcq does not use the train split during training #385

Open llan-ml opened 2 years ago

llan-ml commented 2 years ago

In the code for the PCQM4M dataset, the function _load_smiles is used to load data when building the data iterator.

However, when the split is set to train, the function datasets.load_all_except_kth_fold_indices loads only the validation indices outside the k-th validation fold, so the train split itself is never used.

@alvarosg @saran-t

alvarosg commented 2 years ago

Thanks for flagging. This is indeed a bug in the open-source implementation, accidentally introduced while refactoring the code for open sourcing.

The correct code should be:

  elif split == "train":
    indices = datasets.load_all_except_kth_fold_indices(
        data_root, k_fold_split_id, num_k_fold_splits)
    indices += datasets.load_splits()["train"]

I can confirm the version of the code used for the actual competition was not affected by this. Sorry for the inconvenience this may have caused, and thanks so much for flagging! We will push a fix soon.
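
For readers following the thread, here is a minimal, self-contained sketch of the index selection that the corrected code is meant to produce: the full train split plus every validation fold except the held-out k-th one. It does not use the repository's helpers. The functions split_into_folds, all_except_kth_fold, and training_indices, the toy index ranges, and the seed are all hypothetical names and values chosen for illustration. The thread does not say whether the repository's helpers return lists or arrays, so the sketch concatenates explicitly rather than relying on `+=`.

```python
import numpy as np


def split_into_folds(valid_indices, num_k_fold_splits, seed=0):
    """Shuffle validation indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(valid_indices))
    return np.array_split(shuffled, num_k_fold_splits)


def all_except_kth_fold(folds, k_fold_split_id):
    """Concatenate every validation fold except the held-out one."""
    kept = [fold for i, fold in enumerate(folds) if i != k_fold_split_id]
    return np.concatenate(kept)


def training_indices(train_indices, folds, k_fold_split_id):
    """Return the train split plus all validation folds except the held-out fold."""
    extra = all_except_kth_fold(folds, k_fold_split_id)
    # Concatenate explicitly: `+=` on NumPy arrays adds element-wise
    # instead of appending, unlike `+=` on Python lists.
    return np.concatenate([np.asarray(train_indices), extra])


# Toy data (hypothetical): 10 train examples and 6 validation examples.
train = np.arange(0, 10)
valid = np.arange(10, 16)

folds = split_into_folds(valid, num_k_fold_splits=3)
indices = training_indices(train, folds, k_fold_split_id=1)
```

In this sketch the held-out fold (folds[1]) stays available for evaluation, while the model would train on everything else. The bug described above corresponds to returning only the output of all_except_kth_fold and dropping the train indices.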