We have identified several issues with the original implementation in examples/ogb/train_gap.py that require modifications:
Incompatibility with the Current OGB PCQM4Mv2 Dataset: The current version of the OGB PCQM4Mv2 dataset includes atoms not listed in ogb_node_types and contains entries with empty labels. The preprocessing code now skips these incompatible entries.
Failed Instantiation of AdiosDataset: The code currently instantiates AdiosDataset with incompatible parameters. The opt dictionary should be unpacked before being passed as arguments.
Broadcasting Over 2GB Data with MPI: The Adios_writer class occasionally attempts to broadcast more than 2 GB of data in a single call, which exceeds the MPI message count limit (the count argument is a 32-bit signed integer). We have implemented a chunk-based broadcasting function to address this issue.
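To illustrate the chunk-based broadcast idea, here is a minimal sketch rather than the exact function added in this PR. It assumes an mpi4py-style communicator (`comm.Get_rank`, `comm.bcast`, `comm.Bcast`). The name `chunked_bcast` and the 1 GiB default chunk size are hypothetical choices made for this example.

```python
import pickle

# Hypothetical default: 1 GiB per message, safely below the 2**31 - 1 count limit.
CHUNK_BYTES = 1 << 30


def chunked_bcast(comm, obj, root=0, chunk_bytes=CHUNK_BYTES):
    """Broadcast a picklable object from `root` in pieces of at most `chunk_bytes`.

    `comm` is assumed to behave like an mpi4py communicator
    (for example MPI.COMM_WORLD).
    """
    rank = comm.Get_rank()

    # Only the root serializes the object; the other ranks receive the bytes.
    payload = pickle.dumps(obj) if rank == root else None

    # Step 1: share the total size with a small pickled broadcast.
    nbytes = comm.bcast(len(payload) if rank == root else None, root=root)

    # Step 2: every rank holds a writable buffer of the full size.
    buf = bytearray(payload) if rank == root else bytearray(nbytes)
    view = memoryview(buf)

    # Step 3: broadcast the buffer slice by slice, so no single call
    # carries more than `chunk_bytes` bytes.
    for start in range(0, nbytes, chunk_bytes):
        comm.Bcast(view[start:start + chunk_bytes], root=root)

    # Step 4: rebuild the object from the reassembled bytes.
    return pickle.loads(buf)
```

With mpi4py installed, every rank would call `chunked_bcast(MPI.COMM_WORLD, data)`, where only the root's `data` is used. The sketch copies the pickled payload once on the root for simplicity. A production version might broadcast NumPy buffers directly to avoid that copy.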
These bug fixes are essential for later integrating our DeepSpeed and pipeline-parallelism implementations, which use the OGB PCQM4Mv2 dataset as an example.
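For the AdiosDataset instantiation issue above, the following sketch shows why unpacking the options dictionary matters. `ExampleDataset`, its parameters, and the contents of `opt` are hypothetical stand-ins, not the real AdiosDataset signature. It assumes only that the constructor takes the options as keyword arguments.

```python
class ExampleDataset:
    """Hypothetical stand-in for a dataset class; NOT the real AdiosDataset API."""

    def __init__(self, filename, label, preload=False, shmem=False):
        self.filename = filename
        self.label = label
        self.preload = preload
        self.shmem = shmem


# Hypothetical options dictionary, playing the role of `opt`.
opt = {"preload": True, "shmem": False}

# Passing the dictionary positionally binds the whole dict to `preload`,
# which is not what the caller intended.
wrong = ExampleDataset("data.bp", "trainset", opt)

# Unpacking with ** passes each entry as its own keyword argument.
right = ExampleDataset("data.bp", "trainset", **opt)
```

In the first call the mistake is silent, because the dictionary is simply stored as the value of `preload`. With a stricter signature the same mistake would raise a `TypeError` instead.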