awslabs / dgl-lifesci

Python package for graph neural networks in chemistry and biology
Apache License 2.0

Problem loading train_set of rexgen_direct example for local training #171

Open marcossilva opened 2 years ago

marcossilva commented 2 years ago

Hi! I'm trying to train the rexgen model in https://github.com/awslabs/dgl-lifesci/tree/master/examples/reaction_prediction/rexgen_direct, but loading the USPTO data fails with a pickle error, as shown below:

from dgllife.data import USPTOCenter, WLNCenterDataset
train_set = USPTOCenter('train', num_processes=2)
Preparing train subset of USPTO for reaction center prediction.
Exception in thread Thread-9:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 576, in _handle_results
    task = get()
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
RuntimeError: invalid value in pickle

This error doesn't happen when loading the val and test sets, though. Below are my library versions:

rdkit-pypi==2021.9.4
dgl-cu113==0.7.2
dgllife==0.2.9

mufeili commented 2 years ago

I've not encountered that before. Can you try using num_processes=1?

marcossilva commented 2 years ago

That leads to:

>>> train_set = USPTOCenter('train', num_processes=1)
Preparing train subset of USPTO for reaction center prediction.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/marcos/.local/lib/python3.8/site-packages/dgllife/data/uspto.py", line 661, in __init__
    super(USPTOCenter, self).__init__(
  File "/home/marcos/.local/lib/python3.8/site-packages/dgllife/data/uspto.py", line 461, in __init__
    self.load_reaction_data(path_to_reaction_file, num_processes)
  File "/home/marcos/.local/lib/python3.8/site-packages/dgllife/data/uspto.py", line 523, in load_reaction_data
    mol, reaction, graph_edits = load_one_reaction(li)
  File "/home/marcos/.local/lib/python3.8/site-packages/dgllife/data/uspto.py", line 319, in load_one_reaction
    reaction, graph_edits = line.strip("\r\n ").split()

However, after deleting the downloaded file from ~/.dgl/ and setting num_processes=1, it worked.
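
The traceback above is cut off before the exception message, so the actual cause is not shown. One possibility (an assumption, not confirmed in this thread) is that a corrupted or partial download left a line that does not split into exactly two whitespace-separated fields, which would break the unpacking in load_one_reaction. A minimal sketch for locating such lines follows. The function name find_malformed_lines and any file path passed to it are hypothetical; only the strip/split expression is taken from the traceback.

```python
from typing import List, Tuple


def find_malformed_lines(path: str, expected_fields: int = 2) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that does not split
    into exactly `expected_fields` whitespace-separated fields.

    Hypothetical helper: it mirrors the expression
    `line.strip("\\r\\n ").split()` seen in the traceback, nothing more.
    """
    bad = []
    with open(path, "r") as f:
        for line_number, line in enumerate(f, start=1):
            fields = line.strip("\r\n ").split()
            if len(fields) != expected_fields:
                bad.append((line_number, line.rstrip("\r\n")))
    return bad


# Example call; the path is a placeholder, since the thread does not
# give the name of the cached reaction file under ~/.dgl/.
# print(find_malformed_lines("/path/to/cached/train_file.txt"))
```

If this reports any lines, deleting the cached file and letting it download again, as described above, would be the natural next step.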

I realized that in find_reaction_center_train.py the default value for the number of processes is 4:

parser.add_argument('-np', '--num-processes', type=int, default=4,
                        help='Number of processes to use for data pre-processing')

so running the script with the default arguments leads to this error.
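
Until the default changes, the value can be overridden on the command line. The sketch below shows how argparse resolves the option with and without the flag. The build_parser wrapper and its default_num_processes parameter are hypothetical, added only to make the example self-contained; the add_argument call is the one quoted above.

```python
import argparse


def build_parser(default_num_processes: int = 4) -> argparse.ArgumentParser:
    """Hypothetical wrapper around the argument quoted from the script."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-np', '--num-processes', type=int,
                        default=default_num_processes,
                        help='Number of processes to use for data pre-processing')
    return parser


parser = build_parser()

# No flag given: the default of 4 is used.
print(parser.parse_args([]).num_processes)

# Explicit override, equivalent to passing `-np 1` on the command line.
print(parser.parse_args(['-np', '1']).num_processes)
```

In practice this should amount to adding -np 1 (or --num-processes 1) to whatever arguments you already pass to find_reaction_center_train.py.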

mufeili commented 2 years ago

Thanks. This might be hardware-specific. Perhaps we should change the default value to 1 instead. Could you open a PR to change the default value?