Closed soulios closed 1 week ago
Thanks for the feedback @soulios, I'll look into it :)
Is it only DMPNN that is significantly slower? What about e.g., GINConv, GATv2Conv, or perhaps MPNN?
And is it specifically training that is slow? Or is it the generation of the input?
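One way to tell those two apart is to time each stage separately. Below is a minimal sketch using only the standard library. `build_inputs` and `train` are hypothetical placeholders for the input generation step and the `fit` call; they are not part of any library.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

# Hypothetical stand-ins: swap in the real input generation and model.fit(...).
def build_inputs():
    return [i * i for i in range(100_000)]

def train(inputs):
    return sum(inputs)

with timed("input generation"):
    inputs = build_inputs()

with timed("training"):
    train(inputs)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

If most of the time lands under "training", the layer choice and callbacks are the place to look. Otherwise the input pipeline is the more likely bottleneck.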
Three quick things to try to speed up the training: comment out TensorBoard, replace SetGatherReadout with (Vanilla) Readout, and replace DMPNN with e.g. MPNN or GIN.
I compared it with chemprop, which only has DMPNN, so I cannot tell for the other models. I use DMPNN because it is more or less the SOTA on the tasks I am interested in (tox21 etc.). I was referring only to the training (not the molecular encoding). Thanks, I will try these.
Also somewhat relevant to speed: I tried saving and loading using tf_records, and when loading, my GPU memory maxes out. Have you encountered such an issue? Several notebooks/examples on this, as well as on pretraining, would be helpful.
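On the GPU memory point: by default, TensorFlow reserves nearly all memory on every visible GPU as soon as it initializes them, so a maxed-out reading does not by itself mean the records filled the card. Below is a minimal sketch, assuming that default allocation is what is being observed here (not confirmed in this thread). It sets the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable so memory is allocated on demand.

```python
import os

# Assumption: the maxed-out memory comes from TensorFlow's default
# pre-allocation, not from the dataset itself.
# This must be set before TensorFlow initializes the GPUs; the simplest
# way is to set it before TensorFlow is imported at all.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import only after the variable is set
```

The same behavior can be requested per device from inside the script with `tf.config.experimental.set_memory_growth`. If memory still climbs until it runs out, the loading code itself is the more likely cause.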
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Following the notebook file, I tried to construct a DMPNN model for tox21. For some reason, training takes far longer than the PyTorch implementation in chemprop, so I tried to parallelize it across multiple GPUs and got the following error. How can I overcome it? How can I speed up the training? Currently, 30 epochs take under 4 min with the chemprop implementation vs. 13:20 min here. (The only architectural difference was that chemprop used a learning-rate scheduler.)
Epoch 1/30
Traceback (most recent call last):
  File "/gpfs1/schlecker/home/soulios/reproducing-graphs/molgraph/molgraph/train.py", line 128, in
    qsar_model.fit(
  File "/gpfs1/schlecker/home/soulios/miniforge3/envs/keras/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_filef80d5yys.py", line 15, in tf__train_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
TypeError: in user code: