ersilia-os / olinda

Chemistry model distillation, based on 1024-dimensional embeddings
GNU General Public License v3.0

Final model training crash: Graph execution error #6

Open JHlozek opened 2 months ago

JHlozek commented 2 months ago

When I was first getting the pipeline working, I struggled with a crash in Olinda: a Graph Execution Error raised when the tuner class trains the final model after identifying the best hyperparameters.

I narrowed the cause down to the point in _final_train() in tuner.py where a second model object is created. The only workaround I found, and a temporary one, was to overwrite the same 'model' object at line 185 of tuner.py.

To reproduce the error, comment out lines 185-189 and uncomment line 117 in tuner.py.

I'm not very familiar with the PyTorch framework, so I'm hoping, @leoank, you may have some insight here. One suggestion was to consider the files that are produced on disk as a potential source of the error.

JHlozek commented 2 months ago

Of course, now when I try to re-create what was a very persistent error, everything just works.

Perhaps it could have been a dependency version issue. I'll roll back and see if that re-creates the issue.
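
For comparing environments before and after a rollback, a minimal sketch along these lines could record the installed versions and show what changed. It uses only the standard-library `importlib.metadata` module (Python 3.8+). The package names in `PACKAGES` are guesses for illustration, not Olinda's confirmed dependency list, and the helper names are hypothetical.

```python
from importlib import metadata

# Hypothetical package list: replace with the project's real dependencies.
PACKAGES = ["tensorflow", "keras", "torch", "numpy"]


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def diff_versions(before, after):
    """Return {package: (old, new)} for every package whose version differs."""
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }


if __name__ == "__main__":
    # Print a snapshot of the current environment.
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Saving one snapshot from the working environment and one from the rolled-back environment, then passing both to `diff_versions`, would list only the packages that changed between the two.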

leoank commented 2 months ago

Let me know if you are still facing this. I can look into reproducing this.

GemmaTuron commented 2 weeks ago

Is this still an issue, or can we close it, @JHlozek?