Open turbosonics opened 1 week ago
No, it's because the suffix .xyz is hardwired in your version of the code and you're using .extxyz. This has been fixed in #462. @ilyes319, what's the status of that PR?
Ha, my habit of distinguishing between the two xyz formats caused this error.
I changed the geometry file name to *.xyz and resubmitted. The training job has now started and the crash no longer occurs.
Thank you.
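For anyone else hitting this on a version without the fix, the rename workaround above can be sketched in a few lines of Python. This is only an illustration: `ensure_xyz_suffix` is a hypothetical helper name, not part of MACE, and it assumes the only problem is the file suffix (the extended XYZ contents are left untouched).

```python
import shutil
from pathlib import Path


def ensure_xyz_suffix(path):
    """Return a path that ends in .xyz, copying the file if needed.

    Hypothetical helper, not a MACE function. The file contents are not
    modified; only the name of the copy differs from the original.
    """
    src = Path(path)
    if src.suffix == ".xyz":
        # Already has the suffix the older code expects.
        return src
    # e.g. train.extxyz -> train.xyz (an existing train.xyz is overwritten)
    dst = src.with_suffix(".xyz")
    shutil.copyfile(src, dst)
    return dst
```

The returned path is what you would then point the training file argument at in your input script. Copying instead of renaming keeps the original .extxyz file around for other tools.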
Describe the bug
I compiled MACE on a local GPU server cluster in a virtual environment with Python 3.9, CUDA 11.8, and PyTorch 2.3.0.
My input script looks like this:
The training job then fails almost immediately. Slurm generates the following error file:
The log file also prints the following:
Is this happening because the size of the training set is huge?