choderalab / modelforge

Infrastructure to implement and train NNPs
https://modelforge.readthedocs.io/en/latest/
MIT License

Training stuck in the current main branch #249

Closed (MarshallYan closed this issue 1 month ago)

MarshallYan commented 2 months ago

When I try to train a model on macOS (specifically TensorNet and ANI2x), it gets stuck while calculating the pairlist for the dataset.

chrisiacovella commented 2 months ago

I think this is an issue with how multiprocessing is implemented in the PyTorch data loader on macOS. If the code that creates and iterates over the data loader is not wrapped in a guard like the one below, it will end up hanging.

if __name__ == "__main__":

However, after implementing this, I see a possible bug related to concatenating empty tensors (which seems to cause problems on macOS but not on Linux).
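
For reference, a minimal sketch of the guard pattern mentioned above. It uses the standard library's multiprocessing module with the "spawn" start method (the default on macOS since Python 3.8) as a stand-in for the data loader's worker processes. The names pair_count and run_in_workers are hypothetical and not part of modelforge or PyTorch.

```python
import multiprocessing as mp


def pair_count(n_atoms: int) -> int:
    """Hypothetical per-sample work: number of unique atom pairs."""
    return n_atoms * (n_atoms - 1) // 2


def run_in_workers(sizes, num_workers=2):
    """Map pair_count over sizes in worker processes started with 'spawn'."""
    # With 'spawn', each worker re-imports this module, so top-level code
    # must not start new worker processes itself.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=num_workers) as pool:
        return pool.map(pair_count, sizes)


if __name__ == "__main__":
    # Only the parent process enters this branch; workers import the module
    # under a different name and skip it.
    print(run_in_workers([2, 3, 4]))
```

Assuming the training script follows the same layout, the code that builds the data loader and starts training would sit under the guard in the same way whenever worker processes are used.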

wiederm commented 1 month ago

We currently do not support macOS for training.