deephealthproject / use-case-pipelines

Use case pipelines based on EDDL and ECVL libraries. Different tasks (e.g. classification and segmentation) and datasets (e.g. MNIST, ISIC, and PNEUMOTHORAX) are taken into account.

SLURM multi-GPU environment with CUDNN #20

Closed: lauracanalini closed this issue 2 years ago

lauracanalini commented 3 years ago

The pipeline crashes every time when run in a SLURM environment with two GPUs and EDDL compiled with CUDNN. The error differs from run to run, but it is always one of these:

EDDL: develop branch (commit 99aa78f9bd30eb6a8fcb27fee38528175b899f5d)
ECVL: 0.4.1
CUDA: 11.0
CUDNN: 8.0.4

On a local machine everything works fine. We don't think it's related to the latest ECVL release, because it also occurs with the LoadBatch function.

lauracanalini commented 2 years ago

It seems the problem was with an imported ONNX model. As of commit 56cfe97680b6e1f0430ec32fbb67880612e6a710 we changed the ONNX model, and the problem has not appeared again.
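
When a model file behaves differently on a local machine and on a cluster, one cheap first check is to confirm that both machines are loading the same file. The following is a minimal sketch of that check, not part of the pipeline itself: the helper names `file_sha256` and `same_model_file` are hypothetical, and it only compares file contents; it does not validate the ONNX graph or explain the original crash.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large model files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_model_file(path, expected_digest):
    """Return True when the file at `path` matches a digest recorded elsewhere."""
    return file_sha256(path) == expected_digest.strip().lower()
```

In practice one would compute the digest on the local machine, then call `same_model_file` on the cluster node with that value before starting the SLURM job.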