FZJ-INM1-BDA / celldetection

Scalable Instance Segmentation using PyTorch & PyTorch Lightning.
https://docs.celldetection.org
Apache License 2.0

"Cell Detection with Contour Proposal Networks" notebook crashes on forward pass #18

Closed codingS3b closed 6 months ago

codingS3b commented 6 months ago

I'm having trouble running your demo notebook "Cell Detection with Contour Proposal Networks.ipynb". The kernel crashes on the first forward pass as soon as model training starts. I narrowed it down to the forward call of self.body inside the BackboneAsUNet object. Suspecting a GPU memory issue, I lowered the batch size to 1 in the config and the crop_size to (128, 128), but the crash persists.

I'm running on an NVIDIA A100 (40GB), so a memory problem would really surprise me. I'm using Python 3.11.9 with torch 2.3.0+cu118 (installed via pip).

Would be happy to get any hints on the origin of the problem.

codingS3b commented 6 months ago

I think it might simply be an issue with my environment and CUDA installation, as the code works fine when running on CPU only. Therefore, I'm closing this.

ericup commented 6 months ago

Yes, this does sound like an environment issue. Overall, the notebook is just a small demo that should work on a 1080, so memory should certainly not be a concern.

In case you have trouble with local dependencies, you may also consider using conda and the binaries it provides. That way, you don't need to rely on the local CUDA and cuDNN installations. Otherwise, our Docker (or Apptainer on HPC) containers may also help in this context, as everything is preinstalled.
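
For anyone hitting a similar crash, here is a minimal sketch of a quick environment check that can be run outside the notebook. It uses only standard PyTorch calls. The helper name `cuda_report` is hypothetical and not part of celldetection, and the tiny convolution is just a stand-in for the backbone's forward pass, not the actual model. If this script also kills the process on the GPU, the problem is more likely in the CUDA/cuDNN setup than in the notebook.

```python
import importlib.util


def cuda_report():
    """Collect basic facts about the local PyTorch/CUDA setup (hypothetical helper)."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # CUDA version the installed wheel was built against (None for CPU-only builds).
    report["built_for_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    # cuDNN version visible to PyTorch (None if cuDNN is not available).
    report["cudnn_version"] = torch.backends.cudnn.version()

    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        # Tiny convolution on the GPU as a stand-in for a real forward pass.
        conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1).cuda()
        x = torch.randn(1, 3, 128, 128, device="cuda")
        with torch.no_grad():
            y = conv(x)
        torch.cuda.synchronize()  # wait for the kernel so errors surface here
        report["conv_output_shape"] = tuple(y.shape)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

A hard crash (for example, a segmentation fault in a native library) cannot be caught from Python, so the last line printed before the process dies is the useful signal here.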

Feel free to ask if you have any questions!