elephant-track / elephant-server

A server implementation of ELEPHANT
BSD 2-Clause "Simplified" License

Prediction error #7

Closed JoOkuma closed 3 years ago

JoOkuma commented 3 years ago

Hi, I was trying to play with the demo data and I'm getting the following error when doing the prediction.

I tried executing it with the Versatile and Default weights.

Oct 05, 2021 2:46:22 PM org.elephant.actions.RabbitMQService lambda$openConnection$7
INFO: batch_size: 1
crop_size: [16, 256, 256]
dataset_name: dataset
debug: False
device: cuda
is_3d: True
keep_axials: (True, True, True, False)
log_interval: 10
model_path: /workspace/models/detection.pth
n_crops: 5
n_models: 1
output_prediction: False
patch_size: None
scales: [1, 0.09000000357627869, 0.09000000357627869]
timepoint: None
url: None
use_2d: False
zpath_input: /workspace/datasets/dataset/imgs.zarr
Oct 05, 2021 2:46:46 PM org.elephant.actions.RabbitMQService lambda$openConnection$7
INFO: batch_size: 1
c_ratio: 0.4
crop_box: None
crop_size: (16, 384, 384)
dataset_name: dataset
debug: False
device: cuda
is_3d: True
is_pad: True
keep_axials: (True, True, True, False)
log_interval: 10
model_path: /workspace/models/detection.pth
output_prediction: False
p_thresh: 0.5
patch_size: [24, 384, 384]
r_max: 5
r_min: 1
scales: [1, 0.09000000357627869, 0.09000000357627869]
timepoint: 0
use_2d: False
use_median: False
zpath_input: /workspace/datasets/dataset/imgs.zarr
zpath_seg_output: /workspace/datasets/dataset/seg_outputs.zarr
Oct 05, 2021 2:55:51 PM org.elephant.actions.RabbitMQService lambda$openConnection$7
INFO: processing 1 / 8
Oct 05, 2021 2:57:18 PM org.elephant.actions.RabbitMQService lambda$openConnection$7
SEVERE: Failed in detect_spots
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/elephant/common.py", line 447, in detect_spots
    config.crop_box)
  File "/opt/conda/lib/python3.7/site-packages/elephant/common.py", line 395, in _get_seg_prediction
    for model in models], axis=0)
  File "/opt/conda/lib/python3.7/site-packages/elephant/common.py", line 395, in <listcomp>
    for model in models], axis=0)
  File "/opt/conda/lib/python3.7/site-packages/elephant/common.py", line 351, in predict
    return _patch_predict(model, x, keep_axials, patch_size, func)
  File "/opt/conda/lib/python3.7/site-packages/elephant/common.py", line 340, in _patch_predict
    model(x[slices], keep_axials)[0].detach().cpu().numpy()
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/elephant/models.py", line 189, in forward
    x = self.encoder[level](x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 480, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
Oct 05, 2021 2:57:18 PM org.elephant.actions.RabbitMQService lambda$openConnection$7
SEVERE: Failed in detect_spots
Traceback (most recent call last):
  File "./main.py", line 745, in predict_seg
    spots = detect_spots(config, redis_client)
  File "/opt/conda/lib/python3.7/site-packages/elephant/common.py", line 456, in detect_spots
    prediction[-1],  # last channel is the center label
UnboundLocalError: local variable 'prediction' referenced before assignment

Do you know what could be causing this?
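For what it's worth, the second traceback looks like a follow-on of the first: when the GPU inference step raises the cuDNN error, the local variable `prediction` is never assigned, so the later reference fails with `UnboundLocalError` and masks the root cause. A minimal sketch of that pattern (hypothetical names, not ELEPHANT's actual code):

```python
def detect_spots_sketch(fail=True):
    """Hypothetical reduction of the pattern behind the second traceback."""
    try:
        if fail:
            # Stand-in for the GPU inference call that failed.
            raise RuntimeError("cuDNN error: CUDNN_STATUS_MAPPING_ERROR")
        prediction = [0.1, 0.9]
    except RuntimeError as e:
        # The root cause is only logged here...
        print("inference failed:", e)
    # ...so this line raises UnboundLocalError instead of the cuDNN error.
    return prediction[-1]
```

So the `UnboundLocalError` is a symptom; the cuDNN `CUDNN_STATUS_MAPPING_ERROR` above it is the error worth chasing.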

ksugar commented 3 years ago

Hi @JoOkuma, thank you for reporting this issue. I would like to confirm how you prepared the server environment: Google Colab, Docker, or Singularity? If you chose Docker or Singularity, it is possible that your GPU is not compatible with the PyTorch build used in ELEPHANT, or that it has an insufficient amount of memory.
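One quick way to check the compatibility hypothesis is to run a small diagnostic inside the server container before starting prediction. This is a hedged sketch, not part of ELEPHANT; `cuDNN_STATUS_MAPPING_ERROR` often surfaces when the bundled PyTorch/CUDA build lacks kernels for the GPU's compute capability:

```python
def check_gpu():
    """Report whether the installed PyTorch build can use the local GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available to this PyTorch build"
    major, minor = torch.cuda.get_device_capability(0)
    # A mismatch between this compute capability and the capabilities the
    # PyTorch/CUDA build was compiled for can manifest as cuDNN errors.
    return f"{torch.cuda.get_device_name(0)} (sm_{major}{minor})"

print(check_gpu())
```

If the GPU's `sm_XY` is newer than what the container's CUDA toolkit supports, rebuilding or updating the container is the usual fix.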

JoOkuma commented 3 years ago

Hi, thanks for the response.

You nailed the problem: the CUDA version was not compatible with my GPU. After updating the Docker container, it worked. The software is awesome.