CosmiQ / solaris

CosmiQ Works Geospatial Machine Learning Analysis Toolkit
https://solaris.readthedocs.io
Apache License 2.0

[Error]: RuntimeError: CUDA error: no kernel image is available for execution on the device #419

Open garimaqs opened 3 years ago

garimaqs commented 3 years ago


Summary of the bug

Running xdxd_inferer from solaris_tutorials raises a CUDA error.

Steps to reproduce the bug


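import time
import solaris

# `config` is assumed to be the configuration already loaded earlier in the tutorial notebook
xdxd_inferer = solaris.nets.infer.Inferer(config)
inf_df = solaris.nets.infer.get_infer_df(config)
start_time = time.time()
xdxd_inferer(inf_df)  # fails here with the CUDA RuntimeError
end_time = time.time()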

The code breaks at the xdxd_inferer(inf_df) step with RuntimeError: CUDA error: no kernel image is available for execution on the device. The image that inference is being run on is from the solaris tutorial.

Buggy behavior and/or error message

Error traceback pasted below

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-ac434f9ac605> in <module>()
      5 print('dataset loaded. Running inference on the image.')
      6 start_time = time.time()
----> 7 xdxd_inferer(inf_df)
      8 end_time = time.time()
      9 print('running inference on one image took {} seconds'.format(end_time-start_time))

/home/user_name/miniconda3/envs/solaris_cpu/lib/python3.7/site-packages/solaris/nets/infer.py in __call__(self, infer_df)
     99                             infer_df[i].iloc[idx].to(device))
    100 
--> 101                 subarr_preds = self.model(inf_input)
    102                 subarr_preds = subarr_preds.cpu().data.numpy()
    103             stitched_result = stitch_images(subarr_preds,

/home/user_name/miniconda3/envs/solaris_cpu/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/home/user_name/miniconda3/envs/solaris_cpu/lib/python3.7/site-packages/solaris/nets/zoo/xdxd_sn4.py in forward(self, x)
     40 
     41     def forward(self, x):
---> 42         conv1 = self.conv1(x)
     43         conv2 = self.conv2(self.pool(conv1))
     44         conv3 = self.conv3(self.pool(conv2))

/home/user_name/miniconda3/envs/solaris_cpu/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/home/user_name/miniconda3/envs/solaris_cpu/lib/python3.7/site-packages/torch/nn/modules/container.py in forward(self, input)
    115     def forward(self, input):
    116         for module in self:
--> 117             input = module(input)
    118         return input
    119 

/home/user_name/miniconda3/envs/solaris_cpu/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/home/user_name/miniconda3/envs/solaris_cpu/lib/python3.7/site-packages/torch/nn/modules/conv.py in forward(self, input)
    421 
    422     def forward(self, input: Tensor) -> Tensor:
--> 423         return self._conv_forward(input, self.weight)
    424 
    425 class Conv3d(_ConvNd):

/home/user_name/miniconda3/envs/solaris_cpu/lib/python3.7/site-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight)
    418                             _pair(0), self.dilation, self.groups)
    419         return F.conv2d(input, weight, self.bias, self.stride,
--> 420                         self.padding, self.dilation, self.groups)
    421 
    422     def forward(self, input: Tensor) -> Tensor:

RuntimeError: CUDA error: no kernel image is available for execution on the device
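
This message usually means the installed PyTorch binary contains no kernels compiled for the GPU's compute capability. The check below is a minimal diagnostic sketch, not part of solaris. The helpers parse_arch, has_kernel_image and report are hypothetical names, and the matching rule is a simplification of CUDA's compatibility rules: an sm_XY binary is taken to run on devices of the same major version with an equal or higher minor version, and compute_XY PTX on any equal or newer capability. It assumes a PyTorch version that provides torch.cuda.get_arch_list().

```python
import re


def parse_arch(name):
    """Split an arch string such as 'sm_70' or 'compute_80' into (kind, major, minor)."""
    m = re.fullmatch(r"(sm|compute)_(\d{2,})[a-z]?", name)
    if m is None:
        return None
    digits = m.group(2)
    # The last digit is the minor version; everything before it is the major version.
    return m.group(1), int(digits[:-1]), int(digits[-1])


def has_kernel_image(capability, arch_list):
    """Simplified check: can a build with `arch_list` run on a device with `capability`?"""
    major, minor = capability
    for name in arch_list:
        parsed = parse_arch(name)
        if parsed is None:
            continue
        kind, a_major, a_minor = parsed
        # Compiled binaries (sm_XY) are assumed usable within the same major version.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX (compute_XY) is assumed JIT-compilable for any equal or newer device.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def report():
    """Describe the local PyTorch/GPU pairing, if torch and a GPU are present."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return "CUDA is not available to torch"
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    ok = has_kernel_image(capability, arch_list)
    return "device capability {}, build archs {}, compatible: {}".format(
        capability, arch_list, ok)


if __name__ == "__main__":
    print(report())
```

If the report shows no compatible architecture, installing a PyTorch build compiled for a CUDA version that supports the GPU is the likely fix.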

Expected behavior

Inference should run on the image specified in inf_df without raising an error.
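
The environment in the traceback is named solaris_cpu, so CPU inference may have been the intent. As a possible workaround, the sketch below hides all GPUs from PyTorch by setting CUDA_VISIBLE_DEVICES to an empty string before torch is imported. Two things here are assumptions rather than confirmed behavior: that solaris selects its device from torch.cuda.is_available(), and that CPU inference is acceptable for this image. The helper name hide_gpus is hypothetical.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from libraries that are imported after this call."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


# Must run before torch (or solaris) initializes CUDA in this process.
hide_gpus()

try:
    import torch
    # With no visible devices, this is expected to report that CUDA is unavailable.
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed in this environment")
```

In a notebook, this cell would need to run in a fresh kernel before the cell that imports solaris.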

Environment information