Closed: Peregalli closed this issue 1 year ago
Very interesting. Looking at the code, the default batch sizes are essentially the same as what the training process uses (which is more memory intensive), so it's not clear why one would work and the other wouldn't.
Lowering the batch size would seem like the only way to go. Have you tried reducing it down to 1? Are you doing that in the appropriate sections? Each process has its own parameter space; see the sketch below.
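As a rough illustration (the section and key names here are assumptions, not necessarily the exact layout of QuickAnnotator's config.ini), the idea is that the training job and the prediction job each read their own batch size, so lowering one does not affect the other:

```python
# Illustrative sketch only: the section/key names below are assumptions,
# not necessarily QuickAnnotator's actual config.ini layout.
import configparser

config = configparser.ConfigParser()
config.read("config.ini")

# Training and prediction each read their own batch size, so lowering
# one has no effect on the other.
train_batchsize = config.getint("train_tl", "batchsize", fallback=8)
predict_batchsize = config.getint("make_output", "batchsize", fallback=8)

print(f"training batch size:   {train_batchsize}")
print(f"prediction batch size: {predict_batchsize}")
```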
Did you try restarting the Flask server and directly running the two commands you mentioned, without any training taking place in between?
Thanks for your answer. After I sent the issue, I changed the batch size to 1 and it works! Anyway, it still seems really strange that the available memory supports the training process but not inference.
Super strange, but glad it worked!
I have already trained a base model with some annotations, but when I try to generate predictions to correct, the following errors appear:
ERROR: generate_superpixel (job 158) failed
and ERROR: generate_prediction (job 157) failed
Checking in .logs, the error seems to be:

ERROR: Traceback (most recent call last):
  File "make_output_unet_cmd.py", line 147, in <module>
    output_batch = tta_model(arr_out_gpu)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/ttach/wrappers.py", line 39, in forward
    augmented_output = self.model(augmented_image, *args)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/QuickAnnotator/unet.py", line 64, in forward
    x = up(x, blocks[-i-1])
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/QuickAnnotator/unet.py", line 119, in forward
    out = self.conv_block(out)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/QuickAnnotator/unet.py", line 89, in forward
    out = self.block(x)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 3.82 GiB total capacity; 1.33 GiB already allocated; 3.25 MiB free; 1.34 GiB reserved in total by PyTorch)
Checking my GPU usage, I see that it is not fully used up, but the GPU was used when I trained the model. I already tried reducing the batch size, but it seems not to have any impact. I'm using Ubuntu 22.04 and CUDA 11.0.
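For what it's worth, the traceback above reports 3.82 GiB of total capacity with only 3.25 MiB free, while PyTorch itself has reserved only about 1.34 GiB, so something else on the GPU appears to be holding the remainder. A minimal sketch (plain PyTorch, not part of QuickAnnotator) for comparing this process's own usage against the device total:

```python
# Minimal sketch: compare what this PyTorch process has allocated/reserved
# with the device total; any gap versus what nvidia-smi shows is memory held
# by other processes (e.g. the desktop or a lingering training job).
import torch

if torch.cuda.is_available():
    dev = torch.device("cuda:0")
    mib = 1024 ** 2
    print(f"allocated by PyTorch: {torch.cuda.memory_allocated(dev) / mib:.1f} MiB")
    print(f"reserved by PyTorch:  {torch.cuda.memory_reserved(dev) / mib:.1f} MiB")
    total = torch.cuda.get_device_properties(dev).total_memory
    print(f"device total:         {total / mib:.1f} MiB")
```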