OCR-D / ocrd_anybaseocr

DFKI Layout Detection for OCR-D
Apache License 2.0

CUDA out of memory / cannot disable CUDA #61

Closed: bertsky closed this 2 years ago

bertsky commented 4 years ago

On a CUDA-enabled system with more than 3GB of GPU memory currently free, I get this from dewarp:

INFO OcrdAnybaseocrDewarper - INPUT FILE 105_02_abbr
CustomDatasetDataLoader
dataset [AlignedDataset] was created
lib/python3.6/site-packages/torchvision/transforms/transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  "please use transforms.Resize instead.")
pix2pixHD/models/pix2pixHD_model.py:128: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  input_label = Variable(input_label, volatile=infer)
Traceback (most recent call last):
  File "bin/ocrd-anybaseocr-dewarp", line 8, in <module>
    sys.exit(ocrd_anybaseocr_dewarp())
  File "lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "lib/python3.6/site-packages/ocrd_anybaseocr/cli/cli.py", line 32, in ocrd_anybaseocr_dewarp
    return ocrd_cli_wrap_processor(OcrdAnybaseocrDewarper, *args, **kwargs)
  File "lib/python3.6/site-packages/ocrd/decorators.py", line 82, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "lib/python3.6/site-packages/ocrd/processor/base.py", line 60, in run_processor
    processor.process()
  File "lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_dewarp.py", line 130, in process
    self._process_segment(model, dataset, page, page_xywh, page_id, input_file, orig_img_size, n)
  File "lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_dewarp.py", line 164, in _process_segment
    generated = model.inference(data['label'], data['inst'], data['image'])
  File "pix2pixHD/models/pix2pixHD_model.py", line 216, in inference
    fake_image = self.netG.forward(input_concat)
  File "pix2pixHD/models/networks.py", line 211, in forward
    return self.model(input)             
  File "lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "pix2pixHD/models/networks.py", line 252, in forward
    out = x + self.conv_block(x)
  File "lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/python3.6/site-packages/torch/nn/modules/padding.py", line 163, in forward
    return F.pad(input, self.padding, 'reflect')
  File "lib/python3.6/site-packages/torch/nn/functional.py", line 2865, in pad
    return torch._C._nn.reflection_pad2d(input, pad)
RuntimeError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 3.93 GiB total capacity; 2.37 GiB already allocated; 18.94 MiB free; 35.58 MiB cached)

Frankly, this does not make any sense to me.
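
For what it is worth, the figures in the error message can be tallied up. The sketch below parses the message quoted above and computes how much of the card's capacity is not covered by the "allocated", "cached" and "free" figures. It assumes those three figures do not overlap, which depends on how this PyTorch version reports them; the helper name `parse_oom` is made up for illustration.

```python
import re

# Error text copied from the traceback above.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 3.93 GiB total capacity; "
    "2.37 GiB already allocated; 18.94 MiB free; 35.58 MiB cached)"
)

# Conversion factors to MiB for the two units used in the message.
UNIT_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Return the sizes named in the message, converted to MiB (hypothetical helper)."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "cached": r"([\d.]+) (MiB|GiB) cached",
    }
    sizes = {}
    for name, pattern in patterns.items():
        value, unit = re.search(pattern, message).groups()
        sizes[name] = float(value) * UNIT_MIB[unit]
    return sizes


sizes = parse_oom(MESSAGE)
# Assumption: allocated, cached and free are disjoint parts of the total.
unaccounted = sizes["total"] - sizes["allocated"] - sizes["cached"] - sizes["free"]
print(f"requested:   {sizes['requested']:.2f} MiB")
print(f"unaccounted: {unaccounted:.2f} MiB")
```

Under that assumption, roughly 1.5 GiB of the 3.93 GiB is held outside what the message attributes to this process's tensors and cache. The message alone does not say what holds that memory.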

However, I thought I should at least be able to disable GPU computation. The only parameter that can influence the PyTorch setup in dewarp is gpu_id, which would need to be set to 'cpu'. But the tool JSON requires this to be a number!

    raise Exception("Invalid parameters %s" % report.errors)
Exception: Invalid parameters ["[gpu_id] 'cpu' is not of type 'number'"]

bertsky commented 4 years ago

Like so often (with this module), the problem runs deeper.

Even if you:

  1. allow -1 to represent non-GPU/CUDA and pass that as the empty list to pix2pixHD: its TestOptions().parse() gets called before gpu_ids is set, so it will still try to initialize CUDA
  2. translate the param into its respective sys.argv for pix2pix (i.e. '--gpu_ids' and str(parameter['gpu_ids'])): the inference code in pix2pix will still try to use .cuda() everywhere

Thus, IMO there's no way to run the dewarper without GPU, or with a CUDA-enabled GPU with "only" 4GB RAM. :-1:
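
To make the two steps above concrete, here is a minimal sketch of the parameter translation they describe. It is not the code in ocrd_anybaseocr or pix2pixHD: the function names are hypothetical, and treating a negative gpu_id as "CPU" is only the convention proposed in step 1.

```python
def gpu_ids_from_parameter(gpu_id):
    """Map a numeric gpu_id to a list of device indices.

    Assumption: any negative value means "no GPU", giving the empty list.
    """
    return [] if gpu_id < 0 else [int(gpu_id)]


def device_name(gpu_ids):
    """Return a device string for the first listed GPU, or "cpu" if none."""
    return f"cuda:{gpu_ids[0]}" if gpu_ids else "cpu"


def gpu_argv(gpu_id):
    """Build the command-line pair from step 2 ('--gpu_ids' plus the value as a string)."""
    return ["--gpu_ids", str(int(gpu_id))]


for value in (-1, 0):
    ids = gpu_ids_from_parameter(value)
    print(value, ids, device_name(ids), gpu_argv(value))
```

Such a mapping only helps if the downstream code consults the resulting device instead of calling .cuda() unconditionally, which is exactly the obstacle described above.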

kba commented 4 years ago

Thanks for trying and detailing how it fails. I will refactor the tool to at least properly integrate the pix2pixHD repo as a submodule that is installed with the tool, and I will take a look at the parameter handling.

> Thus, IMO there's no way to run the dewarper without GPU, or with a CUDA-enabled GPU with "only" 4GB RAM.

I have no access to a GPU at all, so I cannot test (unless I get the CPU variant working), but at least these glaring shortcomings can be fixed.

kba commented 2 years ago

fixed by #89