OCR-D / core

Collection of OCR-related Python tools and wrappers from @OCR-D
https://ocr-d.de/core/
Apache License 2.0

Resource Manager: Unlink archives after extracting #1245

Closed: kba closed this 2 months ago

kba commented 3 months ago

Archives should be unlinked after extraction to keep the /tmp directory manageable in size.
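A minimal sketch of the requested behaviour, assuming the archive has already been downloaded to a local path. The function name `extract_and_unlink` and its parameters are hypothetical and are not taken from `ocrd.resource_manager`:

```python
import tarfile
from pathlib import Path


def extract_and_unlink(archive: Path, dest: Path) -> Path:
    """Extract a tar archive into dest, then delete the archive file.

    Hypothetical helper for illustration only, not the OCR-D implementation.
    """
    dest.mkdir(parents=True, exist_ok=True)
    try:
        # tarfile.open() detects gzip/bz2/xz compression automatically.
        with tarfile.open(archive) as tar:
            tar.extractall(path=dest)
    finally:
        # Remove the archive whether or not extraction succeeded,
        # so that a failed run does not leave a large file behind.
        archive.unlink(missing_ok=True)
    return dest
```

Deleting in a `finally` block is one possible design choice: it also cleans up after a failed extraction, at the cost of having to download the archive again on retry.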

MehmedGIT commented 3 months ago

Related error:

15:16:58.718 INFO ocrd.resource_manager._download_impl - Downloading https://qurator-data.de/sbb_binarization/models.tar.gz to download.tar.xx
15:17:54.597 INFO ocrd.resource_manager.download - Extracting application/gzip archive to /tmp/tmpdah097uf/out
Traceback (most recent call last):
  File "/usr/local/bin/ocrd", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/ocrd/cli/resmgr.py", line 155, in download
    fpath = resmgr.download(
  File "/usr/local/lib/python3.8/site-packages/ocrd/resource_manager.py", line 331, in download
    tar.extractall()
  File "/usr/lib/python3.8/tarfile.py", line 2028, in extractall
    self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
  File "/usr/lib/python3.8/tarfile.py", line 2069, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
  File "/usr/lib/python3.8/tarfile.py", line 2141, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/lib/python3.8/tarfile.py", line 2190, in makefile
    copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
  File "/usr/lib/python3.8/tarfile.py", line 250, in copyfileobj
    dst.write(buf)
OSError: [Errno 28] No space left on device

MehmedGIT commented 3 months ago

The /tmp storage itself works fine. The issue occurs when all models are downloaded: the container then runs out of storage space, which leads to the error above. Not sure how or why that happens since I also volume map /usr/local/share where all models are downloaded.
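Errno 28 is raised when the filesystem being written to is full, so one quick check is to compare the free space on the filesystems involved from inside the container. The sketch below uses the two paths mentioned in this thread. The helper name `free_gib` is hypothetical, and which filesystem actually filled up here is not known:

```python
import shutil
import tempfile


def free_gib(path: str) -> float:
    """Return the free space, in GiB, of the filesystem containing path."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    # The temporary directory is where archives are extracted in the log above;
    # /usr/local/share is the mapped model directory mentioned in the comment.
    for path in (tempfile.gettempdir(), "/usr/local/share"):
        try:
            print(f"{path}: {free_gib(path):.1f} GiB free")
        except FileNotFoundError:
            print(f"{path}: not found")
```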

bertsky commented 3 months ago

> Not sure how or why that happens since I also volume map /usr/local/share where all models are downloaded.

Do you use bind mounts or named volumes? The latter take up extra physical space in the Docker daemon's data root (usually /var/lib/docker, unless overridden via the data-root entry in /etc/docker/daemon.json).

MehmedGIT commented 3 months ago

It is rather a Singularity issue. I only see it on the HPC system, never locally on my machine with Docker.

MehmedGIT commented 2 months ago

I am closing the issue since the error I get is unrelated to what we previously thought. Nothing more to be done here.