containerbuildsystem / cachi2

Replace reflink dependency with an implementation using Python stdlib primitives #483

Closed eskultety closed 1 day ago

eskultety commented 5 months ago

With Yarn package manager support we introduced another dependency to the project, reflink, to speed up copies of large artifacts (commit https://github.com/containerbuildsystem/cachi2/commit/2937416d596fda28cf3b0724239061885fe4e6d2). However, the library we used was created merely as a hobby attempt to solve this in Python at a time when Python had no native support for copy-on-write (COW). That project appears to have been abandoned since: it shows zero activity and carries a note that Python now implements the functionality natively.

That said, while it is true that Python has since gained a way to achieve the same thing via os.copy_file_range, a new mapping of the copy_file_range syscall, proper high-level primitives haven't been added to shutil yet. Instead of copy_file_range, the reflink library used an alternative low-level C implementation relying on ioctl with the FICLONE request, because back then the copy_file_range syscall wasn't considered stable or production ready. That has changed in the meantime, so we should be able to come up with a fairly straightforward implementation based on copy_file_range for what we need until high-level support lands in the shutil module, and drop the dependency on an abandoned project.
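For comparison, the ioctl-based approach described above can be approximated from pure Python as well. This is only a rough sketch, not the reflink library's actual code: the function name try_ficlone and the set of errno values treated as "unsupported" are assumptions made for illustration, and the FICLONE request number is the Linux value from linux/fs.h.

```python
import errno
import fcntl
import os

# FICLONE request number from <linux/fs.h>: _IOW(0x94, 9, int). Linux-specific.
FICLONE = 0x40049409

# errno values treated here (an assumption) as "cloning is not possible for these files"
_UNSUPPORTED = {errno.EOPNOTSUPP, errno.EXDEV, errno.EINVAL, errno.ENOTTY, errno.ENOSYS}


def try_ficlone(src: str, dst: str) -> bool:
    """Return True if dst was created as a reflink of src, False if unsupported."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # The ioctl is issued on the destination fd; the source fd is the argument.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError as e:
            if e.errno not in _UNSUPPORTED:
                raise
    os.unlink(dst)  # remove the empty destination left behind by the failed attempt
    return False
```

On filesystems without reflink support (ext4 or tmpfs, for example) this is expected to return False, which is why a caller still needs some other copy path.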

Implementation-wise the above could be simplified to the following pseudocode snippet:

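# core/utils.py
import errno
import os

def reflink_copy(src, dst):
    # os.copy_file_range() takes file descriptors, not paths
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        count_bytes = os.fstat(fsrc.fileno()).st_size
        try:
            os.copy_file_range(fsrc.fileno(), fdst.fileno(), count_bytes)
        except OSError as e:
            if e.errno in (errno.EXDEV, errno.ENOSYS):
                # Cachi2Error is the project's own error type; its import is omitted here
                raise Cachi2Error("reflinks not supported") from e
            raise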
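Two details are worth keeping in mind when turning the snippet into real code: os.copy_file_range may copy fewer bytes than requested, so it has to be called in a loop, and it does not guarantee a reflink (the kernel may fall back to an ordinary in-kernel copy). A slightly fuller sketch under those assumptions follows. ReflinkUnsupported is a hypothetical stand-in for Cachi2Error, and the helper names and the fallback to shutil.copyfile are illustrative choices, not what the project necessarily ended up doing.

```python
import errno
import os
import shutil
from pathlib import Path


class ReflinkUnsupported(Exception):
    """Hypothetical stand-in for the project's Cachi2Error."""


def copy_file_range_all(src: Path, dst: Path) -> int:
    """Copy src to dst with os.copy_file_range, looping over short copies."""
    if not hasattr(os, "copy_file_range"):  # Linux-only, Python >= 3.8
        raise ReflinkUnsupported("os.copy_file_range is not available")

    copied = 0
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        remaining = os.fstat(fsrc.fileno()).st_size
        while remaining > 0:
            try:
                n = os.copy_file_range(fsrc.fileno(), fdst.fileno(), remaining)
            except OSError as e:
                if e.errno in (errno.EXDEV, errno.ENOSYS):
                    raise ReflinkUnsupported("copy_file_range not supported") from e
                raise
            if n == 0:  # unexpected EOF, e.g. the source was truncated meanwhile
                break
            copied += n
            remaining -= n
    return copied


def copy_with_fallback(src: Path, dst: Path) -> None:
    """Try copy_file_range first; on any failure retry with a plain copy."""
    try:
        copy_file_range_all(src, dst)
    except (ReflinkUnsupported, OSError):
        # A genuine I/O problem will resurface from shutil.copyfile itself.
        shutil.copyfile(src, dst)
```

A caller would then use copy_with_fallback(Path("a.tgz"), Path("b.tgz")) and get a complete copy either way. Whether to fall back silently or raise, as the snippet above does, is a design decision for the project.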

eskultety commented 1 day ago

Resolved by: https://github.com/containerbuildsystem/cachi2/pull/578