This change is motivated by the fact that the reflink library [1] was
created to take advantage of the reflink optimization before Python
gained os.copy_file_range, its wrapper around the copy_file_range
syscall. The library was never more than a band-aid, and now that the
syscall is available its project page even mentions that Python
implements the functionality natively.
The implementation is taken, with some 3.9+ tweaks applied, from an
existing code proposal [2] (marked as "awaiting merge") that adds the
same functionality to the 'shutil' library copying primitives and makes
it completely transparent to end users. We would have to wait a long
time before we could use that, though. The reflink library relied on a
dirty and somewhat error-prone trick: it copied a small file first to
see whether the operation raised an exception. The os.copy_file_range
based solution, by contrast, succeeds in the vast majority of cases:
if the underlying file system does not support reflinks (which amount
to copy-on-write sharing of data blocks between files), the kernel can
still perform a regular copy without the userspace <-> kernel overhead.
This reserves the 'shutil.copy2' fallback for really obscure cases,
such as cross-device copying (EXDEV), old systems without the syscall
(ENOSYS), or copies that fail for some other reason, which we may never
even encounter.
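The fallback logic described above can be sketched roughly as follows. This is a minimal illustration rather than the code in this change or in [2]: the helper name `copy_with_fallback`, the chunk size, and the returned label are assumptions made for the example.

```python
import os
import shutil

# Hypothetical cap on a single copy_file_range request (assumption).
_MAX_CHUNK = 1 << 30


def copy_with_fallback(src, dst):
    """Copy src to dst, preferring os.copy_file_range.

    Returns a label naming the code path that completed the copy.
    """
    copy_range = getattr(os, "copy_file_range", None)
    if copy_range is not None:
        try:
            with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
                remaining = os.fstat(fsrc.fileno()).st_size
                while remaining > 0:
                    copied = copy_range(
                        fsrc.fileno(), fdst.fileno(), min(remaining, _MAX_CHUNK)
                    )
                    if copied == 0:
                        # Source ended early; treat as a failure and fall back.
                        raise OSError("short copy")
                    remaining -= copied
            # copy_file_range only moves data, so copy metadata separately.
            shutil.copystat(src, dst)
            return "copy_file_range"
        except OSError:
            # Covers EXDEV, ENOSYS and any other failure of the fast path.
            pass
    # Rewrites dst from scratch, including metadata.
    shutil.copy2(src, dst)
    return "copy2"
```

Catching every OSError rather than only EXDEV and ENOSYS follows the description above, where the fallback also covers copies that fail for some other reason.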
[ ] Code coverage from testing does not decrease and new code is covered
[ ] Docs updated (if applicable)
[ ] Docs links in the code are still valid (if docs were updated)
[1] https://gitlab.com/rubdos/pyreflink
[2] https://github.com/python/cpython/pull/93152/files
Maintainers will complete the following section
Note: if the contribution is external (not from an organization member), the CI pipeline will not run automatically. After verifying that the CI is safe to run:
/ok-to-test
(as is the standard for Pipelines as Code)