SUPERCILEX / fuc

Modern, performance focused unix commands
Apache License 2.0

Sparse file handling #37

Open jbd opened 2 months ago

jbd commented 2 months ago

Hello,

Unless I'm mistaken, cpz does not handle sparse files and will copy null bytes over the wire. The xcp project (https://github.com/tarka/xcp) tries to iterate over the sparse chunks and use copy_file_range for the actual data.

I think this would be a nice addition and I hope you will consider this feature.

Sorry, this is really an RFE without an associated pull request, but my Rust skills are non-existent at the moment.

Thank you for this project. This is a fantastic contribution to the space of data movement.

Jean-Baptiste

SUPERCILEX commented 1 month ago

I looked into it and it seems like a bit of a pain. You'd want to use this statx call to determine if the file is sparse (by checking whether its size on disk, i.e. its number of blocks, is less than its apparent size): https://github.com/SUPERCILEX/fuc/blob/2d61845b8f61f4332ef3b5a67842979bd578eeb4/fuc_engine/src/ops/copy.rs#L529
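A minimal sketch of that check, written in Python only to show the arithmetic (fuc itself is Rust and would read the same fields from statx). The function names are hypothetical, and the 512-byte unit for st_blocks is the Linux convention, assumed here:

```python
import os


def looks_sparse(apparent_size, blocks, block_unit=512):
    """Return True if fewer bytes are allocated on disk than the apparent size."""
    return blocks * block_unit < apparent_size


def is_sparse(path):
    st = os.stat(path)
    # On Linux, st_blocks counts 512-byte units, independent of the
    # filesystem's own block size.
    return looks_sparse(st.st_size, st.st_blocks)
```

This is only a heuristic: it says that less space is allocated than the apparent size, not where the holes are.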

Then, to copy the file sparsely, I believe you can just use lseek (https://man7.org/linux/man-pages/man2/lseek.2.html), alternating between SEEK_HOLE and SEEK_DATA to get the ranges that should be passed into copy_file_range. Then the end of the file might need an ftruncate to finish it off. Or maybe it's better to actually ftruncate the whole file at the beginning? Not sure.

jbd commented 1 month ago

Thank you for your answer!

I don't know about the ftruncate at the beginning. I think that https://github.com/hpc/mpifileutils does it in one of its passes (file and directory creation, data copy, setting permissions).

I agree that the sparse case looks quite tedious to implement:

https://github.com/tarka/xcp/blob/2253d6ffc1ca13e8de395700b7e4f3fd57fa42b5/libfs/src/linux.rs#L78
https://github.com/tarka/xcp/blob/2253d6ffc1ca13e8de395700b7e4f3fd57fa42b5/libxcp/src/operations.rs#L118
https://github.com/tarka/xcp/blob/2253d6ffc1ca13e8de395700b7e4f3fd57fa42b5/libxcp/src/operations.rs#L82
https://github.com/tarka/xcp/blob/2253d6ffc1ca13e8de395700b7e4f3fd57fa42b5/libfs/src/linux.rs#L206

Copying TB sparse files is not ideal too =)

In the end, this is a gentle request for enhancement; I just wanted to have your thoughts on it. I understand perfectly that you are more focused on interesting developments like io_uring.

Feel free to close this case !

Cheers.

Jean-Baptiste

SUPERCILEX commented 1 month ago

> Copying TB sparse files is not ideal too =)

Lol, fair.

> Feel free to close this case !

No no, I'm just saying that I probably won't implement this myself, but I'd be happy to accept a PR.