Closed divineaugustine closed 5 months ago
Can you please run
clang-format
? Basically let's get past the superficial bits before we can dig into the threading.
Thank you for checking. I have run clang-format on my changes via vscode formatting. Is that what you meant? FYI: I did rebase and force push to fix DCO error.
Sorry for the inconvenience one header file was missed out from formatting. I have corrected that as well. Hopefully this solves all the formatting issues from my changeset.
I'll have a look at the code, but can you please rebase this and squash the fixups so the end result is a minimal set of independent changes. Say one for the liibrary changes, and one of the mkcomposefs use of it.
So, lots of comments. Make sure you read them all first, because some will make the other ones perhaps not needed. In particular, the propsed change to lcfs_node_set_from_content() will make most of the library API changes unnecessary.
So, lots of comments. Make sure you read them all first, because some will make the other ones perhaps not needed. In particular, the propsed change to lcfs_node_set_from_content() will make most of the library API changes unnecessary.
Thank you for the comments. I have addressed those and also squashed the commits to only 3 separate ones as you suggested. As i don´t know who should mark the comments as resolved, i just added reply to those.
Various nitpicks, but overall this looks good to me.
Thank you for reviewing it. I have addressed those and pushed the changes.
Some more minor feedback, but I would like @cgwalters or @giuseppe to also review this.
Also, if you're able to talk about it, I'd love to hear what you're using composefs for. Its good to know what your users are doing.
Also, if you're able to talk about it, I'd love to hear what you're using composefs for. Its good to know what your users are doing.
Can you give some details @https://github.com/r0l1 ? He is the right person to comment on this.
left a comment, could you please sign the commits with your real name?
My real name is Divin and i have signed off the commits with it. Did you mean including initials Divin OA or even initials expanded Divin Ookken Athappan?
My real name is Divin and i have signed off the commits with it. Did you mean including initials Divin OA or even initials expanded Divin Ookken Athappan?
yes please use the expanded version
Yes, separate PRs for things like this that are logically separate please.
Thank you for the links. Reading through https://lwn.net/Articles/846403/ i think it is safer to rollback
copy_file_range
to olderread/write
or perhaps address this as a separate PR as this does not have a direct dependency to the original multi-threading change. What do you say? I am very new to linux and i was looking for better apis to copy a file without user mode buffer.
Yes, separate PRs for things like this that are logically separate please.
I will create a separate PR for it. Can i also squash up the current commits in this PR to just 2 ? One for library and other for mkcomposefs. It was done before but then as part of review comments another set of 4 were added again.
Yes, I think one or two commits here is best.
I'm assuming your goal is faster mkcomposefs
overall (on a server side build system, right?). Using multiple threads is one way to do that. However, another way is caching - ostree does this thing where the build system can retain a cache of (device, inode number) :arrow_right: checksum; in theory we could offer the same.
The downside is of course that if something happens to mutate a file in the cache, you get incorrect checksums. But this is pretty easily mitigated, and actually a great thing with composefs is such a failure should be detected at runtime.
(These are complementary approaches to be clear; threading also makes sense)
Yes, I think one or two commits here is best.
Squashed the commits to just 2 (one for library and one for mkcomposefs) and siged it off with my full name. Added an issue #274 for copy_file_range
@alexlarsson we are using composefs for our custom OS and as docker container management layer. We have a small go initramfs including composefs and created a simple A/B OS which powers our devices at Wahtari and nLine. The devices are interlinked with a simple mesh like end-to-end encrypted network and this enables us to push updates (composefs store diff). I have some ideas how to improve the composefs store management and would like to discuss those ideas soon.
I really appreciate your work and follow your ideas since Glick2.
@cgwalters yes, this PR is for an internal speedup of the OS build process. Thanks for pointing out the cache idea.
@r0l1 That sounds cool! Nice that it works for you.
I'm assuming your goal is faster
mkcomposefs
overall (on a server side build system, right?). Using multiple threads is one way to do that. However, another way is caching - ostree does this thing where the build system can retain a cache of (device, inode number) ➡️ checksum; in theory we could offer the same.The downside is of course that if something happens to mutate a file in the cache, you get incorrect checksums. But this is pretty easily mitigated, and actually a great thing with composefs is such a failure should be detected at runtime.
(These are complementary approaches to be clear; threading also makes sense)
Thank you for these alternative-approaches. My objective was to make mkcomposefs run fast in any machine as part of our build process. It was simple and straight forward for me to add threads in mkcomposefs which does not require any facilitation from the end user. I am new to linux tooling and I would require some consultations here to fully understand these alternative approaches.
I'm assuming your goal is faster
mkcomposefs
overall (on a server side build system, right?). Using multiple threads is one way to do that. However, another way is caching - ostree does this thing where the build system can retain a cache of (device, inode number) ➡️ checksum; in theory we could offer the same.
This only works though if you image compose produces hardlinked copies of files you previously committed (otherwise the device/inode of the new files will not match something old in the cache). This happens regularly in ostree because of how it is set up, but that isn't necessarily always true, and it causes some level of extra pain to enforce when you build using ostree. (With things like rofiles-fuse, etc.)
Not saying it is bad, but its not always applicable.
But anyway, if you do use a setup like that where you end up with hardlinked files, then the best way to store and quickly access the cached digest is probably to just enable fs-verity on these files. In other words, we should probably have lcfs_node_set_from_content() start by asking the kernel for the fs-verity digest in case it is set, rather than computing it ourselves.
In other words, we should probably have lcfs_node_set_from_content() start by asking the kernel for the fs-verity digest in case it is set, rather than computing it ourselves.
Yes agreed, this is an obvious step.
I think this looks good enough now. I looked through it again and did some local tests, seems to work well, and the public library APIs are sane.
This resolves #249 and results in 10x improvement for digest calculation and file copy.
Please refer the below document for details about the issue and the change. I would like to know your thoughts. threading.md