flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0
168 stars 50 forks source link

kvs-watch: only fetch new data for appends #6444

Open chu11 opened 1 week ago

chu11 commented 1 week ago

Problem: The FLUX_KVS_WATCH_APPEND flag is implemented inefficiently in kvs-watch. Everytime new data is appended, the entire contents of the watched key are retrieved and only new data calculated via an offset is sent to the watcher. This significantly slows down performance of large appended data (such as stdio).

Solution: Instead of retrieving the entire contents of the watched key, fetch the tree object for the key from the KVS. With access to the tree object's blobref array, we need only access the new appended blobrefs from the content store. This significantly cuts down on the amount of data transfered during a kvs-watch.

Fixes #6414


side notes:

So RPC wise the pros are:

the cons are

For large data, this is a significant performance improvement for use cases like this:

time flux run flux lptest 2000000 80

wallclock time is generally being cut down from 58 seconds to 24 seconds. It's a big win. (FWIW, this was on corona. On my laptop it was 15s before 11.7s afterwards, not as much as a win there, but still that's 22%)

For everything else (flux job attach JOBID for a finished job, job throughput), it appears that performance is a wash. That the wins balance out the losses. This is of course with limited testing in single node instances.

Edit: Also as an aside, there is more potential for support of #6292 with this implementation

chu11 commented 6 days ago

re-pushed fixing up a lot of the minor things listed above. I went with the general rule of making more descriptive errors when an error would be half-likely, but kept "function name: strerror" log messages for things that should extremely unlikely.

chu11 commented 3 days ago

re-pushed with fixes per comments above. Also added a few new cleanups. I could probably split those out at this point in time.

codecov[bot] commented 2 days ago

Codecov Report

Attention: Patch coverage is 73.04965% with 38 lines in your changes missing coverage. Please review.

Project coverage is 83.60%. Comparing base (8015f88) to head (91c30de).

Files with missing lines Patch % Lines
src/modules/kvs-watch/kvs-watch.c 72.66% 38 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #6444 +/- ## ========================================== - Coverage 83.61% 83.60% -0.02% ========================================== Files 523 523 Lines 87547 87656 +109 ========================================== + Hits 73199 73281 +82 - Misses 14348 14375 +27 ``` | [Files with missing lines](https://app.codecov.io/gh/flux-framework/flux-core/pull/6444?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flux-framework) | Coverage Δ | | |---|---|---| | [src/cmd/job/eventlog.c](https://app.codecov.io/gh/flux-framework/flux-core/pull/6444?src=pr&el=tree&filepath=src%2Fcmd%2Fjob%2Feventlog.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flux-framework#diff-c3JjL2NtZC9qb2IvZXZlbnRsb2cuYw==) | `89.41% <100.00%> (+0.11%)` | :arrow_up: | | [src/modules/kvs-watch/kvs-watch.c](https://app.codecov.io/gh/flux-framework/flux-core/pull/6444?src=pr&el=tree&filepath=src%2Fmodules%2Fkvs-watch%2Fkvs-watch.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flux-framework#diff-c3JjL21vZHVsZXMva3ZzLXdhdGNoL2t2cy13YXRjaC5j) | `78.20% <72.66%> (+0.05%)` | :arrow_up: | ... and [10 files with indirect coverage changes](https://app.codecov.io/gh/flux-framework/flux-core/pull/6444/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flux-framework)

🚨 Try these New Features: