Was this a `flux job info` lookup on stdin/stdout? I'm wondering if stdin/stdout is pretty big and somewhere along the way some 2G boundary is crossed.
This was likely `flux job attach`, since it was in the output of `flux run`. According to the user, `--output` and `--error` were used, but I'm trying to verify that.
So I'm guessing this user produced a bunch of stdout into the KVS, and once it was > 2G, EOVERFLOW occurred because the data returned would be larger than an int can represent. Reproduced via:
```
>flux submit t/shell/lptest 1000 5000000
ƒA6i9Muy

<wait some time, let that thing produce a bunch of stdout>

>flux jobs
       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
    ƒA6i9Muy achu     lptest      R      1      1   5.095m corona212
>flux job attach ƒA6i9Muy
Sep 05 15:36:14.999036 PDT job-info.err[0]: info_lookup_continuation: flux_kvs_lookup_get: Value too large for defined data type
Sep 05 15:36:14.999190 PDT job-info.err[0]: main_namespace_lookup_continuation: flux_rpc_get_unpack: Value too large for defined data type
flux-job: flux_job_event_watch_get: Value too large for defined data type
```
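For context, here's a minimal sketch of the kind of check that produces this errno (illustrative only, not the actual flux-core source; `copy_value` and its parameters are invented for illustration):

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Illustrative only: any API that hands a value length back through
 * an int must reject data larger than INT_MAX, yielding EOVERFLOW. */
static int copy_value (const void *data, size_t total_len,
                       const void **valp, int *lenp)
{
    if (total_len > INT_MAX) {  /* a >2G stdout stream trips this */
        errno = EOVERFLOW;
        return -1;
    }
    *valp = data;
    *lenp = (int)total_len;
    return 0;
}
```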
I'm not really sure how to fix / improve this without major re-engineering (i.e. read KVS values with offsets / seek within values ... support > INT_MAX data ... may require RFC changes; I don't know off the top of my head where we define things as ints only).

For the time being ... a better error message? Return truncated output? (As an option?)
To my surprise, we don't define things as `int` in the KVS treeobj format. We just have specific API functions that use `int`, such as `treeobj_create_val()`. I suppose we technically could support larger values by just updating everything to use `size_t` (or `unsigned int`) ... but I don't know if we want to go down that path.
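To make the shape of that change concrete, a quick sketch (the first prototype reflects the current API as I understand it; the `_sized` variant is purely hypothetical and not an existing or proposed flux-core function):

```c
#include <stddef.h>
#include <jansson.h>

/* Current shape: the length parameter is an int, which caps any
 * single treeobj val at INT_MAX bytes. */
json_t *treeobj_create_val (const void *data, int len);

/* Hypothetical widened variant (assumption, for illustration only): */
json_t *treeobj_create_val_sized (const void *data, size_t len);
```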
Well, it is a bit silly to be failing for that reason, although fetching a > 2G message in a KVS get may fail for other reasons. It's certainly going to be a self-DoS for a while.
Just brainstorming here: we could support some type of `flux_kvs_lookup_stream()`? Return each blobref in the valref array one at a time?
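Something like this, maybe (all names and signatures here are invented for illustration; none of this exists in flux-core):

```c
/* Hypothetical: start a lookup that is fulfilled once per blobref in
 * the valref array, instead of once for the fully assembled value. */
flux_future_t *flux_kvs_lookup_stream (flux_t *h,
                                       int flags,
                                       const char *key);

/* Hypothetical accessor: get the current chunk, then reset the future
 * to wait for the next one; a zero-length chunk would mean EOF. */
int flux_kvs_lookup_stream_get (flux_future_t *f,
                                const void **buf,
                                size_t *len);
```

Since each chunk would be at most one blob, no single delivery needs a length beyond what an int can hold.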
A few brainstorms from this morning (see the sketch after this list):

- In `kvs-watch`, an "offset" is already being tracked; hypothetically we could pass that into the KVS lookup easily.
- `kvs.lookup-plus` is used from within `kvs-watch`, so it's pretty easy to modify the protocol there.
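A rough sketch of that offset idea (hedged: the `offset` field below is an assumption and is not part of the current `kvs.lookup-plus` payload; the other payload fields are my best guess at the existing protocol):

```c
#include <stdint.h>
#include <jansson.h>
#include <flux/core.h>

/* Hypothetical: resume a lookup at a byte offset by extending the
 * existing kvs.lookup-plus payload with an "offset" field (which the
 * current protocol does not define). */
flux_future_t *lookup_at_offset (flux_t *h, const char *key, int flags,
                                 int64_t offset)
{
    return flux_rpc_pack (h, "kvs.lookup-plus", FLUX_NODEID_ANY, 0,
                          "{s:s s:i s:I}",
                          "key", key,
                          "flags", flags,
                          "offset", (json_int_t)offset);
}
```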
I wonder if it would be easier to implement a limit on storing a large value in the first place?

I hate to suggest modifying RFC 11, but we could maybe consider adding an optional size to `val` and `valref`? (Optional in the sense that it is allowed to be missing from older metadata.) If we have a running total size for a KVS value, then we have the means to reject an append that would cause it to exceed some maximum.
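For illustration, a treeobj built with a hypothetical running-total field (the `size` key is an assumption; RFC 11 does not define it today):

```c
#include <stddef.h>
#include <jansson.h>

/* Hypothetical treeobj: a valref whose metadata carries the running
 * total size of the value, so an append that would exceed a maximum
 * can be rejected without fetching the blobs. */
json_t *make_sized_valref (const char *blobref, size_t total_size)
{
    return json_pack ("{s:s s:i s:[s] s:I}",
                      "type", "valref",
                      "ver", 1,
                      "data", blobref,
                      "size", (json_int_t)total_size);
}
```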
> I hate to suggest modifying RFC 11 but we could maybe consider adding an optional size to `val` and `valref`? (optional in the sense that it is allowed to be missing from older metadata) If we have a running total size for a kvs value, then we have the means to reject an append that would cause it to exceed some maximum.
That seems like a good idea for a full-on correct solution in the KVS.

But if we're going down the path of just limiting appends, for a quicker solution for this case, perhaps we can abuse the OUTPUT_LIMIT support in the shell to limit stdout/stderr? I see this in the shell's code:
```c
/* Set default to unlimited (0) for single-user instances,
 * O/w use the default multiuser output limit:
 */
if (out->shell->broker_owner == getuid ())
    out->kvs_limit_string = "0";
else
    out->kvs_limit_string = MULTIUSER_OUTPUT_LIMIT;
```
If we just change that "0" to "2G" (or perhaps less, if we want to be conservative), I think stdout/stderr will probably just be capped for single-user instances.
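I.e. something like this (assuming the single-user limit string accepts the same size-suffix syntax as the multiuser default):

```c
/* Hypothetical change: cap single-user output at 2G instead of
 * leaving it unlimited, keeping output KVS values below INT_MAX. */
if (out->shell->broker_owner == getuid ())
    out->kvs_limit_string = "2G";
else
    out->kvs_limit_string = MULTIUSER_OUTPUT_LIMIT;
```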
Great point!
OK, let's go with that quick-and-dirty solution in the shell. The KVS thing you mention above is the longer-term, generalized solution, because users might still abuse it if they write their own data to the KVS.

OK, I'll open an issue on the long-term one. It requires more thought/discussion/spelunking, I think.
We've had #5148 open since the original problem.

A user is seeing this error from `flux run` in a subinstance with a job that may generate a lot of output. Not sure we've seen this one before.