Was this a `flux job info` lookup on stdin/stdout? I'm wondering if stdin/stdout is pretty big and somewhere along the way some 2G boundary is crossed.
This was likely `flux job attach`, since it was in the output of `flux run`. According to the user, `--output` and `--error` were used, but I'm trying to verify that.
So I'm guessing this user produced a bunch of stdout into the KVS, and once it was > 2G, EOVERFLOW occurred because the data returned would be larger than an int can represent. Reproduced via:
```
>flux submit t/shell/lptest 1000 5000000
ƒA6i9Muy

<wait some time, let that thing produce a bunch of stdout>

>flux jobs
       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
    ƒA6i9Muy achu     lptest      R      1      1   5.095m corona212
>flux job attach ƒA6i9Muy
Sep 05 15:36:14.999036 PDT job-info.err[0]: info_lookup_continuation: flux_kvs_lookup_get: Value too large for defined data type
Sep 05 15:36:14.999190 PDT job-info.err[0]: main_namespace_lookup_continuation: flux_rpc_get_unpack: Value too large for defined data type
flux-job: flux_job_event_watch_get: Value too large for defined data type
```
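For context, here's a minimal sketch of the kind of check that produces this errno (illustrative only, not the actual flux-core source; `copy_value` and its parameters are invented for illustration):

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Illustrative only: any API that hands a value length back through
 * an int must reject data larger than INT_MAX, yielding EOVERFLOW. */
static int copy_value (const void *data, size_t total_len,
                       const void **valp, int *lenp)
{
    if (total_len > INT_MAX) {  /* a >2G stdout stream trips this */
        errno = EOVERFLOW;
        return -1;
    }
    *valp = data;
    *lenp = (int)total_len;
    return 0;
}
```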
I'm not really sure how to fix / improve this without major re-engineering (i.e. read KVS values with offsets / seek within values ... support > INT_MAX data ... may require RFC changes; I don't know off the top of my head where we define things as ints only).

For the time being ... a better error message? Return truncated output? (As an option?)
To my surprise, we don't define things as `int` in the KVS treeobj format. We just have specific API functions that use `int`, such as `treeobj_create_val()`. I suppose we technically could support larger values by just updating everything to use `size_t` (or `unsigned int`) ... but I don't know if we want to go down that path.
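To make the shape of that change concrete, a quick sketch (the first prototype reflects the current API as I understand it; the `_sized` variant is purely hypothetical and not an existing or proposed flux-core function):

```c
#include <stddef.h>
#include <jansson.h>

/* Current shape: the length parameter is an int, which caps any
 * single treeobj val at INT_MAX bytes. */
json_t *treeobj_create_val (const void *data, int len);

/* Hypothetical widened variant (assumption, for illustration only): */
json_t *treeobj_create_val_sized (const void *data, size_t len);
```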
Well, it is a bit silly to be failing for that reason, although fetching a > 2G message in a KVS get may fail for other reasons. It's certainly going to be a self-DoS for a while.
Just brainstorming here: we could support some type of `flux_kvs_lookup_stream()`? Return each blobref in the valref array one at a time?
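Something like this, maybe (all names and signatures here are invented for illustration; none of this exists in flux-core):

```c
/* Hypothetical: start a lookup that is fulfilled once per blobref in
 * the valref array, instead of once for the fully assembled value. */
flux_future_t *flux_kvs_lookup_stream (flux_t *h,
                                       int flags,
                                       const char *key);

/* Hypothetical accessor: get the current chunk, then reset the future
 * to wait for the next one; a zero-length chunk would mean EOF. */
int flux_kvs_lookup_stream_get (flux_future_t *f,
                                const void **buf,
                                size_t *len);
```

Since each chunk would be at most one blob, no single delivery needs a length beyond what an int can hold.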
A few brainstorms from this morning (see the sketch after this list):

- In `kvs-watch`, an "offset" is already being tracked; hypothetically we could pass that into the KVS lookup easily.
- `kvs.lookup-plus` is used from within `kvs-watch`, so it's pretty easy to modify the protocol there.
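A rough sketch of that offset idea (hedged: the `offset` field below is an assumption and is not part of the current `kvs.lookup-plus` payload; the other payload fields are my best guess at the existing protocol):

```c
#include <stdint.h>
#include <jansson.h>
#include <flux/core.h>

/* Hypothetical: resume a lookup at a byte offset by extending the
 * existing kvs.lookup-plus payload with an "offset" field (which the
 * current protocol does not define). */
flux_future_t *lookup_at_offset (flux_t *h, const char *key, int flags,
                                 int64_t offset)
{
    return flux_rpc_pack (h, "kvs.lookup-plus", FLUX_NODEID_ANY, 0,
                          "{s:s s:i s:I}",
                          "key", key,
                          "flags", flags,
                          "offset", (json_int_t)offset);
}
```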
I wonder if it would be easier to implement a limit on storing a large value in the first place?

I hate to suggest modifying RFC 11, but we could maybe consider adding an optional size to `val` and `valref`? (Optional in the sense that it is allowed to be missing from older metadata.) If we have a running total size for a KVS value, then we have the means to reject an append that would cause it to exceed some maximum.
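For illustration, a treeobj built with a hypothetical running-total field (the `size` key is an assumption; RFC 11 does not define it today):

```c
#include <stddef.h>
#include <jansson.h>

/* Hypothetical treeobj: a valref whose metadata carries the running
 * total size of the value, so an append that would exceed a maximum
 * can be rejected without fetching the blobs. */
json_t *make_sized_valref (const char *blobref, size_t total_size)
{
    return json_pack ("{s:s s:i s:[s] s:I}",
                      "type", "valref",
                      "ver", 1,
                      "data", blobref,
                      "size", (json_int_t)total_size);
}
```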
> I hate to suggest modifying RFC 11 but we could maybe consider adding an optional size to `val` and `valref`? (optional in the sense that it is allowed to be missing from older metadata) If we have a running total size for a kvs value, then we have the means to reject an append that would cause it to exceed some maximum.
That seems like a good idea for a full-on correct solution in the KVS.

But if we're going down the path of just limiting appends, for a quicker solution for this case, perhaps we can abuse the OUTPUT_LIMIT support in the shell to limit stdout/stderr? I see this in the shell's code:
```c
/* Set default to unlimited (0) for single-user instances,
 * O/w use the default multiuser output limit:
 */
if (out->shell->broker_owner == getuid ())
    out->kvs_limit_string = "0";
else
    out->kvs_limit_string = MULTIUSER_OUTPUT_LIMIT;
```
If we just change that "0" to "2G" (or perhaps less, if we want to be conservative), I think stdout/stderr will probably just be capped for single-user instances.
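I.e. something like this (assuming the single-user limit string accepts the same size-suffix syntax as the multiuser default):

```c
/* Hypothetical change: cap single-user output at 2G instead of
 * leaving it unlimited, keeping output KVS values below INT_MAX. */
if (out->shell->broker_owner == getuid ())
    out->kvs_limit_string = "2G";
else
    out->kvs_limit_string = MULTIUSER_OUTPUT_LIMIT;
```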
Great point!
OK, let's go with that quick-and-dirty solution in the shell. The KVS thing you mention above is the longer-term, generalized solution, because users might still abuse it if they write their own data to the KVS.

OK, I'll open an issue on the long-term one. It requires more thought/discussion/spelunking, I think.
We've had #5148 open since the original problem.

A user is seeing this error from `flux run` in a subinstance with a job that may generate a lot of output. Not sure we've seen this one before.