Open garlick opened 1 year ago
Based on a cursory look: we have a 4MB buffer in the broker, and we treat filling it up as a fatal error. This probably wants end-to-end flow control, but for now I wonder if we can handle the "buffer full" write error by backing off and retrying?
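To make the "back off and retry" idea concrete, here is a minimal sketch in Python. It is not broker code: `BoundedBuffer`, `BufferFull`, `write_with_backoff`, and the `on_wait` hook are all hypothetical names used only for illustration, and the retry count and delays are arbitrary assumptions.

```python
import time


class BufferFull(Exception):
    """Raised by the hypothetical buffer when a write does not fit."""


class BoundedBuffer:
    """Stand-in for a fixed-size write buffer (hypothetical, not a Flux API)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()

    def write(self, chunk):
        # Reject the whole chunk if it would overflow the buffer.
        if len(self.data) + len(chunk) > self.capacity:
            raise BufferFull
        self.data += chunk

    def drain(self, nbytes):
        """Simulate the consumer reading nbytes from the front of the buffer."""
        del self.data[:nbytes]


def write_with_backoff(buf, chunk, retries=5, delay=0.001,
                       sleep=time.sleep, on_wait=None):
    """Write chunk to buf; on BufferFull, wait and retry with a doubled delay.

    Returns the number of retries that were needed. Re-raises BufferFull
    once the retries are exhausted, so the caller can still treat it as fatal.
    """
    for attempt in range(retries + 1):
        try:
            buf.write(chunk)
            return attempt
        except BufferFull:
            if attempt == retries:
                raise
            if on_wait is not None:
                on_wait()  # illustration hook: lets a consumer drain the buffer
            sleep(delay)
            delay *= 2  # exponential backoff
```

The point of the sketch is only that a full buffer becomes a transient condition with a bounded number of retries, rather than an immediate fatal error; real flow control would still be the better long-term fix.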
Update on this: I have been trying this approach again to ship files. For small files it usually completes successfully. However, with larger files (100+ MB), the first invocation will often fail, but an immediate retry will work.
The error is different in this case:
cat /tmp/cti-adangelo/cti_daemonrW8WUO1.tar | flux exec -r 0 sed -n 'w /tmp/flux-71aFeA/jobtmp-0-ƒ9HCP58r2B/cti_daemonrW8WUO1.tar'
May 11 14:32:40.908869 broker.err[0]: Error writing 65536 bytes to subprocess pid 17760 stdin
May 11 14:32:40.911301 broker.err[0]: Error writing 65536 bytes to subprocess pid 17760 stdin: unknown pid
May 11 14:32:40.913095 broker.err[0]: Error writing 65536 bytes to subprocess pid 17760 stdin: unknown pid
(Repeated)
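Since an immediate retry usually succeeds, a caller-side workaround could be sketched as below. This is only an illustration: `run_with_retry` is a hypothetical helper, and it assumes the failing invocation exits with a nonzero status, which the log above does not confirm.

```python
import subprocess
import time


def run_with_retry(argv, data, attempts=3, delay=1.0):
    """Run argv with data on stdin; retry while the exit status is nonzero.

    Returns the attempt number that succeeded. Assumes (unverified) that a
    failed transfer is reported through the command's exit status.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(argv, input=data)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # short pause before the next try
    raise RuntimeError(f"{argv[0]} failed after {attempts} attempts")
```

For the case above, `argv` would be the `flux exec -r 0 sed -n 'w …'` command split into a list, and `data` the contents of the tarball read as bytes. Note that this holds the whole file in memory, which may not be desirable for 100+ MB files.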
Unrelated to the actual bug discussed in this issue, I'll note that @garlick developed a better method for shipping files via flux-filemap(1).
This is integrated into a stage-in job shell option if that works in your use case. See the flux-shell(1) manpage for a description of the options.
Edit: though I didn't find any examples in the documentation of the steps required to use the stage-in plugin. We may want to add that. For now, feel free to ask questions where things are not self-explanatory!
Edit2: There are some examples in the flux-filemap(1) manpage, but they do not include use of the stage-in job shell option.
We're using flux-filemap to ship files from the broker node to the non-broker nodes, but we still needed a way to get the file from the frontend where we're running our debugger tools to the broker node.
Currently, though, we only support running our tools from inside the flux start shell. Could we add files to the filemap directly in that case without worrying that the broker would be running somewhere else?
I've posted your question as the beginning of a Discussion thread here: #5168
I'm pretty confident flux-filemap(1) will handle your use case, but since it isn't clear, we can use the discussion in the Q&A thread to perhaps improve documentation or add a FAQ. It might help to give more specifics of how you're trying to use flux-filemap over in that issue. Thanks!
Redirecting input via flux exec works in the shell, but when launched inside CTI, I'm getting errors.
I'm using Flux 0.40.0-15. It happens with cat, sed, and a minimal C program that redirects input. I haven't seen this before in CTI when launching other programs, but it could be something with the input redirection.
Originally posted by @ardangelo in https://github.com/flux-framework/flux-core/issues/3631#issuecomment-1248153865