DaanDeMeyer / reproc

A cross-platform (C99/C++11) process library
MIT License

Large stdin (1MB) fails with system:11 Resource temporarily unavailable #67

Closed stingray-11 closed 2 years ago

stingray-11 commented 2 years ago

I'm attempting a reproc::run() with a reproc::input() supplying a large amount of data (~1 MB). The process always fails with system error 11 (Resource temporarily unavailable). With smaller amounts of data (a few KB) it works fine. I haven't found the exact cutoff, but with 1 MB it fails 100% of the time. This is on Linux.

DaanDeMeyer commented 2 years ago

Probably just doesn't fit into the pipe. We write all the data to the pipe before we even start the process. It's been a while, but I vaguely remember the reason for writing before the process starts: once the process is running, we don't want to do anything that could fail, because a failure would force us to clean up the process inside reproc_start(), which is pretty involved and better left to the user.

We could add an option to increase the pipe size (it doesn't necessarily need Windows support) so that you can write more data up front. There's probably an fcntl() option for that out there somewhere.
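
For reference, the fcntl() option in question on Linux is F_SETPIPE_SZ (available since kernel 2.6.35). A minimal sketch, assuming access to the raw pipe file descriptor (grow_pipe is a hypothetical helper name, not part of reproc):

    #include <cstdio>
    #include <fcntl.h>  // F_SETPIPE_SZ is Linux-specific

    // Ask the kernel to grow a pipe's capacity. The kernel rounds the request
    // up to a power-of-two number of pages; unprivileged processes are capped
    // at /proc/sys/fs/pipe-max-size (1 MiB by default).
    bool grow_pipe(int fd, int bytes) {
        if (fcntl(fd, F_SETPIPE_SZ, bytes) == -1) {
            std::perror("fcntl(F_SETPIPE_SZ)");
            return false;
        }
        return true;
    }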

stingray-11 commented 2 years ago

Is there a way to stream into stdin after starting the process?

DaanDeMeyer commented 2 years ago

If you redirect to pipes in blocking mode, you can just call reproc_write() with a large buffer and I think it'll work. The difference is that the pipe will be in blocking mode, so the syscall will simply block until it's able to write all the data to the pipe.

(We can't use a blocking write in setup_input() because the process hasn't started yet; if the data doesn't fit in the pipe, the syscall would block forever because there's no one on the other side pulling data out of the pipe.)

stingray-11 commented 2 years ago

I think I have it working now using process.write() as you suggested. It seems odd that we have to call process.write() manually instead of having a reproc::drain-like wrapper for input, but it works. One question: is write() guaranteed to always write the entire buffer, or should I be checking the returned number of bytes written and writing in a loop?

My code now looks roughly like this (ignoring error handling):

    // std::vector<std::string> arguments;
    // const std::string input;

    std::error_code error;
    reproc::process process;

    error = process.start(arguments);
    if (error)
        throw Exception(error);

    // Write the input to the child's stdin. write() may report fewer bytes
    // than requested; see the discussion below.
    size_t written = 0;
    size_t size = input.size();
    std::tie(written, error) =
        process.write(reinterpret_cast<const uint8_t *>(input.data()), size);

    // Close stdin so the child sees EOF.
    error = process.close(reproc::stream::in);

    // Drain stdout into a string, discarding stderr.
    std::string output;
    reproc::sink::string sink(output);
    error = reproc::drain(process, sink, reproc::sink::null);

    int status = 0;
    std::tie(status, error) = process.wait(this->timeout);

    return output;

This library is much better than boost::process, by the way. Much easier to use and it works better. Thanks!

DaanDeMeyer commented 2 years ago

reproc_write() is more or less a call to the write() syscall under the covers, so you can look at the man page for what to expect: https://man7.org/linux/man-pages/man2/write.2.html. In particular, it mentions that a signal can interrupt a blocking write() before all the data is written, which seems like the most likely way a partial write could happen when writing to a pipe.

I would merge a PR changing reproc_write() to call write() again when not all bytes were written (only if the pipe is in blocking mode, of course). That way you can be sure that either the entire input is written or an unrecoverable error occurred.
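
In the meantime, the same guarantee can be had with a caller-side loop. A minimal sketch against the reproc++ snippet above (reusing its process and input, and its hypothetical Exception type):

    // Retry until the whole buffer is written: a signal can interrupt a
    // blocking write() after a partial write, so resume from the offset
    // reported back by process.write().
    size_t offset = 0;
    while (offset < input.size()) {
        size_t written = 0;
        std::error_code ec;
        std::tie(written, ec) = process.write(
            reinterpret_cast<const uint8_t *>(input.data()) + offset,
            input.size() - offset);
        if (ec)
            throw Exception(ec);
        offset += written;
    }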