emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.6k stars 3.28k forks source link

doWritev has a race condition #22258

Closed jonahgraham closed 1 month ago

jonahgraham commented 1 month ago

Please include the following in your bug report:

Version of emscripten/emsdk:

Verified in current HEAD of main branch, the issue has been present since initial implementation in https://github.com/emscripten-core/emscripten/commit/fca780db9168bbd10b9a9724914a8a345eebabea

Failing command line in full:

I don't have a command line that fails as my implementation of stream_ops.write exposes the issue, but I hope that I can describe the issue sufficiently here. I intend to provide a PR for this soon.

Consider a custom device for writing registered as a device as per documentation, roughly:

const dev = FS.makedev(32, 0);
FS.registerDevice(dev, io);
FS.mkdev(path, mode, dev);

where io has has a write method that has a 20-byte buffer.

And a C program that writes something like this device:

FILE *f = fopen(.. registered device ..);
fputs("abcdefghijklmnopqrstuvwxyz", f)
fputs("\n", f);
fflush(f);

This C code will eventually call __stdio_write at which point iovs[2] will contain two entries, one for the abc... and length 26 and second entry \n and length 1.

__stdio_write calls syscall _fd_write and ends up in doWritev.

doWritev iterates through the iovs structure writing the data for each entry. However, the bug is that if a write does not write all data, the loop continues. If the output buffer that write stores to gets (partially) emptied between the two calls to write the \n from the second call can end up in the output buffer. This is the race condition which is why in my code I only occasionally see this. Finally, since doWritev will return 21 in this case, the __stdio_write will retry the write syscall and the data written ends up being "abcdefghijklmnopqrst\nvwxyz\n"

Here is the problematic code:

https://github.com/emscripten-core/emscripten/blob/cc5083ebb39660468bf3d712dfbbbe0aa1b738d8/src/library_wasi.js#L223-L227

What is needed is to abort the loop if a write does not write len bytes:

      var curr = FS.write(stream, HEAP8, ptr, len, offset);
      if (curr < 0) return -1;
      ret += curr;
      if (curr < len) break; // no more space to write
      if (typeof offset != 'undefined') {

Note that the read equivalent doReadv does have this early exit for the loop:

https://github.com/emscripten-core/emscripten/blob/cc5083ebb39660468bf3d712dfbbbe0aa1b738d8/src/library_wasi.js#L209

sbc100 commented 1 month ago

I guess none of current FS backends end up doing partial writes so this went unnoticed.

Feel free to send a PR.

(I'm not sure I would call this a race condition since it doesn't involve any threads, but it certainly looks like a bug).

jonahgraham commented 1 month ago

The partial write by itself isn't problematic, it is the first write being a partial write, combined with the second write accepting bytes. The reason I called it a race condition is that it requires the write buffer to empty slightly between the two calls to write. Normally this can't happen because there is no time for background events to happen. In my case I have another WebWorker processing that write buffer. So the race is between the wasm code and the other web worker. The race isn't between two wasm threads.

PR coming soon (its my first, so figuring stuff out still)

sbc100 commented 1 month ago

The partial write by itself isn't problematic, it is the first write being a partial write, combined with the second write accepting bytes. The reason I called it a race condition is that it requires the write buffer to empty slightly between the two calls to write. Normally this can't happen because there is no time for background events to happen. In my case I have another WebWorker processing that write buffer. So the race is between the wasm code and the other web worker. The race isn't between two wasm threads.

PR coming soon (its my first, so figuring stuff out still)

How of interest, how is your WebWorker racing with the emscripten main thread? Is it using a SharedArrayBuffer?

jonahgraham commented 1 month ago

How of interest, how is your WebWorker racing with the emscripten main thread? Is it using a SharedArrayBuffer?

Yes. Basically as a FIFO between the two workers.

jonahgraham commented 1 month ago

I contributed a fix and associated test case in #22261.

My additional learning while writing the test case: