Closed: Allardvm closed this issue 6 years ago
I thought this had gone away but apparently not.
```
./julia -p 1 -e "remotecall_fetch(()->zeros(2^28), 2)"
```

fails while

```
./julia -p 1 -e "remotecall_fetch(()->zeros(2^28-1), 2)"
```

succeeds.
```julia
const sz = 2^28
s = listen(10000)
@schedule while true
    a = accept(s)
    @schedule while true
        data = read(a, Float64, sz)
    end
end
c = connect("localhost", 10000)
write(c, zeros(sz))
```
This is a generic socket IO issue: `sz` at `2^28-1` succeeds while `2^28` fails.
Thanks for the quick response! I looked into this a bit more and I believe I found why this, as you suggested, had previously gone away but is now rearing its ugly head again.
As part of the `write(c, zeros(sz))` call in the generic version you posted above, the program eventually calls `uv_write(s::AsyncStream, p, n::Integer)` at stream.jl:929 (in Julia version 0.4.0). There, line 933

```julia
err = ccall(:jl_uv_write, Int32, (Ptr{Void}, Ptr{Void}, UInt, Ptr{Void}, Ptr{Void}), s, p, n, uvw, uv_jl_writecb_task::Ptr{Void})
```

results in the following error when `sz >= 2^28`:
```
ERROR (unhandled task failure): AssertionError: d != C_NULL
 in uv_writecb_task at /Applications/Julia.app/Contents/Resources/julia/lib/julia/sys.dylib
 in jlcapi_uv_writecb_task_19223 at /Applications/Julia.app/Contents/Resources/julia/lib/julia/sys.dylib
 in process_events at /Applications/Julia.app/Contents/Resources/julia/lib/julia/sys.dylib
 in wait at /Applications/Julia.app/Contents/Resources/julia/lib/julia/sys.dylib (repeats 2 times)
 in stream_wait at stream.jl:60
 in accept at stream.jl:1061
 in accept at socket.jl:338
 [inlined code] from In[1]:5
 in anonymous at task.jl:63
```
This does not crash the program; `stream_wait(ct)` on line 944 does. However, bb744fd14f01223961b0d2c74229a0ce001872e5 got rid of the `try`/`finally` statement that used to surround the above code, and also changed the call

```julia
err = ccall(:jl_uv_write, Int32, (Ptr{Void}, Ptr{Void}, UInt, Ptr{Void}, Ptr{Void}), handle(s), p, n, uvw, uv_jl_writecb_task::Ptr{Void})
```

into the current

```julia
err = ccall(:jl_uv_write, Int32, (Ptr{Void}, Ptr{Void}, UInt, Ptr{Void}, Ptr{Void}), s, p, n, uvw, uv_jl_writecb_task::Ptr{Void})
```
I presume these changes cause the current error, but since I do not know the reasoning behind these changes I am unsure how to proceed from here.
It would be good to confirm whether https://github.com/JuliaLang/julia/commit/bb744fd14f01223961b0d2c74229a0ce001872e5 is indeed the cause. Could you test with the commit immediately before https://github.com/JuliaLang/julia/commit/bb744fd14f01223961b0d2c74229a0ce001872e5 and also with https://github.com/JuliaLang/julia/commit/bb744fd14f01223961b0d2c74229a0ce001872e5 itself?
It looks like bb744fd14f01223961b0d2c74229a0ce001872e5 may not be the cause after all. I did a `git checkout e254aa84a69935495a84e07eaebf0aab61e85b33` to go back to e254aa84a69935495a84e07eaebf0aab61e85b33, which, as far as I can see, is the commit right before 37ade410e79e9fda3f2d08ebc63b7254a62793be was merged into `master` (this is the commit into `master` that corresponds to bb744fd14f01223961b0d2c74229a0ce001872e5, which was committed to the `release-0.4` branch).
Using the resulting binary

```
Julia Version 0.5.0-dev+133
Commit e254aa8* (2015-09-14 18:09 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin15.0.0)
  CPU: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
```
to run your generic working example

```julia
const sz = 2^28
s = listen(10000)
@schedule while true
    a = accept(s)
    @schedule while true
        data = read(a, Float64, sz)
    end
end
c = connect("localhost", 10000)
write(c, zeros(sz))
```
still results in

```
ERROR: write: invalid argument (EINVAL)
 in yieldto at ./task.jl:71
 in wait at ./task.jl:367
 in stream_wait at ./stream.jl:293
 in uv_write at stream.jl:952
 in buffer_or_write at stream.jl:965
 in write at stream.jl:1005
```
The same error occurs when using the v0.3.11 Julia binary.
So far I have only used the Julia binaries, so I am new to building Julia from source. If you would like me to test this a bit differently, please let me know.
The discussion in these two closed issues might be of use, as they seem related: #9497, #6567. The test code in #6567 still results in an `EINVAL` error when called with at least 1 worker present.
This is the problem: `len` in `uv_buf_t` is an `unsigned int`. See https://github.com/libuv/libuv/blob/ef47e8b2127c45ee58cae6b6f8ace5d6fde695b2/include/uv.h#L434 and our usage here: https://github.com/JuliaLang/julia/blob/706600408aba8b142c47c2bc887bde0d9bf774cf/src/jl_uv.c#L356. We will need to provide more than one buffer to libuv. Will submit a patch.
Nope, that was not it. The issue is still one of an unsigned-int overflow somewhere. Trying to track it down.
libuv seems to be using the system call `writev` for the actual socket write. And I found this line in the OS X `writev` man page that may explain this error:

> [EINVAL] The sum of the iov_len values in the iov array overflows a 32-bit integer.

from https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/write.2.html
cc @vtjnash
Looks like that may indeed cause the OSX issue, considering that your generic working example runs perfectly on Windows.
However, there is still a related issue in here somewhere, since my original working example (below) doesn't run correctly on Windows either. The code produces the `EINVAL` error reported above on OS X, but immediately triggers a BSOD on Windows (10). I just lost my debugging efforts when I tried to run the code, which also prevents me from providing a decent backtrace.
```julia
addprocs(1)

@everywhere function testthis()
    databuffer = zeros(300000000)
    return databuffer
end

function wrapper()
    for i_wpid in workers()
        result = remotecall_fetch(i_wpid, testthis)
    end
end

wrapper()
```
I'm all but certain that the `EINVAL` error on OS X is related to the `writev` exception you mentioned. I have manually traced your generic working example to the following call in libuv/src/unix/stream.c:

```c
n = write(uv__stream_fd(stream), iov[0].iov_base, iov[0].iov_len);
```

Here, `iov[0].iov_len` contains the value `2147483648`, which indeed overflows a signed 32-bit int. I guess this is some sort of documented bug in OS X, since the Open Group only calls for an `EINVAL` when it would overflow an `ssize_t`, which is usually based on an unsigned* int.

I see Go also ran into this issue: https://github.com/golang/go/issues/7812 and fixed it by implementing a cap on writes at `2 << 30 - 1`.
*Correction: `ssize_t` should be the size of a signed int. This makes OS X's behavior more expected, although I would expect `{SSIZE_MAX}` to be based on a 64-bit int on a 64-bit machine, which it doesn't seem to be. The behavior of `write` (not `writev`) when `iov[0].iov_len > {SSIZE_MAX}` is left undefined by POSIX, although I haven't found documentation on OS X's implementation yet.
Can't we just emulate large writes by splitting them into batches smaller than 2^31-1? That seems friendlier than Go's approach of disallowing really large writes.
Yeah, that's what Amit suggested as well and is how other languages have implemented it. I think the main thing to decide on is where to implement this. Should it be in libuv, or should we do this in Julia? There is some talk about that in the pull request above, but I'm also still trying to figure out how this can be best implemented in a way that works everywhere, rather than just on OS X.
@Allardvm:

> *Correction: ssize_t should be the size of a signed int. This makes OSX's more expected, although I would expect the {SSIZE_MAX} to be based on a 64 bit int on a 64 bit machine, which it doesn't seem to be.
`size_t` is only guaranteed to be able to hold the size of the largest object (it is unsigned). `ssize_t` is a signed version of `size_t`. It is not the size of an `int`; it can be larger (or even smaller).

System programming models these days are generally 32-bit ILP32, 64-bit LP64 (Linux/Unix), or 64-bit LLP64 (64-bit Windows). On LP64 and LLP64 systems, `size_t` is 64 bits. I don't think Julia needs to worry about segmented memory systems, where `size_t` might be 16 or 32 bits but pointers were 32 or 48 bits respectively. (Those systems are why `uintptr_t` and `intptr_t` should be used instead of `size_t`/`ssize_t` to hold pointers cast to integers.)

This really seems like a documented "feature" on OS X, where a signed int (32-bit) is used to sum up the different sizes, instead of an `ssize_t` (64-bit on 64-bit systems). Linux / AIX / Solaris all allow up to `SSIZE_MAX`.
Emulating large writes by splitting them up is not correct either, because the write should be atomic. You'd need to malloc a buffer big enough for the total size, copy the sections to the buffer, do a single `write`, then free the buffer.
`write` isn't atomic anyway, so that is irrelevant (http://linux.die.net/man/3/write). If you follow my cross-reference, you will see `write` exhibits the same behaviors as `writev`.
From http://man7.org/linux/man-pages/man2/readv.2.html:

> The data transfers performed by readv() and writev() are atomic: the data written by writev() is written as a single block that is not intermingled with output from writes in other processes (but see pipe(7) for an exception); analogously, readv() is guaranteed to read a contiguous block of data from the file, regardless of read operations performed in other threads or processes that have file descriptors referring to the same open file description (see open(2)).
A `write` may be interrupted, and you may be able to see a torn write if the system crashes, but they are supposed to be atomic in the sense that multiple processes' output doesn't get intermingled. As it notes above, there are some exceptions, such as the one with pipes, where atomicity isn't guaranteed if the size is greater than `PIPE_BUF`.
@vtjnash The top of the reference you showed says that it isn't even necessarily valid for any Linux implementation:

> This manual page is part of the POSIX Programmer's Manual. The Linux implementation of this interface may differ (consult the corresponding Linux manual page for details of Linux behavior), or the interface may not be implemented on Linux.
Even so, it implicitly does make the guarantee, with the following exception:

> Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data from other processes doing writes on the same pipe. Writes of greater than {PIPE_BUF} bytes may have data interleaved, on arbitrary boundaries, with writes by other processes, whether or not the O_NONBLOCK flag of the file status flags is set.
That documentation also makes it clear that if you have opened the file with the `O_APPEND` flag, the seek and the write are atomic with respect to other writers:

> If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
If `write` didn't make any guarantees of atomicity, it would make many programming tasks very difficult; imagine if you always had to grab a lock on a device just to ensure that your output operations didn't get interleaved with those of other processes?
So, to sum this up, it is important that this not be done by doing multiple writes. It is much better to just get the error, instead of losing the logical integrity of your data.
(although maybe another write and writev call could be added, for people who are certain that other processes can't be attempting to write to the file at the same time).
Unless you are arguing that 2GB is < 4096, or that we should be using the local filesystem as the network interface, I suggest you should start grabbing those locks. (And no, I don't find it hard to imagine a world where resource contention requires locks. That's parallel programming 101.)
This is not a discussion about `O_APPEND` semantics. The two operations are not related, other than the use of the same term for different purposes.
No, what I'm saying is that would break things for large writes, depending on the type of device being written to. Grabbing locks when it is not required is a performance killer - maybe that wasn't taught in your Parallel Programming 101.
The `libuv` API that Julia uses is broken: although it says that `uv_write`/`uv_read` are equivalent to `pwritev`/`preadv`, they return a 32-bit `int`, not a 32/64-bit `ssize_t`, and the `uv_buf_init` function takes an `unsigned int` instead of a `size_t`.
@amitmurthy according to the libuv documentation, `uv_buf_t.len` actually is `size_t` (or `ULONG` on Windows), which means that on Windows it is always limited to 32 bits, but elsewhere it is being limited to 32 bits by the `uv_buf_init` constructor.
@amitmurthy I just noticed what looks like bugs in the Julia `jl_uv_` code. The fields `base` and `len` are documented as being read-only in the `libuv` documentation, and `uv_buf_t` structures are supposed to be created via `uv_buf_init`, which returns a `uv_buf_t` by value; however, all of the Julia wrapper code simply sets `buf[0].base`/`buf[0].len` directly.
That function was added for FFI convenience since the order of the fields is OS-dependent. It's unnecessary when used from C.
https://github.com/JuliaLang/libuv/pull/33 fixed this for UNIX. Presumably still an issue on Windows.
Yes, this is still an issue on Windows. Running the code below results in an immediate BSOD on Windows 7 in the current release (`v0.4.3`) and the most recent nightly (`v0.5.0-dev+3116`).
```julia
addprocs(1)

@everywhere function foo()
    databuffer = zeros(300000000)
    return databuffer
end

function bar()
    for i_wpid in workers()
        result = remotecall_fetch(i_wpid, foo)
    end
end

bar()
```
The issue seems a bit different from the UNIX one, since the working example that triggered an `EINVAL` error due to large reads/writes on UNIX (before the fix in https://github.com/JuliaLang/libuv/pull/33) does not trigger an error on Windows:
```julia
const sz = 2^28
s = listen(10000)
@schedule while true
    a = accept(s)
    @schedule while true
        data = read(a, Float64, sz)
    end
end
c = connect("localhost", 10000)
write(c, zeros(sz))
```
I encounter an error when using distributed processing as part of a simulation study. Following the `pmap` code, I use one master core to hand out tasks to a set of worker cores. Each worker processes a task and then returns some data in an array that the master writes to a file. Even with a single worker in addition to the master, I encounter the `EINVAL` error below when the return array is quite large (~2.4 GB). The array fits in the available memory (the machine has 16 GB, with over 9 GB available at the time of the error). The following minimal working example reproduces the error on my Mac, but results in a slightly different error and backtrace on a similar Windows computer. The error does not occur when I omit the `addprocs(1)` in the above code. Below is my `versioninfo`, including memory information. My attempts to fix this have not been successful so far. Any help is greatly appreciated.