Closed · oliverw closed this issue 7 years ago
One thing you might want to try out: https://github.com/coinfoundry/miningcore/blob/master/src/MiningCore/JsonRpc/JsonRpcConnection.cs#L110 When the stream is ended (connection dropped), the completed handler will be called. You might want to send a shutdown in the completed handler.
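A minimal sketch of that suggestion, assuming the handler shape used in the linked JsonRpcConnection code (the exact NetUV callback signature is an assumption here, not verified against the library):

```csharp
// Sketch: release the native handle only from the completed
// callback, which fires once the stream has fully ended.
private static void OnCompleted(StreamHandle handle)
{
    // The peer is gone at this point; disposing here avoids
    // touching a handle that libuv may still be writing to.
    handle.CloseHandle(h => h.Dispose());
}
```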
Those are observations from my personal experience working with libuv. I only use an Async handle if I want to schedule something within the libuv thread, e.g. send some data immediately.
@StormHub Thanks a lot for the info, Johnny. I'm going to implement the changes and see how it goes.
Please let me know how this goes. BTW, does this happen on the Windows platform?
@StormHub What's not entirely clear to me from your remarks is how the server forcefully terminates the client's connection. The OnCompleted and OnError handlers are called when the connection has ended but how do you initiate this in a safe way - avoiding the crashes we are talking about? The use case here is getting rid of a misbehaving client as quickly and safely as possible.
Sorry for the confusion. What you can try first is to put the close handle into OnCompleted, it is always called when a stream ends.
@StormHub Ok and leave the rest as it is?
Do you actively close server connections? If you do, just use an Async handle to trigger a Shutdown; the completed handler will then be called later on.
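As a rough sketch of that pattern (the `CreateAsync`, `Send`, and `Shutdown` names are best-effort readings of the NetUV API from this thread, not verified):

```csharp
// Create the Async handle once, on the loop thread. Its callback
// always runs on the libuv loop thread, no matter which thread
// calls Send().
Async shutdownTrigger = loop.CreateAsync(handle =>
{
    // Safe to touch the stream here: we are on the loop thread.
    clientStream.Shutdown((h, error) => { /* completed fires next */ });
});

// From any thread, e.g. when dropping a misbehaving client:
shutdownTrigger.Send();
```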
BTW: if you share some gist to repro the issue, I definitely can help you much more effectively :)
@StormHub That's the problem it only happens on production systems under load. It just happened again with the fix you recommended (moving CloseHandle to the OnCompleted handler):
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `dotnet MiningCore.dll -c /config.json'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007ff09bdea9f1 in uv__write (stream=0x7fefac01f880)
at /datadrive/teamcity/6a56927d0d6f3630/submodules/libuv/src/unix/stream.c:867
867 /datadrive/teamcity/6a56927d0d6f3630/submodules/libuv/src/unix/stream.c: No such file or directory.
[Current thread is 1 (Thread 0x7fefc6ffd700 (LWP 55))]
From the stack trace, it looks like libuv attempts to write on some invalid request handle memory. Is this the server side or the client side? Or both running in the same loop?
Any prior error/exception on the handle?
The whole issue is on the server side. The clients are not under my control and don't use libuv at all.
Nope, no managed exceptions (I would have included them for sure). Just the segfault out of nowhere:
Segmentation fault (core dumped)
Could you show me the part where you send the write requests?
I'm pretty sure the crash is triggered either here or at line 121 in the same file.
The only way writes enter the picture is through the two Send methods in the same source file, lines 141 and 156.
I noticed that it is straight byte[] writes; it might be worth experimenting to see how that goes with the pooled buffer. For example:

```csharp
WritableBuffer buffer = tcp.Allocate(1024);
buffer.WriteString("Somestring", Encoding.UTF8);
tcp.QueueWriteStream(buffer, (h, e) =>
{
    // Do some error checking
    buffer.Dispose();
});
```
.NET Core's Unsafe has some issues on Linux with buffer copies.
The pooled buffer will automatically expand; you just have to give it a reasonable size to begin with.
I noticed that it is straight byte[] writes; it might be worth experimenting to see how that goes with the pooled buffer.
Alright, I'm going to implement this. By the way, I've also experimented with a workaround, but for some reason writesInFlight often did not reach a value of zero at line 213.
.NET Core's Unsafe has some issues on Linux with buffer copies.
Interesting. Source?
It is a way to check whether it is a buffer memory problem or a handle problem. BTW: where does the send get called? Is it within the libuv thread? (It should be called on the libuv loop thread.)
For the Unsafe copy problem, see these comments (already reported to MS): https://github.com/StormHub/NetUV/blob/dev/src/NetUV.Core/Common/PlatformDependent.cs#L33 I am waiting for some sort of explanation at the moment.
@StormHub You mean the two Send methods of JsonRpcConnection? Those are called from any task-pool thread. Judging by your comment this is just plain wrong, isn't it? (I was under the impression the queuing logic behind your QueueWrite implementation would make sure that the actual writes happen on the loop thread. Have I been wrong?)
Well, you have to use an Async handle to marshal the send onto the libuv thread. Initially my intention was to do this automatically, but some of my fellow programmers complained that I did not stick to libuv principles (e.g. make the wrapper layer as 'thin' as possible, but not thinner), so it got removed.
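Under that constraint, a send issued from a task-pool thread has to be queued and replayed on the loop thread. A sketch of one way to do it (hypothetical field and method names; only the Async handle and QueueWriteStream come from this thread):

```csharp
using System.Collections.Concurrent;

// Pending payloads produced by arbitrary threads.
private readonly ConcurrentQueue<byte[]> sendQueue = new ConcurrentQueue<byte[]>();
private Async sendTrigger; // created once, on the loop thread

// May be called from any task-pool thread.
public void Send(byte[] data)
{
    sendQueue.Enqueue(data);
    sendTrigger.Send(); // wake the libuv loop thread
}

// Async callback: always executes on the libuv loop thread.
private void OnSendRequested(Async handle)
{
    while (sendQueue.TryDequeue(out byte[] data))
        tcp.QueueWriteStream(data, (h, error) => { /* error checking */ });
}
```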
@StormHub Does the WritableBuffer also need to be allocated on the Loop thread?
I'm asking because this implementation works, while this one starts throwing exceptions (see below) as soon as Send is called for the second time. It almost looks as if there's a problem with the buffer pooling (the recycled buffer is somehow not correctly reset):
[2017-10-06 11:21:33.6887] [E] [xmr1] System.IndexOutOfRangeException: index: 16384, length: 238 (expected: range(0, 238)
at NetUV.Core.Buffers.AbstractArrayBuffer`1.CheckIndex(Int32 index, Int32 fieldLength)
at NetUV.Core.Buffers.AbstractArrayBuffer`1.CheckSrcIndex(Int32 index, Int32 length, Int32 srcIndex, Int32 srcCapacity)
at NetUV.Core.Buffers.PooledArrayBuffer`1.Set(Int32 index, T[] src, Int32 srcIndex, Int32 length)
at NetUV.Core.Buffers.AbstractArrayBuffer`1.Set(Int32 index, T[] src)
at NetUV.Core.Buffers.ByteBufferExtensions.SetString(IArrayBuffer`1 buffer, Int32 start, String value, Encoding encoding)
at NetUV.Core.Buffers.WritableBuffer.WriteString(String value, Encoding encoding)
at MiningCore.JsonRpc.JsonRpcConnection.Send(Object payload) in C:\Users\oweichhold\Projects\MiningCore\src\MiningCore\JsonRpc\JsonRpcConnection.cs:line 150
at MiningCore.Stratum.StratumClient`1.Notify[T](JsonRpcRequest`1 request) in C:\Users\oweichhold\Projects\MiningCore\src\MiningCore\Stratum\StratumClient.cs:line 97
at MiningCore.Stratum.StratumClient`1.Notify[T](String method, T payload) in C:\Users\oweichhold\Projects\MiningCore\src\MiningCore\Stratum\StratumClient.cs:line 88
at MiningCore.Blockchain.Monero.MoneroPool.<OnNewJob>b__9_0(StratumClient`1 client) in C:\Users\oweichhold\Projects\MiningCore\src\MiningCore\Blockchain\Monero\MoneroPool.cs:line 279
at MiningCore.Stratum.StratumServer`1.ForEachClient(Action`1 action) in C:\Users\oweichhold\Projects\MiningCore\src\MiningCore\Stratum\StratumServer.cs:line 250
Nope, the buffer pool is managed separately, not part of libuv. The new buffer pool is in dev at the moment.
I might just push the new release out from dev.
Could you just use byte[] for now to resolve the issue? Give me a bit of time to tidy up the new buffer pool and release it. It is something I have been working on for quite some time, and it is the last missing piece before SSL.
@StormHub Alright. Thanks for the heads-up and all your help.
@oliverw Thanks for letting me know. I found the issue; it was a wrong byte offset. But that does not matter much now because I have rewritten all of that. :)
@StormHub Hehe. Just let me know when the new buffer stuff is production ready.
@StormHub Totally weird. Even after the fix the app keeps crashing and crashing. It actually got worse; it's now crashing every couple of minutes. Users are leaving the pool.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `dotnet MiningCore.dll -c /config.json'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f4970000cc0 in ?? ()
[Current thread is 1 (Thread 0x7f4977fff700 (LWP 58))]
(gdb) bt
#0 0x00007f4970000cc0 in ?? ()
#1 0x00007f4a65dd5307 in uv__async_event (loop=0x7f4970000b20,
w=0x7f4970000ce0, nevents=2)
at /datadrive/teamcity/6a56927d0d6f3630/submodules/libuv/src/unix/async.c:98
#2 0x00007f4a65dd584d in uv__async_io (loop=0x7f4970000b20, w=0x7f4970000ce8,
events=1)
at /datadrive/teamcity/6a56927d0d6f3630/submodules/libuv/src/unix/async.c:138
#3 0x00007f4a65dee806 in uv__io_poll (loop=0x7f4970000b20, timeout=-1)
at /datadrive/teamcity/6a56927d0d6f3630/submodules/libuv/src/unix/linux-core.c:382
#4 0x00007f4a65dd5fcf in uv_run (loop=0x7f4970000b20, mode=UV_RUN_DEFAULT)
at /datadrive/teamcity/6a56927d0d6f3630/submodules/libuv/src/unix/core.c:352
#5 0x00007f4a09722bda in ?? ()
#6 0x4040000000000000 in ?? ()
#7 0x0000000063420588 in ?? ()
#8 0x00007f4a809b9488 in vtable for InlinedCallFrame ()
from /usr/share/dotnet/shared/Microsoft.NETCore.App/2.0.0/libcoreclr.so
#9 0x00007f4977ffec58 in ?? ()
#10 0x00007f4a099590c8 in ?? ()
#11 0x00007f4a099562e0 in ?? ()
uv__async_event is probably the result of https://github.com/coinfoundry/miningcore/blob/master/src/MiningCore/JsonRpc/JsonRpcConnection.cs#L168
By the way, which libuv package are you using? Is it the one built from AspNet?
From the stack, it is clearly not a buffer issue, because https://github.com/libuv/libuv/blob/v0.10/src/unix/async.c#L99 uses a local buffer to receive from I/O.
Meanwhile, this is actually within libuv's I/O poll, not even passed along to managed code yet.
@StormHub Good morning Johnny. Sorry for responding late. I simply fell asleep after deploying an emergency fix involving a rewrite of all socket comms using plain old TcpListener, just to get this thing running stably overnight and get some sleep after battling this the whole day. 😄
By the way, which libuv package are you using? Is it the one built from AspNet?
https://github.com/coinfoundry/miningcore/blob/master/src/MiningCore/MiningCore.csproj#L44
I'm using the one that satisfies NetUV's NuGet dependency.
Sorry, I mean the libuv dependency?
@StormHub I don't have a direct dependency on libuv. I use what NetUV references: https://github.com/StormHub/NetUV/blob/dev/src/NetUV.Core/NetUV.Core.csproj#L28
Right. I am really interested in what happened on Linux. Give me a bit of time and let me push out the new version with some sort of automatic libuv loop thread management. I cannot change the handle interfaces; there are too many references to those. I have to offer something on top of the libuv wrapper: https://github.com/StormHub/NetUV/blob/dev/src/NetUV.Core/Channels/EventLoop.cs And let you try it out.
@StormHub Looks interesting.
Basically freeing user code from that marshalling code. That was originally in the stream handle; I pulled it all out.
@StormHub By the way, you wrote:
Keep reusing the same Async handle
I'm not sure how this would work here. I need a closure over the data parameter, as there does not seem to be a way to pass state to the Async handler through AsyncHandle.Send(). Did I miss something?
How about using some wrapper object to hold the stream handle and the data to send, and setting it as the Async handle's UserToken? Now I probably see the problem here: the data captured by the lambda might be garbage-collected on the managed side.
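One way to read that suggestion as code (a sketch; `UserToken` is mentioned above, everything else here is hypothetical):

```csharp
// Wrapper that ties both the target stream and the payload to the
// Async handle, so nothing the native side still needs can be
// garbage-collected mid-write.
sealed class PendingWrite
{
    public StreamHandle Stream;
    public byte[] Data;
}

// Producer side, before waking the loop:
asyncHandle.UserToken = new PendingWrite { Stream = tcp, Data = payload };
asyncHandle.Send();

// Async callback, on the loop thread:
void OnAsync(Async handle)
{
    var pending = (PendingWrite)handle.UserToken;
    pending.Stream.QueueWriteStream(pending.Data, (h, e) => { /* check e */ });
}
```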
@StormHub You mean the crash?
If you want the data to be held until it is sent, you have to get the handle to hold it. In managed code, a delegate might be GC'd if you don't have anything holding on to it in the managed runtime.
Yep, it only happens under intensive load, when the GC triggers much more often.
Plus, you probably should use ServerGC in prod as well. That would make the GC behave a bit better under intensive load.
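For reference, server GC can be switched on from the project file (or equivalently with `"System.GC.Server": true` in `runtimeconfig.json`):

```xml
<!-- MiningCore.csproj: opt the process into the server GC -->
<PropertyGroup>
  <ServerGarbageCollection>true</ServerGarbageCollection>
</PropertyGroup>
```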
I'm getting repeated reports of Segfaults on Linux which all seem to be related to situations where the server actively terminates client connections.
Inspecting the core dump using gdb revealed this crash location: https://github.com/libuv/libuv/blob/v0.10/src/unix/stream.c#L867
Due to optimization being turned on, it might be the assert three lines below. Another bug report points to uv__write_callbacks.

Flow of events

Dispose on line #174 invokes the actual connection shutdown