Azure / DotNetty

DotNetty project – a port of netty, event-driven asynchronous network application framework
Other
4.09k stars 977 forks source link

Libuv TcpChannle Write Buffer Error in Linux #348

Closed caozhiyuan closed 6 years ago

caozhiyuan commented 6 years ago

use strace -f -e writev,write,sendmsg,send,sendto -o log.txt dotnet Rpc.Server.dll client send data to server code in https://github.com/caozhiyuan/DotNetty/tree/libuv/examples Rpc.Server Rpc.Client When writev get EAGAIN (Resource temporarily unavailable), the buffer after it will be all wrong. log.txt 25773 writev(114, [{"\0\0\0\345\0#", 6}, {"\0\0\2\321\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":719,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\345\0#", 6}, {"\0\0\2\322\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":720,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\345\0#", 6}, {"\0\0\2\323\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":721,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\345\0#", 6}, {"\0\0\2\324\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":722,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\345\0#", 6}, {"\0\0\2\325\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":723,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\345\0#", 6}, {"\0\0\2\326\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":724,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\345\0#", 6}, {"\0\0\2\330\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":726,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\345\0#", 6}, {"\0\0\2\327\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":725,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\345\0#", 6}, {"\0\0\2\331\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":727,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\345\0#", 6}, {"\0\0\2\332\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":728,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 131 25773 write(114, "34580.com/\\",\\"ImageSiteUrl\\":\\""..., 102) = -1 EAGAIN (Resource temporarily unavailable) 25786 write(100, "\1\0\0\0\0\0\0\0", 8) = 8 25786 write(100, "\1\0\0\0\0\0\0\0", 8) = 8 25786 write(100, "\1\0\0\0\0\0\0\0", 8) = 8 25775 --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TKILL, si_pid=25762, si_uid=1000} --- 25773 --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TKILL, si_pid=25762, si_uid=1000} --- 25773 write(114, "34580.com/\\",\\"ImageSiteUrl\\":\\""..., 102) = 102 25773 writev(114, [{"\0\0\0\346\0#", 6}, {"\0\0\24\34\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":999,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\346\0#", 6}, {"\0\0\24\34\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":999,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\346\0#", 6}, {"\0\0\24\34\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":999,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\346\0#", 6}, {"\0\0\24\34\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":999,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\346\0#", 6}, {"\0\0\24\34\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":999,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\346\0#", 6}, {"\0\0\24\34\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":999,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\346\0#", 6}, {"\0\0\24\34\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":999,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\346\0#", 6}, {"\0\0\24\34\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":999,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233 25773 writev(114, [{"\0\0\0\346\0#", 6}, {"\0\0\24\34\2\0\30Rpc.Models.Test.CityInfo\0"..., 35}, {"{\"Id\":999,\"Name\":\"{\\"Id\\":1,\\"Ci"..., 192}], 3) = 233

caozhiyuan commented 6 years ago

@StormHub look it , after EAGAIN Id all same log.txt open it with notepad++

caozhiyuan commented 6 years ago

may be input.RemoveBytes(writtenBytes) should after write callback ?

caozhiyuan commented 6 years ago

https://github.com/caozhiyuan/DotNetty/blob/libuv/src/DotNetty.Transport.Libuv/TcpChannel.cs#L173 change code to uv_try_write it work ok on linux @StormHub

StormHub commented 6 years ago

uv_try_write is not a very good idea, it bypasses the libuv write queue and attempts to write immediately. If there is an error with EAGAIN, the error should be reported back from libuv in write callback and further bubble to the channel exception caught.

caozhiyuan commented 6 years ago

if queuesize=0 use uv_try_write otherwise uv_try_write return -EAGAIN and use uv_write and RemoveBytes after it callback . now direct input.RemoveBytes(writtenBytes) after uv_write not it callback ,when scoket EAGAIN client get wrong buffer in linux

StormHub commented 6 years ago

Do you have a reliable way to reproduce this?

caozhiyuan commented 6 years ago

@StormHub sorry for delay. if direct RemoveBytes , when buffer reused, bytes in buffer will be changed , if uv_buffer not write complete(when socket Resource temporarily unavailable), after get socket Resource and write bytes(had been changed) will raise client socket receive buffer parse error. in windows and linux all can reproduce bug

caozhiyuan commented 6 years ago

@StormHub see https://github.com/caozhiyuan/DotNetty/tree/libuvbug/examples Echo.Server and Echo.Client . run it on windows

StormHub commented 6 years ago

@caozhiyuan I still cannot see the issue, would you please just share the server/client part so that I can reproduce it?

caozhiyuan commented 6 years ago

@StormHub this branch( https://github.com/caozhiyuan/DotNetty/tree/libuvbug/) is pull from you git repro, and Echo.Server and Echo.Client is minimal isolated repro,why not try it?

StormHub commented 6 years ago

Ok, I will try it out thanks

StormHub commented 6 years ago

Finally, I figured out what happened, will fix it.

StormHub commented 6 years ago

@caozhiyuan The fix is in my branch https://github.com/StormHub/DotNetty/tree/Work Could you please try it out?

caozhiyuan commented 6 years ago

@StormHub sorry for delayed 2 days https://github.com/caozhiyuan/DotNetty/tree/HttpTest Echo.Server not work well Echo.Client get error buffer HttpServer work well this http benchmarking tool https://github.com/codesenberg/bombardier is good use it test , libuv HttpServer better than socket. and Close to Java HttpServer

StormHub commented 6 years ago

@caozhiyuan There was a buffer release issue previously. It is fixed now, could you please try it on linux again?

StormHub commented 6 years ago

@caozhiyuan Fix is in PR, please let me know if any other issues.

caozhiyuan commented 6 years ago

@StormHub linux libuv better than socket a lit .

StormHub commented 6 years ago

@caozhiyuan The PR has been merged, pending buffer writes should not be reused anymore.

caozhiyuan commented 6 years ago

@StormHub https://github.com/Azure/DotNetty/blob/dev/src/DotNetty.Transport.Libuv/TcpChannel.cs#L162 https://github.com/Azure/DotNetty/blob/dev/src/DotNetty.Transport.Libuv/NativeChannel.cs#L311 when write fail , app can't kown it fail . because it be driect removed , when wirte faill input?.FailFlushed(exception, true) excute what?

StormHub commented 6 years ago

@caozhiyuan The intention of that line is to clear the outbound buffer if there are failures. Libuv transport writes actually behave similar to socket based transport. In socket transport the write operation takes the SocketAsyncEventArgs to carry out the write operations. All write tasks will be completed if those writes successfully received by the operation systems. But the actual OS level write might fail and the error will be reported to channel. The only difference is that the byte buffer can be released by socket transport whereas Libuv has to wait until write callback is invoked.