Closed StevenLevine closed 1 year ago
This is an interesting case. So does this mean that TCP/IP installs its own exception handlers which return EFAULT instead of letting the trap go to a LIBC exception handler? If so, then not only recv()
needs this workaround but all the TCP/IP stack I would say.
If possible, please provide a test case and some more insight in case if you have it.
Yes, the EFAULT comes from an internal exception handler. The exception handler is in the ring0 code because that's where the buffer access will occur. In some ways, this is similar to the DosRead issue we ran into several years ago.
The libcx mmap implementation expects to get a chance to touch the pages and retry the operation, but since the exception is handled by ring0 code, this never happens.
While it is possible there might be other TCP/IP APIs that need a similar wrapper, we have not yet run into them. My recommendation is to fix the recv() issue and handle the others if/when they ever occur.
To see how we handled this in the ring3 code, look at where touch_pages is used in:
https://github.com/psmedley/php-os2/blob/main/main/streams/xp_socket.c
I'll see if I can cobble together a testcase from existing code.
Thanks
LIBC actually provides the touch (addr, length)
func (emxdll legacy) which is used in particular before DosRead/DosWrite
(expected) in a couple of places. I can easily do the same in the recv
wrapper but I really feel that recvfrom
, recvmsg
are also affected. Given that you can look into recv
internals, could you please also check which other apis need that? I will then do it for all of them at once. Somehow I don't think it makes sense to only do that for recv
. Thx!
If you want to patch recvfrom and recvmsg, that's fine with me. I was probably giving too much importance to the cost of the touch_pages calls. This cost is probably not measurable compared to the total cost of the APIs.
When you say "Given that you can look into recv internals," you are making some assumptions that are not quite true.
While I have some understanding of the recv internals, I do not have access the the IBM sources. Life would definitely be simplier if I did.
My knowledge of the tcp/ip stack comes from elsewhere. Some of it comes from spending quality time with the kernel debugger. Some of comes from disassembling some of the IBM drivers. Finally, some if it come from using older versions of FreeBSD tcpip sources as a reference. Much of the BSD code is very similar to the IBM code which makes sense given that the IBM stack is derived from the BSD stack.
For the recv failure, I never needed to find/debug the ring0 part of the recv code or its exception handler. The recv call returned errno 14 rather than trapping. In the fullness of time we realized that the buffer was uncommitted and why and we implemented our local fix.
Well, I meant the debugger part, not the source access of course. But anyway, okay, for now I will do the recv*
as well as send*
part (as it works with the buffers too) for now and then we'll see.
On the other hand, it's in the recv and send Linux API specs that they should return EFAULT (errno 14) if the "buffer pointer(s) point outside the process's address space". The FreeBSD specs second that. Interesting enough that the POSIX standard for these APIs doesn't have such a requirement.
So, If we touch buffer pages before calling and the buffer is not a mmap one, but simply some invalid address, the caller will get a trap instead of EFAULT. So it will break Linux and FreeBSD API specs but not POSIX. I'm still not sure what is the right way here.
An ideal solution would be to provide a special export from LIBCx that would only touch pages if the buffer is an mmap'ed one. Or better make LIBCx wrappers for recv and send and halide this situation there without introducing an explicit dependency of LIBCn on LIBCx, but it looks like too much work for now.
Yes, under the debugger one can see recv run and report the error, rather than trapping.
Do you you really mean send rather than recv? IMO, attempting to send an uncommitted buffer is a logic error that should trap.
No, I mean both recv and send — the Linux/BSD specs for them are very similar. For send* it says:
EFAULT An invalid user space address was specified for an
argument.
[EFAULT] An invalid user space address was specified for an ar-
gument.
And why would it be different? An invalid buffer (e.g. NULL) is a logic error for both recv and send.
It looks like the POSIX guys went a different way and didn't introduce EFAULT here (probably assuming that the app should trap in such a case — which is much better than hiding the user-level problem with EFAULT IMO). We can do this way as well (and it's the simplest solution here) but I wonder if it can make any compatibility issues. In theory, there might be some code relying on EFAULT.
I see. I replied before you next message arrived and had not yet seen the links you posted.
I am not sure which behavior is best for us. Most of what we port is from Linux, but how much of this code is written to POSIX or Linux standards is hard to say.
If I helps, the reason the recv problem showed up is because we were porting to php7 and php8 because php 5 was getting a bit too long in the tooth. The php7 and php8 memory management code is a total rewrite of what was in php5. In the php5 code the buffers were always committed. In php7/8, the buffers are left uncommitted. It appears that in Linux it is expected that the kernel will commit the buffers for ther recv call, as needed.
It also appears that the Qt6 memory management code is using commit/decommit as a way of optimizing physical memory usage. Paul ran into this when porting the Qt6 runtime. The Qt5 patches were not quite good enough to emulate the expected behavior.
All things being equal, I'd prefer and EFAULT to a trap. I am assuming that the code is checking for errors.
Yes, updates to comments here are not distributed in emails.
On the other hand, it's very unlikely that there is some app supplying a NULL buffer and then expecting EFAULT in this case. I can see this scenario in some intermediate library but still, NULL or random garbage (practically, it's the only way where touch won't help and EFAULT would have to be returned) is certainly a severe bug in the app and if the app crashes instead, it sounds fine for me.
So I suggest that I either go the POSIX way here, or postpone it for later (until I make smart wrappers in LIBCx that will differentiate between mmap and non-mmap regions).
I went with the touch
solution for recv/recvfrom/recvmsg
but then I just realised that libsocket
is a static library of LIBC wrappers around TCP/IP DLL calls (I don't know why Knut decided to go this way). This means that in order to use the new code the app needs to be rebuilt (relinked against a newer version of libsocket.lib
). I also realised that this hack is not needed for the send
part as when sending the buffer can't be actually non-committed memory (well, it technically can but this would be a clear application logic error). So I left this part out.
@StevenLevine, I will commit the fix now but I would like to ask you to build LIBC yourself to get the new libsocket.lib
and test your scenario with it. After that I will be ready to release a new RPM of LIBC.
Rebuilding libc now and will aim to rebuild PHP without the workaround once it's done
LGTM - https://smedley.id.au/tmp/php-7.4.32-os2-debug-20230705.zip works for me (built using the new libsocket.lib, and removing the touch_pages code we added)
Thanks @psmedley. I'm closing this then.
We ran into an application porting issue where an uncommitted buffer allocated by mmap was passed to recv() which caused the recv() to fail with EFAULT. We resolved the issue by touching the pages before calling libc's recv().
It might be helpful for libc's recv() to touch the pages before calling the stack's recv().