The recv() wrapper does not handle memory allocated by mmap transparently.

StevenLevine commented 2 years ago

We ran into an application porting issue where an uncommitted buffer allocated by mmap was passed to recv() which caused the recv() to fail with EFAULT. We resolved the issue by touching the pages before calling libc's recv().

It might be helpful for libc's recv() to touch the pages before calling the stack's recv().

dmik commented 1 year ago

This is an interesting case. So does this mean that TCP/IP installs its own exception handlers which return EFAULT instead of letting the trap go to a LIBC exception handler? If so, then not only recv() needs this workaround but all the TCP/IP stack I would say.

If possible, please provide a test case and some more insight in case if you have it.

StevenLevine commented 1 year ago

Yes, the EFAULT comes from an internal exception handler. The exception handler is in the ring0 code because that's where the buffer access will occur. In some ways, this is similar to the DosRead issue we ran into several years ago.

The libcx mmap implementation expects to get a chance to touch the pages and retry the operation, but since the exception is handled by ring0 code, this never happens.

While it is possible there might be other TCP/IP APIs that need a similar wrapper, we have not yet run into them. My recommendation is to fix the recv() issue and handle the others if/when they ever occur.

To see how we handled this in the ring3 code, look at where touch_pages is used in:

https://github.com/psmedley/php-os2/blob/main/main/streams/xp_socket.c

I'll see if I can cobble together a testcase from existing code.

Thanks

dmik commented 1 year ago

LIBC actually provides the touch (addr, length) func (emxdll legacy) which is used in particular before DosRead/DosWrite (expected) in a couple of places. I can easily do the same in the recv wrapper but I really feel that recvfrom, recvmsg are also affected. Given that you can look into recv internals, could you please also check which other apis need that? I will then do it for all of them at once. Somehow I don't think it makes sense to only do that for recv. Thx!

StevenLevine commented 1 year ago

If you want to patch recvfrom and recvmsg, that's fine with me. I was probably giving too much importance to the cost of the touch_pages calls. This cost is probably not measurable compared to the total cost of the APIs.

When you say "Given that you can look into recv internals," you are making some assumptions that are not quite true.

While I have some understanding of the recv internals, I do not have access the the IBM sources. Life would definitely be simplier if I did.

My knowledge of the tcp/ip stack comes from elsewhere. Some of it comes from spending quality time with the kernel debugger. Some of comes from disassembling some of the IBM drivers. Finally, some if it come from using older versions of FreeBSD tcpip sources as a reference. Much of the BSD code is very similar to the IBM code which makes sense given that the IBM stack is derived from the BSD stack.

For the recv failure, I never needed to find/debug the ring0 part of the recv code or its exception handler. The recv call returned errno 14 rather than trapping. In the fullness of time we realized that the buffer was uncommitted and why and we implemented our local fix.

dmik commented 1 year ago

Well, I meant the debugger part, not the source access of course. But anyway, okay, for now I will do the recv* as well as send* part (as it works with the buffers too) for now and then we'll see.

dmik commented 1 year ago

On the other hand, it's in the recv and send Linux API specs that they should return EFAULT (errno 14) if the "buffer pointer(s) point outside the process's address space". The FreeBSD specs second that. Interesting enough that the POSIX standard for these APIs doesn't have such a requirement.

So, If we touch buffer pages before calling and the buffer is not a mmap one, but simply some invalid address, the caller will get a trap instead of EFAULT. So it will break Linux and FreeBSD API specs but not POSIX. I'm still not sure what is the right way here.

An ideal solution would be to provide a special export from LIBCx that would only touch pages if the buffer is an mmap'ed one. Or better make LIBCx wrappers for recv and send and halide this situation there without introducing an explicit dependency of LIBCn on LIBCx, but it looks like too much work for now.

StevenLevine commented 1 year ago

Yes, under the debugger one can see recv run and report the error, rather than trapping.

Do you you really mean send rather than recv? IMO, attempting to send an uncommitted buffer is a logic error that should trap.

dmik commented 1 year ago

No, I mean both recv and send — the Linux/BSD specs for them are very similar. For send* it says:

Linux:

   EFAULT An invalid user space address was specified for an
          argument.

FreeBSD:

[EFAULT]        An invalid user space address was specified for an ar-
        gument.

And why would it be different? An invalid buffer (e.g. NULL) is a logic error for both recv and send.

It looks like the POSIX guys went a different way and didn't introduce EFAULT here (probably assuming that the app should trap in such a case — which is much better than hiding the user-level problem with EFAULT IMO). We can do this way as well (and it's the simplest solution here) but I wonder if it can make any compatibility issues. In theory, there might be some code relying on EFAULT.

StevenLevine commented 1 year ago

I see. I replied before you next message arrived and had not yet seen the links you posted.

I am not sure which behavior is best for us. Most of what we port is from Linux, but how much of this code is written to POSIX or Linux standards is hard to say.

If I helps, the reason the recv problem showed up is because we were porting to php7 and php8 because php 5 was getting a bit too long in the tooth. The php7 and php8 memory management code is a total rewrite of what was in php5. In the php5 code the buffers were always committed. In php7/8, the buffers are left uncommitted. It appears that in Linux it is expected that the kernel will commit the buffers for ther recv call, as needed.

It also appears that the Qt6 memory management code is using commit/decommit as a way of optimizing physical memory usage. Paul ran into this when porting the Qt6 runtime. The Qt5 patches were not quite good enough to emulate the expected behavior.

All things being equal, I'd prefer and EFAULT to a trap. I am assuming that the code is checking for errors.

dmik commented 1 year ago

Yes, updates to comments here are not distributed in emails.

On the other hand, it's very unlikely that there is some app supplying a NULL buffer and then expecting EFAULT in this case. I can see this scenario in some intermediate library but still, NULL or random garbage (practically, it's the only way where touch won't help and EFAULT would have to be returned) is certainly a severe bug in the app and if the app crashes instead, it sounds fine for me.

So I suggest that I either go the POSIX way here, or postpone it for later (until I make smart wrappers in LIBCx that will differentiate between mmap and non-mmap regions).

dmik commented 1 year ago

I went with the touch solution for recv/recvfrom/recvmsg but then I just realised that libsocket is a static library of LIBC wrappers around TCP/IP DLL calls (I don't know why Knut decided to go this way). This means that in order to use the new code the app needs to be rebuilt (relinked against a newer version of libsocket.lib). I also realised that this hack is not needed for the send part as when sending the buffer can't be actually non-committed memory (well, it technically can but this would be a clear application logic error). So I left this part out.

@StevenLevine, I will commit the fix now but I would like to ask you to build LIBC yourself to get the new libsocket.lib and test your scenario with it. After that I will be ready to release a new RPM of LIBC.

psmedley commented 1 year ago

Rebuilding libc now and will aim to rebuild PHP without the workaround once it's done

psmedley commented 1 year ago

LGTM - https://smedley.id.au/tmp/php-7.4.32-os2-debug-20230705.zip works for me (built using the new libsocket.lib, and removing the touch_pages code we added)

dmik commented 1 year ago

Thanks @psmedley. I'm closing this then.

bitwiseworks / libc

The recv() wrapper does not handle memory allocated by mmap transparently. #123