OpenSmalltalk / opensmalltalk-vm

Cross-platform virtual machine for Squeak, Pharo, Cuis, and Newspeak.
http://opensmalltalk.org/
Other
563 stars 111 forks source link

sbrk() sqUnixSpurMemory patch #665

Closed cstes closed 8 months ago

cstes commented 1 year ago

Hi,

I have sent in the past email to Eliot Miranda about an issue which I discovered in August 2021 (two years ago), and I thought to raise an issue here.

It is a Linux/UNIX patch issue with the behavior of the mmap() system call or library call as used by the OpenSmalltalk COG vm.

When way back in 2021 some changes were made for Linux ARM8 to the file saUnixSpurMemory.c I reported that this was breaking the Solaris 11.3 build. Curiously the Solaris 11.4 build was still working fine.

I think (I don't know) that perhaps the behavior of mmap() on Solaris 11.4 is slightly diferent than on 11.3, to match (perhaps) the Linux kernel behavior of mmap().

An idea that I have is to ask some UNIX guru's whether there exist a autoconf-archive (configure script) test or script to test the behavior of mmap() for portability.

However here I will just report what the issue is : since august 2021 I am building succesfully on Solaris with the following patch


--- opensmalltalk-vm-sun-v5.0.34/platforms/unix/vm/sqUnixSpurMemory.c   Fri Aug 13 19:54:42 2021
+++ p0/opensmalltalk-vm-sun-v5.0.34/platforms/unix/vm/sqUnixSpurMemory.c    Sun Aug 15 14:11:58 2021
@@ -277,16 +277,21 @@
 {
    void *hint = sbrk(0); // a hint of the lowest possible address for mmap
    void *result;
+   char *address;
+   unsigned long alignment;

    pageSize = getpagesize();
    pageMask = ~(pageSize - 1);

+   alignment = max(pageSize,1024*1024);
+   address = (char *)(((usqInt)hint + alignment - 1) & ~(alignment - 1));
+
 #if !defined(MAP_JIT)
 # define MAP_JIT 0
 #endif

    *desiredSize = roundUpToPage(*desiredSize);
-   result =   mmap(hint, *desiredSize,
+   result =   mmap(address, *desiredSize,
 #if DUAL_MAPPED_CODE_ZONE
                    PROT_READ | PROT_EXEC,
 #else

I really need Eliot's help on this.

Eliot : what I did was try to understand your code and you seem to make some alignment calculations on the desired memory location that you request through mmap().

I believe that the traditional UNIX mmap() or the Solaris 10 and Solaris 11.3 mmap() is reacting slightly differently to the requested memory "hint".

By some experimentatoin and debugging I found the above patch, which I wrote based on looking at your code and try to mirror the calculations that you are doing, which are not fully symmetric.

By copying some code, I discovered that with the above patch the build for both Solaris 11.3 and Solaris 11.4 are working fine again.

I think this is a general UNIX issue that could be of interest to all UNIX software.

If an autoconfigure autoconf-archive test would exist to test the behavior of mmap() then that could be dealt in all software that uses mmap() with anonymous memory mapping.

As far as I know the history of the mmap() call is that it originated in the world of Sun and Solaris and is since a very long time present in Linux as well, and I think some improvements in Linux where then backported to Solaris 11.4.

That would explain why the current OpenSmalltalk-VM code works fine on Solaris 11.4.

For reasons of portability to all flavors of UNIX and Linux, I think a configure build test for the code could benefit portability of your Cog VM.

Let us know please what you think. I could simply submit my patch as a patch to mainstream upstream OpenSmalltalk but I don't know your opinion about it.

And the way I have been using it as a platform specific patch (that I apply prior to the build on Solaris) is fine for me as well.

Regards, David Stes

eliotmiranda commented 1 year ago

Hi David,

On Aug 11, 2023, at 6:39 AM, David Stes @.***> wrote:

 Hi,

I have sent in the past email to Eliot Miranda about an issue which I discovered in August 2021 (two years ago), and I thought to raise an issue here.

It is a Linux/UNIX patch issue with the behavior of the mmap() system call or library call as used by the OpenSmalltalk COG vm.

When way back in 2021 some changes were made for Linux ARM8 to the file saUnixSpurMemory.c I reported that this was breaking the Solaris 11.3 build. Curiously the Solaris 11.4 build was still working fine.

I think (I don't know) that perhaps the behavior of mmap() on Solaris 11.4 is slightly diferent than on 11.3, to match (perhaps) the Linux kernel behavior of mmap().

An idea that I have is to ask some UNIX guru's whether there exist a autoconf-archive (configure script) test or script to test the behavior of mmap() for portability.

However here I will just report what the issue is : since august 2021 I am building succesfully on Solaris with the following patch

--- opensmalltalk-vm-sun-v5.0.34/platforms/unix/vm/sqUnixSpurMemory.c Fri Aug 13 19:54:42 2021 +++ p0/opensmalltalk-vm-sun-v5.0.34/platforms/unix/vm/sqUnixSpurMemory.c Sun Aug 15 14:11:58 2021 @@ -277,16 +277,21 @@ { void hint = sbrk(0); // a hint of the lowest possible address for mmap void result;

  • char *address;

  • unsigned long alignment;

    pageSize = getpagesize(); pageMask = ~(pageSize - 1);

  • alignment = max(pageSize,1024*1024);

  • address = (char *)(((usqInt)hint + alignment - 1) & ~(alignment - 1));

  • if !defined(MAP_JIT)

    define MAP_JIT 0

    endif

    desiredSize = roundUpToPage(desiredSize);

  • result = mmap(hint, *desiredSize,

  • result = mmap(address, *desiredSize,

    if DUAL_MAPPED_CODE_ZONE

              PROT_READ | PROT_EXEC,

    else

    I really need Eliot's help on this.

Eliot : what I did was try to understand your code and you seem to make some alignment calculations on the desired memory location that you request through mmap().

I believe that the traditional UNIX mmap() or the Solaris 10 and Solaris 11.3 mmap() is reacting slightly differently to the requested memory "hint".

By some experimentatoin and debugging I found the above patch, which I wrote based on looking at your code and try to mirror the calculations that you are doing, which are not fully symmetric.

By copying some code, I discovered that with the above patch the build for both Solaris 11.3 and Solaris 11.4 are working fine again.

I think this is a general UNIX issue that could be of interest to all UNIX software.

If an autoconfigure autoconf-archive test would exist to test the behavior of mmap() then that could be dealt in all software that uses mmap() with anonymous memory mapping.

As far as I know the history of the mmap() call is that it originated in the world of Sun and Solaris and is since a very long time present in Linux as well, and I think some improvements in Linux where then backported to Solaris 11.4.

That would explain why the current OpenSmalltalk-VM code works fine on Solaris 11.4.

For reasons of portability to all flavors of UNIX and Linux, I think a configure build test for the code could benefit portability of your Cog VM.

I wonder whether your version would work on all platforms. Can you email me your version of the file? That’s easier than working with the patch.

Also, can you hum a few bars on what’s different in n the Hinton your version?

Here’s the intent behind the code:

In Spur memory is laid out as follows: As low as possible: JIT code zone (absent in StackInterpreter) Above and potentially contiguous with code zone: new space

Above and potentially contiguous with new space: first old space segment, loaded from image file

All three can be allocated with one or two map calls, one if execute permission can be applied after the fact to the code zone.

Thereafter new old space segments must be allocated above new space. This is because the store check that maintains the remembered set (those old space objects that refer to new space objects) simply compares an oop (an object’s address) with the end of new space/first old space segment.

So at first the hint is used to ask the OS to allocate the memory low and thereafter is used to ask the OS to allocate the memory higher.

Let us know please what you think. I could simply submit my patch as a patch to mainstream upstream OpenSmalltalk but I don't know your opinion about it.

If it works, it works. Looks OK to me but I’d much rather have one version than choosing via autoconf so email me your version, I’ll play with it, and if it works I’ll replace the old version with yours, otherwise I’ll probably add ifdefs to allow one to select your version via a C compiler flag. Autoconf is, IMNERHO, thd spawn of the devil ;-)

And the way I have been using it as a platform specific patch (that I apply prior to the build on Solaris) is fine for me as well.

I admire your forbearance and am grateful for your patience. Sounds like a pain in the butt to me!

Regards, David Stes

Cheers, Eliot ,,,^..^,,, (phone)

cstes commented 1 year ago

The process to automatically apply patch 09-sqUnixSpurMemory.patch is NOT a "pain in the butt" because it has been used for 2 years automatically : apply the patch prior to build the software on Solaris. If you don't change anything, I can perfectly continue to live with automatically applying my patch. However I'm bringing it to your attention (and I've sent email to you about it, so you should have received the code) because it is nicer if I can bring it to the attention of the author of the VM "upstream" instead of applying a platform specific patch. As expalined, I don't fully understand the alignment calculations but it seems some implementation details of mmap() are different on different UNIX kernels and based on reading some of the alignment calculations, I copied some of the code, and came up with this solution that has been working for me. By the way, by working I mean that on Solaris the SUnit testsuite for Squeak 6.0 seem to pass 1 Great ... The VM is very usable on Solaris and the test suite (the benchmarks running all tests work decently). The package is at version : runtime/smalltalk/cog-spur@5.0.3328,5.11-2023

where the number 5.0.3328 s a triplet that I made up myself to indicate that the VM is based on the VMMaker.oscog-eem.3328 where the latest is [VMMaker.oscog-eem.3332 so 3328 is very close to the latest and it works fine on Solaris.

cstes commented 1 year ago

By the way, I started to see the error sqAllocateMemory: initial alloc failed!

and


sqAllocateMemorySegmentOfSizeAboveAllocatedSizeInto mmap: Not enough space
sqAllocateMemory: initial alloc failed!

around the time in 2021 of "oscog-eem.2985" Cog VM and I only observed those errors in the COG VM not in the Stack VM .

As explained since that time I use (succesfully) a patch for the Spur memory management with the Solaris mmap() in the traditional Solaris kernel. I have seen that the patch is not required on Solaris 11.4 which I suspect is due to the fact of backporting changes of the Linux mmap() to UNIX Solaris 11.4.

cstes commented 11 months ago

In private email , I have received confirmation from Eliot Miranda (at least by email) that this patch would be added "this week", but that was already many weeks or months ago. The issue is not urgent because obviously I can simply apply the patch on Solaris and Illumos, but as Eliot indicated it could be interesting to use one version of the code for all platforms, meaning to change the Spur memory management on Linux and other platforms .... I suspect that the patch will not break the Linux memory management but I am not submitting a pull reauest by github because I certainly don't understand the ins and outs of Spur memory mangement so this is a patch to a critical piece of OpenSmalltalk code ....

dtlewis290 commented 11 months ago

Hi David, I tried your patch on Linux 64x64 and it causes no problems. The net effect is that the "hint" for the mmap() call is an address about 150k higher, which should be harmless on any platform and obviously is helpful for Solaris. My only suggestion would be to add a comment in the code to explain that this offset adjustment is required for Solaris and harmless otherwise.

I would suggest that you just go ahead and open a pull request. That will make it easier for one or more people to try this out, and the pull request will cause no problems because it can just be closed if the patch is not accepted.

dtlewis290 commented 11 months ago

David, I hope you (and Eliot) do not mind me getting in the middle of the discussion, but your patch is interesting and maybe I can help with the review.

I think that the patch is safe to include as it is, but it would be good to understand why it is required for Solaris. After reading the man pages at https://illumos.org/man/2/sbrk and https://illumos.org/man/2/mmap I would guess that the difference might related to either: 1) sbrk(0) answers a page-aligned address Linux but not on Solaris, or 2) mmap() has some different requirements for the supplied address (as explained in your notes above).

Your patch finds a different (higher) address for the hint, and also page-aligns it. I am wondering if the page alignment is the main issue, in which case it may be possible to address it by just adding this one line:

   hint = roundUpToPage((usqInt)hint);

This has no net effect on the Linux calculation because the hint address from sbrk() is already page-aligned. And Solaris 11.4 already works, so If if it fixes the problem for Solaris 11.3 then it might be the simplest solution, and no #ifdef needed.

cstes commented 11 months ago

Thanks for the feedback. I am willing to submit a pull request to the opensmalltalk-vm sources but I think raising an issue is more appropriate because I don't fully understand the issue yet, despite for the fact that the patch worked for the last 2 years. In fact I may likely never understand the details of the Spur memory management, and I hesitate to submit a pull request to a complex piece of code that I don't (and never will) fully understand.

The current COG OpenSmalltalk vm in OpenIndiana is now at VMmaker.oscog-eem.3339 and this corresponds with the version 5.0.3339 in OpenIndiana :

The current cog-spur vm is biult with GCC 13 on OpenIndiana and updated versions of Cairo, Pango, Freetype-2 and other dependencies.

I can try next year with new builds the patch hint = roundUpToPage((usqInt)hint);

and see whether that also has the same effec of fixing the sqAllocateMemory: initial alloc failed problem.

So after all it is perhaps a good solution to keep this issue open for a while, and I'll do some new test during the next year and report back on the results.

Regards, David Stes

dcstes commented 10 months ago

I have tested the fix to align the return value of sbrk(). The sinle line fix is sufficient to fix the problem. So this indicates that the return value of sbrk() is related to the problem.

The Single UNIX Specification (SUS) and POSIX standards seem unclear on these subtle properties of the return value.

The SUS standard however seems to be aware of the problem because it states "It is unspecified whether the pointer returned by sbrk() is aligned suitably for any purpose."

See https://pubs.opengroup.org/onlinepubs/7908799/xsh/brk.html for the specification on sbrk()

The Linux manpage is not saying anything on the issue (on a Linux system that I've checked) but the Linux manpage says "STANDARDS 4.3BSD; SUSv1, marked LEGACY in SUSv2, removed in POSIX.1-2001.

NOTES Avoid using brk() and sbrk(): the malloc(3) memory allocation package is the portable and comfortable way of allocating memory."

The Standard SUSv1 is probably the one from the Open Group (Single UNIX Specification).

I have submitted a pull request for my proposal for patch or fix.

It is now more clear to me that the return value of sbrk() has something to do with it, but the mmap() call itself is not failing.

As I wrote before my fix comes from copying Eliot's code and by using the example from sqAllocateMemory() which is also using sbrk() but it is doing an alignment calculation on the return value of sbrk() and from sqAllocateMemorySegmentOfSizeAboveAllocatedSizeInto() which has a loop doing mmap() and doing alignment calculations.

So my reasoning for the patch was that because of "consistency" or "symmetry" the same alignment calculation (which I copied from those functions) need to be applied in allocateJITMemory().

The return value of sbrk() is not wrong or right, but for the purpose of the loop in sqAllocateMemorySegmentOfSizeAboveAllocatedSizeInto() it must be aligned for that particular purpose of that loop.

Regards, David Stes

cstes commented 10 months ago

Hi, I've changed the title of the issue to "sbrk() saUnixSpurMemory patch" as the error that is raised : mmap() initialAlloc failed made me think at first that this is an mmap§) issue but as explained before it is rather an interaction between sbrk() and the alignment of the return value of that system call (or library function in Linux) and the loop sqAllocateMemorySegmentOfSizeAboveAllocatedSizeInto§) which is a specific purpose where it is not certain that the return value of sbrk() is correctly aligned for that purpose.

Also by the way I submitted a PR (pull request) as user "dcstes" (dcstes is a another github account of user "cstes") which hopefully can be considered for fixing this on all UNIX and Linux platforms, which may be helpful for example for BSD variants of UNIX.

The issue has become clearer to me and probably this change or PR will not break anything and only increase portability of the Cog VM.

Regards David Stes

cstes commented 10 months ago

I hope the code to align the return value of sbrk() can be added, as requested ... This will fix the Cog VM on Solaris so that for example OpenSmalltalk will work on Solaris servers such as the Oracle X8-8 and X9-2 or X9-2L which are Intel Xeon based Solaris servers. Of course this fix can also be added as a platform specific patch (and that has alawyas worked over the last 3 years). But because the Single UNIX Specification (SUS) is not just Oracle specific, but also for the "Open Group" where many technology companies such as IBM, Microsoft and others are members, it makes sense to use the sbrk() function as documented in the SUS (Single UNIX Specification) as an "open standard". Thank you ...

cstes commented 10 months ago

Note that this patch for aligning the return value of sbrk() is for Solaris for Intel (https://www.oracle.com/servers/x86/). However the patch is not very machine specific because this is based on the SUS (Single UNIX Specification) so the patch may equally apply to a whole range of different servers and UNIX workstations from the many members of the OpenGroup (https://[www.opengroup.org](https://www.opengroup.org/)/)

eliotmiranda commented 10 months ago

Hi David, as far as I can see the code has already been patched. This is from the tip version of platforms/unix/vm/sqUnixSpurMemory.c,

commit b0558b43c6598198f870834f00def6b85f8ab5d6
Author: Eliot Miranda <eliot.miranda@gmail.com>
Date:   Tue Dec 13 17:04:50 2022 -0800

    Change the type of sqAllocateMemorySegmentOfSizeAboveAllocatedSizeInto to get
    sqMacV2Memory.c to compile on 32-bit MacOS. It should have taken usqInts in the
    first place. Implement allocateJITMemory for sqMacV2Memory.c/V3 macOS.
usqInt
sqAllocateMemory(usqInt minHeapSize, usqInt desiredHeapSize)
{
    char *hint, *address, *alloc;
    unsigned long alignment;
    sqInt allocBytes;

#if !COGVM
    if (pageSize) {
        fprintf(stderr, "sqAllocateMemory: already called\n");
        exit(1);
    }
    pageSize = getpagesize();
    pageMask = ~(pageSize - 1);

    hint = sbrk(0);
#else
    assert(pageSize != 0 && pageMask != 0);
    hint = endOfJITZone;
#endif

    alignment = max(pageSize,1024*1024);
    address = (char *)(((usqInt)hint + alignment - 1) & ~(alignment - 1));

    alloc = sqAllocateMemorySegmentOfSizeAboveAllocatedSizeInto
                (roundUpToPage(desiredHeapSize), address, &allocBytes);
    if (!alloc) {
        fprintf(stderr, "sqAllocateMemory: initial alloc failed!\n");
        exit(errno);
    }
    return (usqInt)alloc;
}
cstes commented 10 months ago

There is a call to sbrk() in sqAllocateMemory() but there is another call to sbrk() in allocateJITMemory(). So there are multiple calls to sbrk() in platforms/unix/vm/sqUnixSpurMemory.c. The pull request #672 adds an alignment calculation to the return value of sbrk() in allocateJITMemory() so to mirror the calculation in sqAllocateMemory(). The OpenGroup manpage on sbrk() says that it is unspecified whether the pointer returned by sbrk() is aligned suitably for any purpose (see https://pubs.opengroup.org/onlinepubs/7908799/xsh/brk.html).

cstes commented 9 months ago

Hello, perhaps another possible improvement to the use of sbrk() is that the return value could be inspected for -1 (errno). Like in

hint = sbrk(0à; if (hint == -1) quitvm(errno);

this could theoretically happen in a few cases that the return value of sbrk() is not a pointer or address at all.

However what I observe on Solaris is that sbrk() returns an address but it is not aligned as expected in the loop in OpenSmalltalk that uses that address.

So my pull request is just as simple modification that addresses that issue, not the general issue of whether sbrk() should be used at all or if it can be assumed to return -1 as an error.

The code need not be perfect anyway because the situations where sbrk() will return -1 are already obscure , related to for example maximum sizes imposed to the process.

The immediate motivation in my case is just to have a working VM on the Solaris operating system because obviously if the initiial allocation failed due to address not correctly aligned, then the VM does not start at all ...

cstes commented 9 months ago

There seems to be a new macro lowestPageAligendAddressForMMap() which is defined to call the sbrk(0) system call or library function. This new code using the new macro works on Solaris. I didn't run any SUnitTests yet , I'll try to do that later but for now I'd like to report that the macro which aligns the return value of sbrk(0) seems to work.

cstes commented 8 months ago

it seems the Cog vm with the new macro lowestPageAlignedAddressForMMap() passes the SUnitTests with the same failures/same successes as before so I don't immediately see a difference. This is good news as I think the new macro solves the issue. I can close my own pull request as it is no longer needed. Thanks for looking into this issue.

cstes commented 8 months ago

OpenIndiana now has OpenSmalltalk Cog vm 5.0.3351 in their repository. More info at https://docs.openindiana.org/handbook/community/squeak/ Also on Solaris Cog vm now works without the sbrk() patch. So basicaly the situation is like it was during the last years, but the Cog vm now builds without the applicatoin of the platform specific patch. It is not hard or difficult to apply platform specific patches but whenever it makes sense, it's preferable to discuss the platform specific patches with the authors of the software "upstream". Regards

eliotmiranda commented 8 months ago

You’re most welcome. Thanks for your patience, and for getting things working cleanly on OpenIndianna.