Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.96k stars 555 forks source link

Perl 5.32 segfaults on Linux/m68k (regression) #17871

Closed glaubitz closed 4 years ago

glaubitz commented 4 years ago

Hi!

Perl 5.32 has introduced a regression which results in miniperl segfaulting during build.

The backtrace with gdb looks like this:

root@pacman:/srv/perl2/perl-5.32.0~rc1/build-static# gdb --args ./miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl.  Please run make mint/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl.  Please run make mininiperl.  Please run make minitminiperl.  Please run make minitest; exit 1'

GNU gdb (Debian 9.2-1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "m68k-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./miniperl...
(gdb) r
Starting program: /srv/perl2/perl-5.32.0~rc1/build-static/miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e \<\?\>
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/m68k-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
S_link_freed_op (o=0x803120aa, slab=<optimized out>, slab=<optimized out>, 
    my_perl=0x8030a190) at op.c:269
269         if (!slab->opslab_freed) {
(gdb) bt
#0  S_link_freed_op (o=0x803120aa, slab=<optimized out>, slab=<optimized out>, 
    my_perl=0x8030a190) at op.c:269
#1  0x8000901e in Perl_Slab_Free (op=0x80312136, my_perl=0x8030a190)
    at op.c:505
#2  Perl_Slab_Free (my_perl=0x8030a190, op=0x803120aa) at op.c:484
#3  0x8000fae6 in S_op_destroy (o=<optimized out>, my_perl=<optimized out>)
    at op.c:6416
#4  Perl_op_append_list (type=<optimized out>, last=<optimized out>, 
    first=0x80312136, my_perl=<optimized out>) at op.c:6416
#5  Perl_op_append_list (my_perl=<optimized out>, type=<optimized out>, 
    first=<optimized out>, last=<optimized out>) at op.c:6397
#6  0x800625c8 in Perl_yyparse (my_perl=0x8030a190, gramtype=258)
    at perly.y:239
#7  0x800321fe in S_parse_body (xsinit=0x801885f8 <xs_init>, env=0x0, 
    my_perl=0x8030a190) at perl.c:2576
#8  perl_parse (my_perl=0x8030a190, xsinit=0x801885f8 <xs_init>, argc=7, 
    argv=0xeffffc14, env=0x0) at perl.c:1871
#9  0x800050ae in main (argc=<optimized out>, argv=<optimized out>, 
    env=<optimized out>) at miniperlmain.c:132
(gdb)

A wild guess is that might be an alignment issue as the natural alignment for m68k is 16 bits, not 32 bits.

CC @AndreasSchwab @karcherm

jkeenan commented 4 years ago

On 6/18/20 6:10 AM, John Paul Adrian Glaubitz wrote:

Hi!

Perl 5.32 has introduced a regression which results in |miniperl| segfaulting during build.

The backtrace with |gdb| looks like this:

|root@pacman:/srv/perl2/perl-5.32.0~rc1/build-static# gdb --args ./miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl. Please run make mint/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl. Please run make mininiperl. Please run make minitminiperl. Please run make minitest; exit 1' GNU gdb (Debian 9.2-1) 9.2 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "m68k-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/. Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./miniperl... (gdb) r Starting program: /srv/perl2/perl-5.32.0~rc1/build-static/miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e \<\?> [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/m68k-linux-gnu/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. S_link_freed_op (o=0x803120aa, slab=, slab=, my_perl=0x8030a190) at op.c:269 269 if (!slab->opslab_freed) { (gdb) bt #0 S_link_freed_op (o=0x803120aa, slab=, slab=, my_perl=0x8030a190) at op.c:269 #1 0x8000901e in Perl_Slab_Free (op=0x80312136, my_perl=0x8030a190) at op.c:505 #2 Perl_Slab_Free (my_perl=0x8030a190, op=0x803120aa) at op.c:484 #3 0x8000fae6 in S_op_destroy (o=, my_perl=) at op.c:6416

4 Perl_op_append_list (type=, last=,

first=0x80312136, my_perl=) at op.c:6416 #5 Perl_op_append_list (my_perl=, type=, first=, last=) at op.c:6397 #6 0x800625c8 in Perl_yyparse (my_perl=0x8030a190, gramtype=258) at perly.y:239 #7 0x800321fe in S_parse_body (xsinit=0x801885f8 , env=0x0, my_perl=0x8030a190) at perl.c:2576 #8 perl_parse (my_perl=0x8030a190, xsinit=0x801885f8 , argc=7, argv=0xeffffc14, env=0x0) at perl.c:1871 #9 0x800050ae in main (argc=, argv=<optimized out>, env=) at miniperlmain.c:132 (gdb) |

A wild guess is that might be an alignment issue as the natural alignment for m68k is 16 bits, not 32 bits.

CC @AndreasSchwab https://github.com/AndreasSchwab @karcherm https://github.com/karcherm

Please supply the way you invoked 'sh ./Configure' to achieve these results. Example:

sh ./Configure -des -Dusedevel -Duseithreads

Thank you very much. Jim Keenan

glaubitz commented 4 years ago

Hi Jim! Thanks for your answer.

The full build log can be found in: https://buildd.debian.org/status/fetch.php?pkg=perl&arch=m68k&ver=5.32.0%7Erc1-1&stamp=1592359772&raw=0

From the Debian process, it looks like the configure script is invoked as:

/bin/sh debian/config.debian --static

I'm currently bisecting the git tree from upstream now to see if I can find the commit in question that broke the build.

jkeenan commented 4 years ago

On 6/18/20 6:10 AM, John Paul Adrian Glaubitz wrote:

Hi!

Perl 5.32 has introduced a regression which results in |miniperl| segfaulting during build.

The backtrace with |gdb| looks like this:

|root@pacman:/srv/perl2/perl-5.32.0~rc1/build-static# gdb --args ./miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl. Please run make mint/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl. Please run make mininiperl. Please run make minitminiperl. Please run make minitest; exit 1' GNU gdb (Debian 9.2-1) 9.2 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "m68k-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/. Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./miniperl... (gdb) r Starting program: /srv/perl2/perl-5.32.0~rc1/build-static/miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e \<\?> [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/m68k-linux-gnu/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. S_link_freed_op (o=0x803120aa, slab=, slab=, my_perl=0x8030a190) at op.c:269 269 if (!slab->opslab_freed) { (gdb) bt #0 S_link_freed_op (o=0x803120aa, slab=, slab=, my_perl=0x8030a190) at op.c:269 #1 0x8000901e in Perl_Slab_Free (op=0x80312136, my_perl=0x8030a190) at op.c:505 #2 Perl_Slab_Free (my_perl=0x8030a190, op=0x803120aa) at op.c:484 #3 0x8000fae6 in S_op_destroy (o=, my_perl=) at op.c:6416

4 Perl_op_append_list (type=, last=,

first=0x80312136, my_perl=) at op.c:6416 #5 Perl_op_append_list (my_perl=, type=, first=, last=) at op.c:6397 #6 0x800625c8 in Perl_yyparse (my_perl=0x8030a190, gramtype=258) at perly.y:239 #7 0x800321fe in S_parse_body (xsinit=0x801885f8 , env=0x0, my_perl=0x8030a190) at perl.c:2576 #8 perl_parse (my_perl=0x8030a190, xsinit=0x801885f8 , argc=7, argv=0xeffffc14, env=0x0) at perl.c:1871 #9 0x800050ae in main (argc=, argv=<optimized out>, env=) at miniperlmain.c:132 (gdb) |

A wild guess is that might be an alignment issue as the natural alignment for m68k is 16 bits, not 32 bits.

CC @AndreasSchwab https://github.com/AndreasSchwab @karcherm https://github.com/karcherm

Thank you for this report and especially for the backtrace.

When I first read this ticket, I had no idea what "Linux/m68k" was. Eventually I found this page, http://www.linux-m68k.org/index.html, which informed me that this was a project to port Linux to the Motorola 68000 series of microprocessors. That page appears to have been last updated around 2002. Searching our issue queue for https://github.com/Perl/perl5/issues?q=m68k+in%3Atitle, it appears this is the first bug report we've had for this platform since 2003. And, AFAICT, we have never received a smoke-test report for m68k at http://perl5.test-smoke.org/.

Questions:

Thank you very much. Jim Keenan

glaubitz commented 4 years ago

When I first read this ticket, I had no idea what "Linux/m68k" was. Eventually I found this page, http://www.linux-m68k.org/index.html, which informed me that this was a project to port Linux to the Motorola 68000 series of microprocessors. That page appears to have been last updated around 2002.

We are maintaining it in Debian, see: https://buildd.debian.org/status/architecture.php?a=m68k&suite=sid

Searching our issue queue for https://github.com/Perl/perl5/issues?q=m68k+in%3Atitle, it appears this is the first bug report we've had for this platform since 2003.

It might just have worked well all the time :).

Probably the only way we could identify the commit which broke this build would be to run Porting/bisect.pl on the machine where you observed this failure. Is that at all feasible?

Sure. It's also reproducible in QEMU. You can easily install Debian/m68k on QEMU, see: https://wiki.debian.org/M68k/QemuSystemM68k

To continue to support perl on this platform going forward, we would need to have regular smoke-test reports? Would that be feasible?

Depends on what you mean with regular. People are maintaining Linux (and NetBSD) on classic computers in their free time. So if you are asking for a weekly CI run, it might be a bit too much.

Does this architecture have sufficient use in production to warrant an attempt to maintain perl on it?

It's used by the community who are maintaining Linux and NetBSD for older architectures.

In any case, Perl doesn't have any architecture-specific bits, does it? And it just seems there is a minor regression to me, so I don't think it would take much effort to fix this problem.

Leont commented 4 years ago

From the Debian process, it looks like the configure script is invoked as:

./perl -Ilib -V:config_args should also give the required information if you can get that far. Otherwise you could peek into lib/Config_heavy.pl and look for "config_args".

I'm currently bisecting the git tree from upstream now to see if I can find the commit in question that broke the build.

I would guess it to be 7b85c12a47eeaeb8aaaa0c95fdbdd48ecd5f929d (only change in that area at first glance), @iabyn do you have a clue?

glaubitz commented 4 years ago

I have bisected the problematic commit to be 0bd6eef439e2cdb97b9b64430c17d4f523f75914.

Reverting it, fixes Perl for me on m68k.

glaubitz commented 4 years ago

It's the alignment of opslot that causes the segmentation fault.

This fixes the problem for me:

diff --git a/op.h b/op.h
index fc21f03cda..480c95245b 100644
--- a/op.h
+++ b/op.h
@@ -698,7 +698,7 @@ struct opslot {
     U16         opslot_size;        /* size of this slot (in pointers) */
     U16         opslot_offset;      /* offset from start of slab (in ptr units) */
     OP         opslot_op;              /* the op itself */
-};
+} __attribute__ ((aligned (4)));

 struct opslab {
     OPSLAB *   opslab_next;            /* next slab */

The native alignment on m68k is 16 bits.

glaubitz commented 4 years ago

Alternatively, one should be able to add a padding variable of type U16 in the struct.

iabyn commented 4 years ago

On Thu, Jun 18, 2020 at 12:37:29PM -0700, John Paul Adrian Glaubitz wrote:

It's the alignment of opslot that causes the segmentation fault.

In what way exactly is the compiler getting the alignment wrong?

-- Fire extinguisher (n) a device for holding open fire doors.

jkeenan commented 4 years ago

On 6/18/20 3:18 PM, John Paul Adrian Glaubitz wrote:

I have bisected the problematic commit to be 0bd6eef https://github.com/Perl/perl5/commit/0bd6eef439e2cdb97b9b64430c17d4f523f75914.

Reverting it, fixes Perl for me on m68k.

In the core distribution I have created branch 'smoke-me/jkeenan/gh-17871-m68k' to make this patch available for smoke-testing.

Glaubnitz, if you could provide us (off-list) with an appropriate email-address, that would facilitate our listing you properly in AUTHORS.

Thank you very much. Jim Keenan

glaubitz commented 4 years ago

The native alignment on m68k is 16 bits, not 32 bits, i.e. what you would expect from a 32-bits architecture.

jkeenan commented 4 years ago

@tonycoz Can you take a look? Thanks.

glaubitz commented 4 years ago

One of the structs used will align to 16 bits on m68k, while it will align to 32 bits on any other 32-bit architecture and to 64 bits on 64-bit targets.

I tried to use padding bytes, but so far without success. Normally, padding bytes would have the same effect as using the __attribute((aligned(4))) built-in, but for that one needs to calculate the proper padding so that the resulting struct has the correct size.

I will continue to figure that out tomorrow, it's getting late here. But if you have found a better solution in the mean time, let me know.

So far, many thanks for the friendly discussion.

jkeenan commented 4 years ago

Early smoke-test results (e.g., http://perl5.test-smoke.org/report/114741) suggest that this patch will cause perl not to build on Windows.

glaubitz commented 4 years ago

@jkeenan Yes, that's not surprising as Microsoft's Visual C/C++ compiler does not understand the directive __attribute__((aligned(X))). That's why I prefer using padding bytes which work on any platform and compiler. I just haven't found the correct padding yet.

jkeenan commented 4 years ago

On 6/18/20 5:36 PM, James E Keenan wrote:

Early smoke-test results (e.g., http://perl5.test-smoke.org/report/114741) suggest that this patch will cause perl not to build on Windows.

Also: https://github.com/Perl/perl5/runs/786025468

glaubitz commented 4 years ago

Good morning! After some debugging, I figured out where the padding is needed. Updated patch below which fixes the build for me.

From 3585dd865baaa3c631964fbba758491b26ef954a Mon Sep 17 00:00:00 2001
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Fri, 19 Jun 2020 07:42:08 +0200
Subject: [PATCH] op.h: Add additional padding to struct opslab to ensure
 proper alignment

On architectures where the native alignment is less than 4 bytes such as
Motorola 68000 (m68k), we need an additional padding of 2 bytes such that
the offset of the slots inside the slabs is correct.
---
 op.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/op.h b/op.h
index fc21f03cda..fb9f538e23 100644
--- a/op.h
+++ b/op.h
@@ -714,6 +714,7 @@ struct opslab {
 # ifdef PERL_DEBUG_READONLY_OPS
     bool       opslab_readonly;
 # endif
+    U16         opslab_padding;                /* padding to ensure proper alignment */
     OPSLOT     opslab_slots;           /* slots begin here */
 };

-- 
2.27.0
glaubitz commented 4 years ago

An alternative description for the commit would be:

Perl crashes with a segmentation fault on Linux/m68k due to an alignment
issue. On m68k, the natural alignment is 16 bits which causes the opslot
member of "struct opslab" to be aligned at a 16-bit offset.

On other 32-bit and 64-bit architectures, the alignment is at least
32 bits, so the offset is always guaranteed to be 32-bit aligned. Fix
this by adding an additional 16 bits padding before the opslot member
which causes the offset of oplab_slots to be 32-bit aligned.

On architectures which have a natural alignment of at least 32 bits,
the padding does not affect the alignment or struct size.
Tux commented 4 years ago

I think it would be wise to also mention that m68k is big-endian

glaubitz commented 4 years ago

Hmm, is that actually relevant?

Here's an updated patch with the improvement commit message:

From 8cd015a5cc230105bdcb8c33db80ef9f06b1c670 Mon Sep 17 00:00:00 2001
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Fri, 19 Jun 2020 08:33:33 +0200
Subject: [PATCH] op.h: Add additional padding to struct opslab to ensure
 proper alignment

Perl crashes with a segmentation fault on m68k due to an alignment issue.
On m68k, the natural alignment is 16 bits which causes the opslot
member of "struct opslab" to be aligned at a 16-bit offset.

On other 32-bit and 64-bit architectures, the natural alignment is at
least 32 bits, so the offset is always guaranteed to be 32-bit aligned.
Fix this by adding an additional 16 bits padding before the opslot member
which causes the offset of oplab_slots to be 32-bit aligned.

On architectures which have a natural alignment of at least 32 bits,
the padding does not affect the alignment, offsets or struct size.
---
 op.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/op.h b/op.h
index fc21f03cda..fb9f538e23 100644
--- a/op.h
+++ b/op.h
@@ -714,6 +714,7 @@ struct opslab {
 # ifdef PERL_DEBUG_READONLY_OPS
     bool       opslab_readonly;
 # endif
+    U16         opslab_padding;                /* padding to ensure proper alignment */
     OPSLOT     opslab_slots;           /* slots begin here */
 };

-- 
2.27.0
glaubitz commented 4 years ago

Btw, don't mind the formatting issues, the diff output just interprets the tabs stops differently.

@jkeenan Could you smoke-test the last patch? Thanks.

glaubitz commented 4 years ago

Please hold off with the patch, we're not quite happy yet because it's not clear what happens when PERL_DEBUG_READONLY_OPS is set.

glaubitz commented 4 years ago

Okay, this should be correct now. Improved the description again:

From 65bed710c6b990285d7feae63e8e9b8c6f8ab8ec Mon Sep 17 00:00:00 2001
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Fri, 19 Jun 2020 12:30:57 +0200
Subject: [PATCH] op.h: Add additional padding to struct opslab to ensure
 proper alignment

On m68k, the natural alignment is 16 bits which causes the opslab_opslot
member of "struct opslab" to be aligned at a 16-bit offset. Other 32-bit
and 64-bit architectures have a natural alignment of at least 32 bits, so
the offset is always guaranteed to be at least 32-bit-aligned.

Fix this by adding additional padding bytes before the opslab_opslot
member, both for cases when PERL_DEBUG_READONLY_OPS defined and not
defined to ensure the offset of oplab_slots is always 32-bit-aligned.
On architectures which have a natural alignment of at least 32 bits,
the padding does not affect the alignment, offsets or struct size.
---
 op.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/op.h b/op.h
index fc21f03cda..b620983d38 100644
--- a/op.h
+++ b/op.h
@@ -713,7 +713,9 @@ struct opslab {
                                            units) */
 # ifdef PERL_DEBUG_READONLY_OPS
     bool       opslab_readonly;
+    U8          opslab_padding1[3];    /* padding to ensure that opslab_slots is always */
 # endif
+    U16         opslab_padding2;       /* located at an offset with 32-bit alignment */
     OPSLOT     opslab_slots;           /* slots begin here */
 };

-- 
2.27.0

CC @geertu @vivier

jkeenan commented 4 years ago

Okay, this should be correct now. Improved the description again:

From 65bed710c6b990285d7feae63e8e9b8c6f8ab8ec Mon Sep 17 00:00:00 2001
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Fri, 19 Jun 2020 12:30:57 +0200
Subject: [PATCH] op.h: Add additional padding to struct opslab to ensure
 proper alignment

On m68k, the natural alignment is 16 bits which causes the opslab_opslot
member of "struct opslab" to be aligned at a 16-bit offset. Other 32-bit
and 64-bit architectures have a natural alignment of at least 32 bits, so
the offset is always guaranteed to be at least 32-bit-aligned.

Fix this by adding additional padding bytes before the opslab_opslot
member, both for cases when PERL_DEBUG_READONLY_OPS defined and not
defined to ensure the offset of oplab_slots is always 32-bit-aligned.
On architectures which have a natural alignment of at least 32 bits,
the padding does not affect the alignment, offsets or struct size.
---
 op.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/op.h b/op.h
index fc21f03cda..b620983d38 100644
--- a/op.h
+++ b/op.h
@@ -713,7 +713,9 @@ struct opslab {
                                            units) */
 # ifdef PERL_DEBUG_READONLY_OPS
     bool       opslab_readonly;
+    U8          opslab_padding1[3];    /* padding to ensure that opslab_slots is always */
 # endif
+    U16         opslab_padding2;       /* located at an offset with 32-bit alignment */
     OPSLOT     opslab_slots;           /* slots begin here */
 };

-- 
2.27.0

CC @geertu @vivier

I'm having considerable problems applying this patch.

$ git checkout blead
Switched to branch 'blead'
Your branch is up to date with 'origin/blead'.
[perl] 565 $ git am < ~/learn/perl/p5p/gh-2nd-17871.diff
Applying: op.h: Add additional padding to struct opslab to ensure proper alignment
error: patch failed: op.h:713
error: op.h: patch does not apply
Patch failed at 0001 op.h: Add additional padding to struct opslab to ensure proper alignment
Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Can I ask that you:

Alternatively, create a pull request.

Thank you very much. Jim Keenan

glaubitz commented 4 years ago

@jkeenan I might need to rebase the patch, I'll do that. In the meantime, we have come up with an even better variant on the Debian/m68k mailing list ;).

I'll open a PR later today. Thanks so much for your patience ;).

glaubitz commented 4 years ago

PR opened as #17875.

tonycoz commented 4 years ago

The error is at:

#0  S_link_freed_op (o=0x803120aa, slab=<optimized out>, slab=<optimized out>, 
    my_perl=0x8030a190) at op.c:269

Line 269 is:

    if (!slab->opslab_freed) {

but opslab_freed is part of the base structure, so I'm confused that your suggested patch can make any difference.

Also, OPSLOT is:

struct opslot {
    U16         opslot_size;        /* size of this slot (in pointers) */
    U16         opslot_offset;      /* offset from start of slab (in ptr units) */
    OP      opslot_op;      /* the op itself */
};

and OP is (expanding BASEOP):

struct op {
    OP*     op_next;        \
    OP*     op_sibparent;       \
    OP*     (*op_ppaddr)(pTHX); \
    PADOFFSET   op_targ;        \
    PERL_BITFIELD16 op_type:9;      \
    PERL_BITFIELD16 op_opt:1;       \
    PERL_BITFIELD16 op_slabbed:1;   \
    PERL_BITFIELD16 op_savefree:1;  \
    PERL_BITFIELD16 op_static:1;    \
    PERL_BITFIELD16 op_folded:1;    \
    PERL_BITFIELD16 op_moresib:1;       \
    PERL_BITFIELD16 op_spare:1;     \
    U8      op_flags;       \
    U8      op_private;

};

which contains pointers, which your compiler should be aligning to 32-bits if pointers require alignment, and hence the OP/OPSLAB should be aligned to 32-bits inside of OPSLAB and have a size that's a a multiple of 32-bits (so arrays work).

I don't see how you're getting mis-aligned access, unless calloc is returning a mis-aligned pointer, or there's some compiler option that's disabling required padding.

glaubitz commented 4 years ago

@tonycoz The natural alignment on m68k is 16 bits, not 32 bits.

See: https://lwn.net/Articles/790735/ and http://www.catb.org/esr/structure-packing/#_alignment_requirements

xenu commented 4 years ago

You keep repeating that but I don't think it explains anything. The compiler will always align all structures properly (that is, in a correct but not necessarily the most optimal way). In the standard C there's no such thing as a misaligned struct.

The only thing that can be misaligned is a pointer (memory address). So if that issue really was caused by a misaligned read, my question is: where exactly did that misaligned address come from? Usually that happens because of a cast to an incompatible type, often paired with pointer arithmetic.

glaubitz commented 4 years ago

GCC aligns on 16-bit boundaries on m68k. For example, sizeof(struct opslab) returns 46 on m68k, not 48 as on i386. If you want, I can run a small C test program you send me to demonstrate that.

@xenu At some point, the Perl code casts the pointer to (INT32 *) if I remember correctly, let me check.

@GeertU Maybe you can explain it better as you have been doing m68k development much longer than I have.

glaubitz commented 4 years ago

Here is one such case of casts:

/* the first (head) opslab of the chain in which this op is allocated */
# define OpSLAB(o) \
    (((OPSLAB*)( (I32**)OpSLOT(o) - OpSLOT(o)->opslot_offset))->opslab_head)

And the C specification here (http://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf) states:

All declarations that refer to the same object or function shall have compatible type;otherwise, the behavior is undefined.

The compiler cannot guarantee alignment if you are casting pointer types. It's also a common problem on SPARC where this causes unaligned access which results in crashes with Bus error.

xenu commented 4 years ago

Yes, indeed that code seems to assume that sizeof(struct opslab) and sizeof(struct opslot) will be a multiple of sizeof(void*). Now I understand the problem.

xsawyerx commented 4 years ago

PR was merged. Thank you.

tonycoz commented 4 years ago

Yes, indeed that code seems to assume that sizeof(struct opslab) and sizeof(struct opslot) will be a multiple of sizeof(void*). Now I understand the problem.

Both sizeof(struct opslab) and sizeof(struct opslot) will be a multiple of sizeof(void *) (which is 4 bytes, like any other pointer on 68k), since both structures contain pointers.

If they didn't the odd elements in an array of such structures would be mis-aligned.

xenu commented 4 years ago

There is no such rule. On the platforms with 16-bit alignment, sizeof of the following struct will be '6' (assuming sizeof(void*) == 4):

struct {
    uint16_t foo;
    void* bar;
}
tonycoz commented 4 years ago

That would make a pointer to bar misaligned, even without an array being involved.

xenu commented 4 years ago

While it's usually the case, there's no rule that says that the platform's native alignment must be the same as the pointer size. On 68k reads of all sizes from addresses divisible by 2 are always legal (i.e. properly aligned). On x86 that number is '4', while on amd64 it's '8' (let's ignore SIMD instructions).

BTW, if you want to play with a 68k compiler, I've just found an instance of Compiler Explorer with 68k gcc installed: http://brownbot.mooo.com/. As you can see in the assembly output here, the compiler behaves exactly as I said.

xenu commented 4 years ago

Another example of a platform where the native alignment is not the same as the pointer size is X32 Linux ABI. Despite the fact that sizeof(void*) == 4, the native alignment is 8 bytes: https://gcc.godbolt.org/z/ZaiVRo

geertu commented 4 years ago

Both sizeof(struct opslab) and sizeof(struct opslot) will be a multiple of sizeof(void *) (which is 4 bytes, like any other pointer on 68k), since both structures contain pointers.

This is a common misconception: sizeof(struct) is not a multiple of max(sizeof(struct.member)), but rather max(alignof(struct.member)). alignof(void *), alignof(int), and alignof(long) are 2 bytes on m68k (and cris, a.o.).

tonycoz commented 4 years ago

Yeah, I think I misunderstood the problem. I thought we were seeing alignment errors (given the patch description) on access, but the actual problem isn't alignment on access, but that the assumption that offsetof(struct opslab, opslab_slots) % sizeof(I32 *) == 0 is incorrect.

This makes the OpSLAB() macro return an incorrect pointer to the slab and everything breaks from there.

I'll work on a less fragile fix for this that doesn't require manually managing alignment.