OpenSmalltalk / opensmalltalk-vm

Cross-platform virtual machine for Squeak, Pharo, Cuis, and Newspeak.
http://opensmalltalk.org/
Other
557 stars 111 forks source link

Armv8 crash in MVC #617

Closed edoneel closed 2 years ago

edoneel commented 2 years ago

Hi,

ArmV8 crashes when you do the following:

  1. Start Squeak.
  2. Open a MVC Project
  3. in the MVC project open a Workspace
  4. Type something, and then enough returns to cause the something to scroll of the screen and the scroll bar to flop out.
  5. Scroll back --> segfault.

The patch that fixes this is attached. The patch is to take the slow path in fastPathBottomToTop in platforms/Cross/plugins/BitBltPlugin/BitBltGeneric.c

This only shows up when you use the optimized flag.

Crashes on:

gcc version 10.2.1 20210110 (Debian 10.2.1-6) gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)

cheers

bruce PatchBitBltGeneric.patch.txt

eliotmiranda commented 2 years ago

Can we try and fix this properly? I'm always loathe to discard fast features just because. I'd rather understand the crash better. Tim?

marceltaeumel commented 2 years ago

Also, why are these fast code paths not even compiled for x86 anymore? It's not new code...

edoneel commented 2 years ago

Hi,

In response to Marcel - it does not seem that these fast code paths are designed to work on x86.

This is selected in ./opensmalltalk-vm/platforms/unix/config/configure around line 18555 or so. I have pasted it below.

The BitBltGeneric.c file that fails is only but for Arm32 and Arm64.

In the x86-64 one only uses BitBltPlugin.c.

I've not spent a lot of time looking but it looks like the contents of platforms/Cross/plugins/BitBltPlugin is just the source code that the Pi Foundation paid to write.

So now it is clearer why it never failed for anything else other than the ArmV8 systems.

Maybe the other problem is that accidentally this is thinking it is still on a 32 bit systems rather than a 64 bit systems since it does seem to be used by the 32 bit systems as well.

When I set a breakpoint at fastPathBottomToTop on a Arm32 system, and then do the steps above that causes the crash on Arm64 then it hits the breakpoint.

cheers

bruce

Arm32 example:

$ lscpu Architecture: armv7l Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 1 Core(s) per socket: 4 Socket(s): 1 Vendor ID: ARM Model: 3 Model name: Cortex-A72 Stepping: r0p3 CPU max MHz: 1800.0000 CPU min MHz: 600.0000 BogoMIPS: 108.00 Flags: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32

GNU gdb (Raspbian 8.2.1-2) 8.2.1 Copyright (C) 2018 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "arm-linux-gnueabihf". Type "show configuration" for configuration details. For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/. Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/.

For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./squeak...done. (gdb) break fastPathBottomToTop Breakpoint 1 at 0xc1b0c: file. XXXopensmalltalk-vm/platforms/Cross/plugins/BitBltPlugin/BitBltGeneric.c, line 519. (gdb) run ~/archive/StcStw/sim/StcStw-32 Starting program: XXX/local/squeak/squeak ~/archive/StcStw/sim/StcStw-32 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1". [New Thread 0xb5eaa400 (LWP 22687)] [New Thread 0xb56a9400 (LWP 22688)] [New Thread 0xb4da8400 (LWP 22689)] pthread_setschedparam failed: Operation not permitted This VM uses a separate heartbeat thread to update its internal clock and handle events. For best operation, this thread should run at a higher priority, however the VM was unable to change the priority. The effect is that heavily loaded systems may experience some latency issues. If this occurs, please create the appropriate configuration file in /etc/security/limits.d/ as shown below:

cat <<END | sudo tee /etc/security/limits.d/squeak.conf

and report to the squeak mailing list whether this improves behaviour.

You will need to log out and log back in for the limits to take effect. For more information please see https://github.com/OpenSmalltalk/opensmalltalk-vm/releases/tag/r3732#linux

Thread 1 "squeak" hit Breakpoint 1, fastPathBottomToTop (op=0xbefcc7a8, flags=541212736)

Snip from configure.

plugin="BitBltPlugin" plibs="" rm -f BitBltPlugin.sub BitBltPlugin.lib bitblt_objs="BitBltPlugin.o" bitblt_flags="" arm_arch=""

case $host_cpu in arm) case $host_cpu in armv6) arm_arch="6" ;; armv7*) arm_arch="7-A" ;; esac

Check whether --enable-fast-bitblt was given.

if test "${enable_fast_bitblt+set}" = set; then : enableval=$enable_fast_bitblt; if test "x$enableval" = "xyes" ; then bitblt_objs="BitBltPlugin.o BitBltArm.o BitBltArmLinux.o BitBltArmSimd.o BitBltDispatch.o BitBltGeneric.o BitBltArmSimdAlphaBlend.o BitBltArmSimdBitLogical.o BitBltArmSimdCompare.o BitBltArmSimdPixPaint.o BitBltArmSimdSourceWord.o" bitblt_flags="-DENABLE_FAST_BLT" fi

fi

;; aarch64)

Check whether --enable-fast-bitblt was given.

if test "${enable_fast_bitblt+set}" = set; then : enableval=$enable_fast_bitblt; if test "x$enableval" = "xyes" ; then bitblt_objs="BitBltPlugin.o BitBltArm64.o BitBltDispatch.o BitBltGeneric.o" bitblt_flags="-DENABLE_FAST_BLT" fi

fi

;;

esac

BITBLT_OBJS=$bitblt_objs

BITBLT_FLAGS=$bitblt_flags

ARM_ARCH=$arm_arch

OpenSmalltalk-Bot commented 2 years ago

On 2022-03-02, at 12:52 PM, edoneel @.***> wrote:

Hi,

In response to Marcel - it does not seem that these fast code paths are designed to work on x86.

I think that Ben intended them to be cpu generic, along with a few other parts that are just C code. The ARM specific parts are those that rely on ARM instructions. We may not have ever tried the ENABLE_FAST_BLT (I tihnk that is the #define?) on any other machines.

tim

tim Rowledge; @.***; http://www.rowledge.org/tim Strange OpCodes: KFP: Kindle Fire in Printer

edoneel commented 2 years ago

Hi While it might be CPU independent the configure script on linux only pays attention to --enable-fast-bitblt in the cases where it is arm*. So no speed up for anyone else.

It does not mean it might not work but not implemented yet.

eliotmiranda commented 2 years ago

So am I correct in thinking that the bug is the configure script and fast bitblt code should be chosen only for 32-bit arm machines? That would imply that another fix would to modify the fast path guard in BitBltGeneric.c so that it only applied to 32-bit arm, right?

OpenSmalltalk-Bot commented 2 years ago

On 2022-03-03, at 7:53 AM, Eliot Miranda @.***> wrote:

So am I correct in thinking that the bug is the configure script and fast bitblt code should be chosen only for 32-bit arm machines? That would imply that another fix is to modify the fast path guard in BitBltGeneric.c so that it only applied to 32-bit arm, right?

No, not at all; Ben spent a lot of time (and RPF's money!) making it work for ARM64 last March/April. The generic code is simply C code that ought to be good for any platform.

tim

tim Rowledge; @.***; http://www.rowledge.org/tim A fool and his money are soon partying

edoneel commented 2 years ago

Hi all,

First, I am only looking at Linux. I don't have an Armv8 Mac so I can't check that.

The code is built, by default, for both Arm32 and Arm64.

The flag enable-fast-bitblt is set default to yes but can be turned off.

This flag only affects Arm32 and Arm64 builds and is ignored on other builds.

On Arm32 it works fully. On Arm64 there seems only to be a bug in fastPathBottomToTop. It also only fails with the screen depth set to 32bits.

This code is not built on x86-64 or other systems. Maybe it could be built, maybe not.

cheers

bruce

nicolas-cellier-aka-nice commented 2 years ago

fastPathBottomToTop does not do much... It just reverse the line order. https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/f5f0e7d98effd7217dded6ed9397b2cac1f787ad/platforms/Cross/plugins/BitBltPlugin/BitBltGeneric.c#L517

I foresee two cases where it could fail:

The later is possible because srcPitch and dstPitch are declared usqInt in operation_t https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/f5f0e7d98effd7217dded6ed9397b2cac1f787ad/platforms/Cross/plugins/BitBltPlugin/BitBltDispatch.h#L98

usqInt is 64bits on ARM64, so far so good, unfortunately those variables are copied to 32 bits local temps in https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/f5f0e7d98effd7217dded6ed9397b2cac1f787ad/platforms/Cross/plugins/BitBltPlugin/BitBltInternal.h#L137

I think that it is the mistake, because pointer arithmetic in 64 bits will be performed with 64 bits size_t, and the uint32_t will just move the pointer a large amount forward (2^32-pitch) instead of backward (-pitch).

So I think that declaring the local srcPitch and dstPitch as usqInt instead of uint32_t might fix the bug.

I have no ARM64 at hand to play with, so up to you to verify the hypothesis.

OpenSmalltalk-Bot commented 2 years ago

Brilliant analysis. How much of our lives is messed up by singed/unisgned/16/32/64 bits and the joy of C decls...

On 2022-03-09, at 1:06 PM, Nicolas Cellier @.***> wrote:

fastPathBottomToTop does not do much... It just reverse the line order. https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/f5f0e7d98effd7217dded6ed9397b2cac1f787ad/platforms/Cross/plugins/BitBltPlugin/BitBltGeneric.c#L517

I foresee two cases where it could fail:

• the fast operation by itself has a BUG (whatether BottomToTop or not) • some integer operations are carried on a larger integer type, transforming negative pitch into a large positive int The later is possible because srcPitch and dstPitch are declared usqInt in operation_t https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/f5f0e7d98effd7217dded6ed9397b2cac1f787ad/platforms/Cross/plugins/BitBltPlugin/BitBltDispatch.h#L98

usqInt is 64bits on ARM64, so far so good, unfortunately those variables are copied to 32 bits local temps in https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/f5f0e7d98effd7217dded6ed9397b2cac1f787ad/platforms/Cross/plugins/BitBltPlugin/BitBltInternal.h#L137

I think that it is the mistake, because pointer arithmetic in 64 bits will be performed with 64 bits size_t, and the uint32_t will just move the pointer a large amount forward (2^32-pitch) instead of backward (-pitch).

So I think that declaring the local srcPitch and dstPitch as usqInt instead of uint32_t might fix the bug.

I have no ARM64 at hand to play with, so up to you to verify the hypothesis.

— Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android. You are receiving this because you commented.

tim

tim Rowledge; @.***; http://www.rowledge.org/tim Strange OpCodes: DMZ: Divide Memory by Zero

nicolas-cellier-aka-nice commented 2 years ago

How much of our lives is messed up by singed/unisgned/16/32/64 bits and the joy of C decls...

a tiny bit of our life - as if we could use only 7/15/31/63 bits safely in real life.

But a single bit less makes a factor two Tim, so my conclusion is that half of our programmer life is messed up by this insane model!

edoneel commented 2 years ago

Excellent! Very clean fix. Thanks!

It works for me on both Arm64 and Arm32.