egallesio / STklos

STklos Scheme
http://stklos.net
GNU General Public License v2.0
69 stars 17 forks source link

STklos seems to only compile with `USE_COMPUTED_GOTO` #528

Closed jpellegrini closed 1 year ago

jpellegrini commented 1 year ago

Hi @egallesio !

I have tried to build STklos with VM statistics, and it segfaulted.

./configure CC=gcc CFLAGS="-Wall -Wextra -g -O0 -DSTAT_VM" --enable-threads=none  && make  -j9

The result was:

./tmpcomp -o stklos-compile stklos-compile.stk
./tmpcomp -o stklos-genlex stklos-genlex.stk
(cd ../lib/stklos; make  preproc.ostk)
make[2]: Entering directory '/home/jeronimo/pkg/scheme/STklos-github/lib/stklos'
../../utils/tmpcomp -o preproc.ostk preproc.stk
Segmentation fault
Segmentation fault
make[1]: *** [Makefile:575: stklos-genlex] Error 139
make[1]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile:572: stklos-compile] Error 139
Segmentation fault
make[2]: *** [Makefile:659: preproc.ostk] Error 139
make[2]: Leaving directory '/home/jeronimo/pkg/scheme/STklos-github/lib/stklos'
make[1]: *** [Makefile:581: ../lib/stklos/preproc.ostk] Error 2
make[1]: Leaving directory '/home/jeronimo/pkg/scheme/STklos-github/utils'

Running gdb src/stklos shows this backtrace:

Program received signal SIGSEGV, Segmentation fault.
hash_string (string=0x400000004 <error: Cannot access memory at address 0x400000004>) at hash.c:86
86    for (c = *string; c ; c = *string++) {
(gdb) where
#0  hash_string (string=0x400000004 <error: Cannot access memory at address 0x400000004>) at hash.c:86
#1  0x00005555555828ee in hash_get_variable (h=0x7ffff7f83f58, v=0x7ffff7719960, index=0x7fffffffda6c) at hash.c:354
#2  0x000055555558297e in STk_hash_get_variable (h=0x7ffff7f83f58, v=0x7ffff7719960) at hash.c:365
#3  0x000055555557a4fe in STk_lookup (symbol=0x7ffff7719960, env=0x7ffff7f83f30, ref=0x7fffffffdb70, err_if_unbound=0) at env.c:781
#4  0x00005555555ae877 in run_vm (vm=0x7ffff7f8df00) at vm.c:1095
#5  0x00005555555b3ec9 in STk_boot_from_C () at vm.c:2322
#6  0x00005555555a2849 in main (argc=0, argv=0x7fffffffe020) at stklos.c:276

My plan was to add timings to the statistics. I see you have already written code to compute the frequencies of instructions and pairs of instructions. I wanted to add a clock() call there too, and create a timing table for instructions, but the current code doesn't compile with STAT_VM. And I have reviewed the code, but couldn't find anything wrong in it. (?)

jpellegrini commented 1 year ago

I tried compiling with only DEBUG_VM and it fails. So it was no the STAT_VM code, but the debugging code. I'll keep looking into this.

jpellegrini commented 1 year ago

Can't see what's wrong with the DEBUG_VM code either. Is it OK if I send a PR splitting DEBUG_VM from STAT_VM? That may make the two easier to deal with.

egallesio commented 1 year ago

Hi @jpellegrini , I am not too surprised that code compiled with DEBUG_VM or STAT_VM no longer works. I have not activated these flags since a long time, and things have changed probably a lot since last time they were used. Feel free to split DEBUG_VM from STAT_VM if it makes things easier for you.

BTW, having an idea of the instruction timings would be really awesome.

jpellegrini commented 1 year ago

Thanks @egallesio ! I'll try to get both debug and stats working.

jpellegrini commented 1 year ago

It doesn't seem to be DEBUG_VM either. From what I can see, STklos only compiles properly with USE_COMPUTED_GOTO. But I can't see why.

If you comment out the part in which USE_COMPUTED_GOTO is defined:

//#if defined(__GNUC__) && !defined(DEBUG_VM)
   /* Use computed gotos to have better performance */
//#  define USE_COMPUTED_GOTO
//#  define CASE(x)       lab_##x:
//#  define NEXT          goto *jump_table[fetch_next()]
//#else
   /* Standard C compiler. Use the classic switch statement */
#  define CASE(x)       case x:
#  define NEXT          continue;/* Be sure to not use continue elsewhere */
//#endif

Then STklos fails to compile. When the generated stklos binary is called, it tries to look up a variable, but instead of passing a symbol to STk_lookup, it passes some other object, which does not have the proper SYMBOL_PNAME, and crashes. This is with plain STklos code from git (the only change was that one I showed above).

Anyway, I have made changes that gather instruction count and timings, but they only work with the computed gotos version. I'll make a PR later.

jpellegrini commented 1 year ago

I've made a PR (see #530) -- the stats code works fine, but only for the version that compiles (the one with computed gotos).

I tried compiling plain STklos (as in the previous comment, removing the USE_COMPUTED_GOTO definition) with both GCC and Clang, and both fail exactly at the same place, with the same backtrace...

jpellegrini commented 1 year ago

The first two opcodes are 37 and 86 (PREPARE_CALL AND GREF_INVOKE). Then it crashes. I think it doesn't get a symbol for the procedure name that it's invoking.

jpellegrini commented 1 year ago

The crashing version of the code will start executing these bytes:

37 86 00 00 31

while the good (USE_COMPUTED_GOTO) version executes this:

38 01 47 00 16

So the bytecode that is being executed is different!

The strange part is that boot.c always agrees with the version that crashes!

STk_instr STk_boot_code [] = { 
0x25,
0x56,
0x0,
0x0,
0x1f,
...
jpellegrini commented 1 year ago

So the bytecode that is being executed is different!

The strange part is that boot.c always agrees with the version that crashes!

Which makes me think that the VM is actually not reading, or perhaps not interpreting the bytecode properly...

I don't have much time now, but I'll try to take a closer look into this in the following days.

jpellegrini commented 1 year ago

I have been trying to understand what happens, and this really is a mystery to me. I have read the vm code over and over, and used several debugging ideas, but I can't see why the code executed with USE_COMPUTED_GOTO is not the same executed without it. It must be something related to how instructions are decoded, but looking at the code, it doesn't seem that it can be the case.

Not sure what to try next... Perhaps you can spot the problem more easily, @egallesio ?

jpellegrini commented 1 year ago

Ah. Version 1.70 does not have this problem! I'll spend some time in the next days compiling intermediate git checkouts to see if we can find out what happened.

jpellegrini commented 1 year ago

Aha! It's commit 7fb4042e5104f8bf8a6763f98633cf691348b0d7, Fix multiline macros.

jpellegrini commented 1 year ago

It seems that if we remove the do{...}while(0) guards from:

then it woks. But those guards are there for a reason... And I'm not sure why they're causing this trouble.

jpellegrini commented 1 year ago

Aaaah, yes!

#  define NEXT          continue;

Then we have, in LOCK_AND_RESTART, a NEXT. This should be the continue for the VM loop, but we introduced a do, so now the continue will work for the do statement, not the VM loop... And the semantics of it all has changed!

But I'm not sure what would be the best way to fix this...

jpellegrini commented 1 year ago

@egallesio -- an ugly solution would be to not explicitly use continue, but have a label at the end of the loop, and goto there... Is there a nicer way?

jpellegrini commented 1 year ago

Perhaps using

#define SOME_MACRO  if(1){ \
 ...
 ...
}else{}

would work? (I didn't think too much about it but it seems that it wouldn't cause problems) Slightly uglier than the do{ thing, and not standard practice, but I guess the #define NEXT continue should stay as it is, no?

egallesio commented 1 year ago

Nice investigation Sherlock :smiley:.

I'm not very found of the version using if/else, since it is not "standard". Using a goto at the top of the loop in a VM is not shocking, IMHO (moreover, it is hidden in the NEXT macro).

jpellegrini commented 1 year ago

Ok @egallesio !

I'll try to make a new PR tomorrow then, with the goto! :grin:

jpellegrini commented 1 year ago

since it is not "standard".

I agree, but it seems like a strange standard, since it's fragile, as we ourselves found out... :)

egallesio commented 1 year ago

Yes but using

#define SOME_MACRO  if(1){ \
 ...
 ...
}else{}

is identical to use a simple block statement, such as

#define SOME_MACRO  { \
 ...
 ...
}

since it does not oblige to use a semicolon after using SOME_MACRO as the do{}while(0) trick. That means that

if (...)
  SOME_MACRO   
else
  foo()

is OK (but adding a semicolon after SOME_MACRO would be incorrect).