Closed jpellegrini closed 1 year ago
I tried compiling with only DEBUG_VM
and it fails. So it was no the STAT_VM
code, but the debugging code. I'll keep looking into this.
Can't see what's wrong with the DEBUG_VM
code either.
Is it OK if I send a PR splitting DEBUG_VM
from STAT_VM
? That may make the two easier to deal with.
Hi @jpellegrini ,
I am not too surprised that code compiled with DEBUG_VM
or STAT_VM
no longer works. I have not activated these flags since a long time, and things have changed probably a lot since last time they were used. Feel free to split DEBUG_VM
from STAT_VM
if it makes things easier for you.
BTW, having an idea of the instruction timings would be really awesome.
Thanks @egallesio ! I'll try to get both debug and stats working.
It doesn't seem to be DEBUG_VM
either. From what I can see, STklos only compiles properly with USE_COMPUTED_GOTO
. But I can't see why.
If you comment out the part in which USE_COMPUTED_GOTO
is defined:
//#if defined(__GNUC__) && !defined(DEBUG_VM)
/* Use computed gotos to have better performance */
//# define USE_COMPUTED_GOTO
//# define CASE(x) lab_##x:
//# define NEXT goto *jump_table[fetch_next()]
//#else
/* Standard C compiler. Use the classic switch statement */
# define CASE(x) case x:
# define NEXT continue;/* Be sure to not use continue elsewhere */
//#endif
Then STklos fails to compile. When the generated stklos binary is called, it tries to look up a variable, but instead of passing a symbol to STk_lookup
, it passes some other object, which does not have the proper SYMBOL_PNAME
, and crashes. This is with plain STklos code from git (the only change was that one I showed above).
Anyway, I have made changes that gather instruction count and timings, but they only work with the computed gotos version. I'll make a PR later.
I've made a PR (see #530) -- the stats code works fine, but only for the version that compiles (the one with computed gotos).
I tried compiling plain STklos (as in the previous comment, removing the USE_COMPUTED_GOTO
definition) with both GCC and Clang, and both fail exactly at the same place, with the same backtrace...
The first two opcodes are 37 and 86 (PREPARE_CALL
AND GREF_INVOKE
). Then it crashes. I think it doesn't get a symbol for the procedure name that it's invoking.
The crashing version of the code will start executing these bytes:
37 86 00 00 31
while the good (USE_COMPUTED_GOTO
) version executes this:
38 01 47 00 16
So the bytecode that is being executed is different!
The strange part is that boot.c
always agrees with the version that crashes!
STk_instr STk_boot_code [] = {
0x25,
0x56,
0x0,
0x0,
0x1f,
...
So the bytecode that is being executed is different!
The strange part is that
boot.c
always agrees with the version that crashes!
Which makes me think that the VM is actually not reading, or perhaps not interpreting the bytecode properly...
I don't have much time now, but I'll try to take a closer look into this in the following days.
I have been trying to understand what happens, and this really is a mystery to me. I have read the vm code over and over, and used several debugging ideas, but I can't see why the code executed with USE_COMPUTED_GOTO
is not the same executed without it. It must be something related to how instructions are decoded, but looking at the code, it doesn't seem that it can be the case.
Not sure what to try next... Perhaps you can spot the problem more easily, @egallesio ?
Ah. Version 1.70 does not have this problem! I'll spend some time in the next days compiling intermediate git checkouts to see if we can find out what happened.
Aha! It's commit 7fb4042e5104f8bf8a6763f98633cf691348b0d7, Fix multiline macros.
It seems that if we remove the do{...}while(0)
guards from:
NEXT0
NEXT1
LOCK_AND_RESTART
then it woks. But those guards are there for a reason... And I'm not sure why they're causing this trouble.
Aaaah, yes!
# define NEXT continue;
Then we have, in LOCK_AND_RESTART
, a NEXT
. This should be the continue
for the VM loop, but we introduced a do
, so now the continue
will work for the do
statement, not the VM loop... And the semantics of it all has changed!
But I'm not sure what would be the best way to fix this...
@egallesio -- an ugly solution would be to not explicitly use continue
, but have a label at the end of the loop, and goto
there... Is there a nicer way?
Perhaps using
#define SOME_MACRO if(1){ \
...
...
}else{}
would work? (I didn't think too much about it but it seems that it wouldn't cause problems)
Slightly uglier than the do{
thing, and not standard practice, but I guess the #define NEXT continue
should stay as it is, no?
Nice investigation Sherlock :smiley:.
I'm not very found of the version using if/else
, since it is not "standard". Using a goto
at the top of the loop in a VM is not shocking, IMHO (moreover, it is hidden in the NEXT macro).
Ok @egallesio !
I'll try to make a new PR tomorrow then, with the goto
! :grin:
since it is not "standard".
I agree, but it seems like a strange standard, since it's fragile, as we ourselves found out... :)
Yes but using
#define SOME_MACRO if(1){ \
...
...
}else{}
is identical to use a simple block statement, such as
#define SOME_MACRO { \
...
...
}
since it does not oblige to use a semicolon after using SOME_MACRO
as
the do{}while(0)
trick. That means that
if (...)
SOME_MACRO
else
foo()
is OK (but adding a semicolon after SOME_MACRO
would be incorrect).
Hi @egallesio !
I have tried to build STklos with VM statistics, and it segfaulted.
The result was:
Running
gdb src/stklos
shows this backtrace:My plan was to add timings to the statistics. I see you have already written code to compute the frequencies of instructions and pairs of instructions. I wanted to add a
clock()
call there too, and create a timing table for instructions, but the current code doesn't compile withSTAT_VM
. And I have reviewed the code, but couldn't find anything wrong in it. (?)