franko / luajit-lang-toolkit

A Lua bytecode compiler written in Lua itself for didactic purposes or for new language implementations

Weird bug #10

Closed q66 closed 10 years ago

q66 commented 10 years ago

Here's a snippet:

http://codepad.org/IWzsktIG

When you run it, the error should be

luajit: [string "core.lua"]:31: loop in gettable

When you remove 4 or so lines from the end (so that last one is for M.Foo96) it errors differently:

luajit: [string "core.lua"]:33: table index is nil

Or maybe it won't error, seems to be pretty random - in that case, try removing a different number of lines. The first error should be reproducible every time.

I'm not quite sure what's happening here, so I haven't patched it yet...

q66 commented 10 years ago

Oh, seems like the "loop in gettable" error is normal; but the "table index is nil" error is weird.
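For context, here is a minimal sketch of the usual way each of the two messages arises in LuaJIT or Lua 5.1. It is not the code from the linked snippet, which is not reproduced here; the names `looped`, `ok1`, `ok2` and so on are made up for illustration. Newer Lua versions word the first message differently.

```lua
-- Illustrative only: not the code from the linked snippet.

-- Case 1: a table whose metatable's __index points back at the table itself.
-- Looking up a missing key follows __index forever until the VM gives up
-- (reported as "loop in gettable" by LuaJIT and Lua 5.1).
local looped = setmetatable({}, {})
getmetatable(looped).__index = looped
local ok1, err1 = pcall(function() return looped.missing end)
print(ok1, err1)

-- Case 2: assigning through a key that is nil
-- (reported as "table index is nil").
local ok2, err2 = pcall(function()
  local t = {}
  local key = nil
  t[key] = 1
end)
print(ok2, err2)
```

An `__index` cycle is a property of the program itself, so it fails the same way on every run. That fits the first error being the expected one, while a nil key that shows up only on some runs is the suspicious part.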

q66 commented 10 years ago

Somewhat simplified testcase: http://codepad.org/Ta5ShVBQ

This one seems to always fail; I tried simplifying further but didn't get any decent results.

franko commented 10 years ago

Hi Daniel,

I was quite busy this weekend but I took some time to look at this problem and unfortunately I don't have any clue of what is going wrong.

I reduced your testcase a little and I have a version that shows the error only in some runs, not always. Also, adding a print statement can make the problem disappear; this is really obscure to me.

I didn't find any error in the generated bytecode. I suspect a problem in the generation of the bytecode's constants data. At work I have a tweaked version of luajit that prints more information about the bytecode. I will take a look at work tomorrow.

franko commented 10 years ago

Interestingly if you generate the bytecode using

luajit run.lua bug-testcase.lua out.raw

And then you run directly the bytecode:

luajit out.raw

the problem disappears.

I suspect a luajit bug, and maybe I can submit a bug report to the mailing list. It may be that, even if it is not an LJ bug, Mike can help us debug this problem.

Yet before submitting an email it is wise to compile luajit with debug on and see if any error is reported, but I probably will not have time to do that this evening.
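The two-step workflow described above (generate bytecode first, run it separately) can be sketched in-process with the stock compiler. This is only an illustration under assumptions: it uses the standard `string.dump` and `loadstring`/`load` functions rather than the toolkit's `run.lua`, and the `source` string and `=example` chunk name are invented.

```lua
-- Sketch: compile a chunk, serialize it to bytecode, then load and run
-- the bytecode as a separate step. Uses the stock compiler, not the toolkit.
local source = "return 1 + 2"           -- hypothetical stand-in program
local compile = loadstring or load       -- loadstring on LuaJIT / Lua 5.1

local chunk = assert(compile(source, "=example"))
local bytecode = string.dump(chunk)      -- binary string, like out.raw

-- Loading the binary string skips lexing and parsing entirely.
local reloaded = assert(compile(bytecode, "=example"))
print(reloaded())
```

When the bytecode is run in a separate process, the compiler's own modules are never loaded, so the process state differs from the run that compiled and executed in one go.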

q66 commented 10 years ago

i would say the reason the bug "disappears" when you save bytecode and run it later is that the environment is different (i.e. lj-lang-toolkit modules are not loaded) and it would still fail if modified. No idea though...

q66 commented 10 years ago

the reason i'm saying that is that when I reduce run.lua to just compiling and running the file, I can get the error to disappear. Then by adding print I can get it to appear again. So I would say it's largely dependent on the environment you're running it in.

franko commented 10 years ago

Now it seems more probable that the bug lies on the luajit side. I've just discovered that it disappears when using "-joff":

luajit -joff run.lua heisenbug.lua

Now I'm thinking about running luajit with full debug turned on and afterwards filing a bug report. The main problem is that Mike normally wants a simple test case that demonstrates the problem, and this is hard to create. The testcase would need to include all the files used by the lang toolkit.

Maybe I can just run the bytecode generator by providing the AST directly. That way I would exclude the lexer, the parser and the AST builder from the testcase.
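The "-joff" comparison mentioned above can also be approximated from inside a script with LuaJIT's `jit.off()`, `jit.flush()` and `jit.on()`. The following is a rough sketch, not what was run in this thread: `run_both` and `workload` are hypothetical names, and the workload is a trivial stand-in.

```lua
-- Hypothetical helper: run fn once with the JIT compiler disabled and once
-- with it enabled, so the two results can be compared.
local function run_both(fn, ...)
  local has_jit = type(jit) == "table"   -- false on plain Lua
  if has_jit then
    jit.off()     -- same effect as the -joff command-line switch
    jit.flush()   -- discard any traces compiled earlier
  end
  local ok_int, res_int = pcall(fn, ...)
  if has_jit then jit.on() end           -- assumes the JIT was on to begin with
  local ok_jit, res_jit = pcall(fn, ...)
  return ok_int, res_int, ok_jit, res_jit
end

-- Stand-in workload with loops hot enough to be compiled.
local function workload(n)
  local t = {}
  for i = 1, n do t[i] = i * 2 end
  local sum = 0
  for i = 1, n do sum = sum + t[i] end
  return sum
end

local ok_int, res_int, ok_jit, res_jit = run_both(workload, 1000)
print(ok_int, res_int, ok_jit, res_jit)
```

A mismatch between the two runs points at code that behaves differently once compiled. Since the bug here depends heavily on the environment, toggling the JIT in-process may not reproduce what the command-line switch shows.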

q66 commented 10 years ago

problem is, as i said, the bug is highly environment dependent, so a different environment (memory allocated differently because nothing is actually lexed, etc.) may make the bug disappear.

franko commented 10 years ago

And so?

I'm sorry but it is not clear to me what you mean.

q66 commented 10 years ago

which means if you do it like this (provide AST directly, without running lexer and parser) the bug might not be reproducible anymore :P but you can give it a shot, of course.

franko commented 10 years ago

Of course it is only an attempt. The bug may disappear, as you point out, but if I want a chance that Mike takes a look at this problem I need to reduce the amount of code needed to reproduce it.

franko commented 10 years ago

You were right: I ran the test, and if I give the AST directly to the bytecode generator the error does not happen.

At least this suggests that the bytecode generator is probably doing everything right.

The problem is that now it is very difficult to submit this kind of bug. Any idea is welcome! :-)

q66 commented 10 years ago

i have no idea :/ it's an odd issue and sadly a blocker for me and I can never reproduce it with standard LuaJIT. It's really odd.

franko commented 10 years ago

Don't worry about that :-)

I'm sorry that I'm not able to solve this problem and I hope you didn't rely too much on the lang-toolkit for your project.

Just for your information: the bug also appears with another test, and this has been true for a long time:

expr-var-return-1 pass
fft-1 fail
for-statement-1 pass

and it has the same signature. It appears randomly and disappears with -joff. In this case the bytecode produced by the lang toolkit is identical to that produced by luajit.

A long time ago I submitted a bug report:

http://comments.gmane.org/gmane.comp.lang.lua.luajit/4642

and Mike did something but the problem is still there. I have just posted another message, since with a little trick I was able to reproduce the problem with a simple testcase.

With some luck the problems are related and Mike will fix our heisenbug :-)

q66 commented 10 years ago

I can't reproduce the bug using the fft test (even when calling twice). What I noticed though is that i'm getting different output for LuaJIT and lj-lang-toolkit.

franko commented 10 years ago

I have a similar problem: I can reproduce the bug on Linux but not on Windows, even though both are x86.

Let us see if Mike is able to reproduce the problem and accepts the bug report.

q66 commented 10 years ago

Btw, i might actually have an idea... the version of LuaJIT I'm testing things with is stock 2.0.3, released 03/12/14, while the issue was fixed by Mike 03/27/14. I'm gonna test with more up to date version from Git master as well as 2.1.

q66 commented 10 years ago

tested with latest git master as well as 2.1 (had to make some minor updates to bytecode.lua to make it work) and all have the same issue...

q66 commented 10 years ago

We have a reply from Mike, who hinted at a possible bug in the bytecode generator that would explain the occasional crashes AND why it doesn't fail with the interpreter. It would be worth looking at, but I won't get to it today until late evening at least... won't be around much before then.

franko commented 10 years ago

Hi,

thanks to Mike's remarks I now clearly understand the problem. I've begun to fix this and the problem with your testcase actually disappears. Now I just need some time to consolidate the changes and make sure that the new implementation is sound in all possible cases.

franko commented 10 years ago

Hi Daniel,

this problem is finally fixed in the master branch. The commit that fixes the problem is:

https://github.com/franko/luajit-lang-toolkit/commit/3ac715b7ef469bc6a7f516c916d2b8e28a0b54cb

I am quite satisfied with the results: the code is maybe even cleaner than before and I think the solution I implemented is quite sound. The only complication was that I was forced to introduce a new argument in the function signature of the rules, so I had to change the code in a lot of places.

I have also introduced a test that is able to reveal the bug:

https://github.com/franko/luajit-lang-toolkit/commit/cfa2fa487b921cb11a7d374491c024893aabbec1

Let me know if you can test the fixes I've made and I will close the issue. Thank you for your support in addressing this delicate problem.

q66 commented 10 years ago

I'll give it a shot and let you know.

q66 commented 10 years ago

Great job. Works very nicely :) No errors when compiling my engine anymore.