JarrettBillingsley / Croc

Croc is an extensible extension language in the vein of Lua, which also wishes it were a standalone language. Also it's fun.
http://www.croc-lang.org
79 stars 12 forks source link

Assertion Failur in GC / Access Violation #64

Closed ligustah closed 10 years ago

ligustah commented 11 years ago

So I came across a very mysterious problem with my program. To do some benchmarking I have it create lots of objects implemented in native code (Sockets). After some very short time the program yields an access violation.

Some testing showed it problably was the Tango GC collecting my native object.

When I disabled the Tango GC I got this one: tango.core.Exception.AssertException@croc.base_gc(207): Assertion failure

I had a look at the code involved, but didn't quite understood what the problem was from looking at your GC.

So I'd like to know if this is some sort of rare bug, or if I'm just doing it wrong. How can I have Croc create native objects without Tango thinking they're unused?

ligustah commented 11 years ago

Ok, figured out the GC issue:

GC.addRange(getExtraBytes(t, -1).ptr, Members.sizeof);  //allocator
GC.removeRange(getExtraBytes(t, -1).ptr);                         //finalizer

These seem to do the trick with the Tango GC. Still running into that assertion though.

JarrettBillingsley commented 11 years ago

Okay: first off, given your solution to the problem, I can see exactly what you're doing wrong and why Tango was collecting your stuff. Extra bytes are completely invisible to the Tango GC and you cannot store references to D objects in them.. unless, of course, you make them visible to the Tango GC, like you've done here, lol. This isn't safe in general, though, and extra bytes will be going away soon, so you should really use the proper way of holding references to D objects that the Tango GC knows about: nativeobjs.

Second off, any time an assertion triggers, it's a bug. I have no idea what's causing it cause I don't have any code. Code please!

ligustah commented 11 years ago

Oh gosh, nativeobjs -_- lol, forgot about them. Should've had a look at the wrapping code I wrote the last time.

As for the code, I was hoping you had an idea what this might be caused by, so I could narrow it down to a simple test case. So if not, I'll try to get you something easy to compile that you can tinker with.

ligustah commented 11 years ago

Also I suppose it must have something to do with how I screwed your extra bytes API -_- So I'll first test if it goes away, when I do it right.

JarrettBillingsley commented 11 years ago

That's possible; if you were keeping a reference in D to a Croc object and then "resurrecting" it by putting it on the stack after it had been collected, its ref count would still be 0 and would then be decremented to -1 (which is the assertion that triggers). But yeah, can't say X)

ligustah commented 11 years ago

Ok, switching to nativeobj didn't quite help. I suppose it's somewhere in my socket code, because spawning the threads works, when I just have them do something like writing stuff to console.

https://www.dropbox.com/s/5z19xw9xtfo572m/croci-bug.zip

This is a package that you can use to reproduce this one.

ligustah commented 11 years ago

Ok, while trying to test my threads' sleeping functionality I hit another assertion. tango.core.Exception.AssertException@croc.base_gc(392): Assertion failure assert(slot.refCount != -1);

I removed all parts that would use my native module and ran the test again: tango.core.Exception.AssertException@croc.base_gc(207): Assertion failure

This is what I'm doing:

for(i : 0 .. 10)
{
    local s = math.rand(5 * 1000, 10 * 1000)
    scheduler.spawn(\{
        :sleep(s)
        writefln $ "went to sleep for {} ms", s
    })
}

scheduler.loop()

The spawn function will use thread.new with the supplied function and add it to the schedulers data structures. Sleeping threads are pushed to a very simple priority queue that is polled each cycle.

I also tried running the scripts with vanilla croci => crash.

Running the main.croc should trigger the crash. https://www.dropbox.com/s/oe83snf4jesarc0/scripts.zip

Guess it's your turn now :panda_face:

JarrettBillingsley commented 11 years ago

Hooo boy |O

JarrettBillingsley commented 11 years ago

Oh fuuuuuuh I fixed the base_gc(207) assertion. Three hours to find, one line of code to fix. Again. Just a little oversight in array.pop causing a reference count to be decremented twice.

I wonder if this fixes the other one, too, since both assertions check the same thing, it's just the one at 392 is inside the cycle collector.

JarrettBillingsley commented 11 years ago

Okay, tried building your croci-bug with the fix and it .. seems to run fine? I'm not sure what it's supposed to do. It just sits there. I guess you have a server that runs on your machine that this script is supposed to query?

I'll push the change and if it fixes your stuff, you can close this bug :)

ligustah commented 11 years ago

The program is supposed to spawn 10 threads that will sleep up to 10 seconds, then exit. If it doesn't crash, everything should be fine. I'll give it a test later.

ligustah commented 11 years ago

I was unable to reproduce the bug with your updates. Seems fixed.

JarrettBillingsley commented 11 years ago

Yaaaay!

ligustah commented 11 years ago

Oh my bad, the croc-bug program wasn't using the sleeping feature. It was just spawning 10k threads that would try to grab the default page of my local web server.

But it's working anyways :)

JarrettBillingsley commented 11 years ago

Heheh, yeah I was wondering about that.

ligustah commented 11 years ago

I do have a more or less functional non-blocking socket layer now. I can connect, I can listen. And I'm working on a fastcgi client written in Croc, so I can soon update my fast cgi project to drop the nasty libfcgi and most of the native code wheeee

JarrettBillingsley commented 11 years ago

That is so cool 8O I have no idea how non-blocking IO works so it'll be nice if we can refactor that stuff into a general-purpose non-blocking IO lib!

ligustah commented 11 years ago

I'd love to do that, though I think it might be better to go for something like libevent or libev, simply because writing stable and portable non-blocking socket stuff is tedious and error-prone. The great advantage of binding libevent would be that it could be easily ported to C++.

Also I had to do some modifications to your new streams library (mostly fixing errors with incorrectly named variables), and using coroutines for pseudo-blocking IO (i.e. threads yield out, when they would block and are resumed when IO is available) requires some sort of scheduling (which I implemented in script code). However, since scheduling is not only useful for IO, it might be useful to integrate any such scheduling mechanism with the standard thread library.

PS: It's close to working reliably now btw: https://gist.github.com/81357cfc6793c83b5f6f That's an excerpt of my program which basically opens a listening socket and then sends a HTTP request to itself via a locally running Nginx webserver that passes the request back to the program via FastCGI.

ligustah commented 11 years ago

Now things start going weird again (applause for DMD please!). I tried to benchmark my program and it worked pretty well, so I tried to simulate some work by having the request handler threads sleep (because they would finish in their first cycle otherwise). So when the program hit something like 200 threads I got that assertion again. Double-checked that I had pulled your changes and tried recompiling. Still assertion error -_- So I tried with a fresh clone of your repository (I used a slightly modified version before), but using that one Optlink crashed while linking (though that may be linked (pun intended) to my building system). Tried with updated DMD, fails with some compiler errors concerning implicit casts.

TL;DR: Successfully built with rdmd and still crashes with that assertion. I'll try to build another test case.

JarrettBillingsley commented 11 years ago

Any updates on this?

ligustah commented 11 years ago

Just rebuilt the test case that triggers the assertion. Which branch should I test against?

JarrettBillingsley commented 11 years ago

master.

ligustah commented 11 years ago

Some help on building appreciated: https://gist.github.com/Ligustah/b355f4a4370391884c95

JarrettBillingsley commented 11 years ago

Would you stop using the goddamn binding lib? XD Kidding, I totally forgot that it needs to be updated. On Feb 10, 2013 4:14 AM, "Ligustah" notifications@github.com wrote:

Some help on building appreciated: https://gist.github.com/Ligustah/b355f4a4370391884c95

— Reply to this email directly or view it on GitHubhttps://github.com/JarrettBillingsley/Croc/issues/64#issuecomment-13348027..

ligustah commented 11 years ago

What about the errors in stdlib_serialization.d ?

Also, NEVERRRRR

JarrettBillingsley commented 11 years ago

The serialization lib is the next thing on my plate. I've been meaning to split it up anyway, and it has to be updated to deal with the new classes.

ligustah commented 11 years ago

Can you update, when it compiles so I can test again? That'd be nice.

JarrettBillingsley commented 11 years ago

Of course!

JarrettBillingsley commented 10 years ago

I kinda dropped the ball on this one, but chances are it'd be hard/impossible to reproduce now... and the D implementation is going away. So.