Closed phoxmeh closed 10 years ago
Thanks for the report.
Does this have the same stacktrace whenever it happens (i.e. the last Java frame being com.naef.jnlua.LuaState.lua_isthread) or does that vary?
Also, could you check that the native lib being used is the 'right' one (...-native.64.so from the looks of it)? The code to determine which one to use got so bloated that I decided to just have it try each one until one works, but that may be causing unexpected side effects... Have you seen this in an OC version pre 1.3.0 / build 505?
Oh, also, to narrow down the source of the issue a bit, could you try to reproduce it after changing the following config options (ideally one by one, to narrow it down even further, but feel free to start with enabling all to see if any of them do anything): disableUserdata, disablePersistence, disableMemoryLimit. As the names indicate (and the comments in the config), these lead to reduced functionality, but it'd help me a lot to get an idea of where to start looking. Thanks!
I got the same problem running a dedicated server on Ubuntu 64-bit (computers, most of the time, crash the server with a segfault when turned on or loaded on server start). Enabling the LuaJ fallback solves it. I will try these settings in a minute.
Update: Disabling persistence seems to solve the problem... (after several restarts of the computers and the Minecraft server it did not crash a single time)
It looks like a lot of them end at com.naef.jnlua.LuaState.lua_newstate.
I've tried disabling disableMemoryLimit but not the other two.
The correct library seems to be the one loading.
This has been an issue since 1.3 (although I didn't have any versions before that on a server)
@XDjackieXD And persistence is one of my favourite features too! XD
Also disabling the persistence has not helped me. I've tried disabling userdata and the memory limit (and all at once) and I get the same result.
@phoxmeh yeah, persistence is a really cool feature...
@fnuecke when having persistence enabled and loading a world with a computer in the on-state, the error happens at "com.naef.jnlua.LuaState.lua_newstate(IJ)V+0" (according to the hs_err_pidxxxx.log)
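To check whether the crash always ends in the same Java frame, the hs_err logs can be compared side by side. The following is a rough shell sketch only: summarize_hs_err is a made-up helper name, and it assumes the usual HotSpot log layout, where the signal line starts with "#" followed by the signal name and Java frames start with "j" or "J".

```shell
# Print the signal line and the topmost Java frame of each HotSpot crash log.
# Assumption: signal lines look like "#  SIGSEGV (0xb) at pc=..." and Java
# frames look like "j  com.naef.jnlua.LuaState.lua_newstate(IJ)V+0".
summarize_hs_err() {
    for log in "$@"; do
        # Skip arguments that are not regular files (e.g. an unmatched glob).
        [ -f "$log" ] || continue
        signal=$(grep -m 1 -E '^# +SIG[A-Z]+' "$log") || signal="(no signal line found)"
        frame=$(grep -m 1 -E '^[jJ] +[^ ]' "$log") || frame="(no Java frame found)"
        printf '%s\n  %s\n  %s\n' "$log" "$signal" "$frame"
    done
}

# Example call, run from the directory containing the crash logs:
# summarize_hs_err hs_err_pid*.log
```

If the frame differs between logs (lua_isthread in one, lua_newstate in another), that in itself is useful information for the report.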
If this is reproducible for a certain world, would it be possible for you to send me either the <savedir>/opencomputers/state folder or the whole world? Maybe I can reproduce it with that and get some more info out of the crash. Thanks!
Mine still crashes unless I force it to use LuaJ :/ which is strange since it was working perfectly fine with the native library for some time before I went to reboot the server.
I don't have any data under any of the .../opencomputers/state sub-directories. My current world is about 900+MB uncompressed. I could compress it to an archive format you prefer (unless you don't mind a tar.7z archive, since that's how I back up) and send you a copy.
I don't have any data under any of the .../opencomputers/state sub-directories.
That's interesting, actually, since that points to something being broken in JNLua, not in Eris (which might still be my fault, ofc [/disclaimer]). The world should be pretty irrelevant then, but thanks for offering! I'll try rebooting my test server a bunch of times, will see if something pops up.
Otherwise I may have to generate some custom builds so you can test with those. We'll see. Thanks for your patience!
I will test it tomorrow using a clean installation of OpenComputers on Minecraft 1.6.4 (I am currently testing using the latest experimental snapshot of the Yogscast pack).
And btw: In my state folder there is an empty folder called "0" but nothing else...
Update: Can't reproduce it using a fresh install of OC 1.3.2... (Also, I don't get the crash in my old world using the Yogscast pack anymore... strange...)
Karma hit me... :D The second I updated my previous comment the error hit me again (Yogscast Pack OC 1.3.1.516)... Disabling persistence fixed it again.
Thanks for that log. Though the persistence setting shouldn't have any influence at that point, anything's possible with segfaults :-P While I'm kind of expecting that to be by chance, I'll see what I can find. I haven't been able to reproduce it, yet, though, so it's kinda purely going through whatever I can think of mentally right now... I'll try with the full pack when I have the time, just in case.
I cannot reproduce it using a clean install... I don't know why, but it may be related to something in this pack. But disabling persistence definitely fixes the crash (reproduced it several times using the latest Yogscast pack).
All right. In the worst case it's something triggered by the garbage collector running more frequently because of the higher memory load from the pack... that would be fun to debug...
I'm using The Crack Pack, not sure what mods are shared between it and the Yogscast pack. But I do know it's very inconsistent when it decides to crash for me, albeit now it crashes always. I did find that when someone started spamming world anchors around it was crashing more often (I promptly disabled them). Also it seems to crash more reliably when I'm running both the client and server at the same time. Right now it just seems to always crash when using the native lib, if the computers are on when using the native lib it will crash, just usually quicker or immediately when I'm running my client too. Haven't tried it in a day so it might decide to work for a couple days cause that's what it seems to do for me.
Eh, it's probably less the packs but their effects on the runtime (having tons of classes loaded, using lots of memory). That's my guess, anyway. Which Linux distros are you running, by the way?
I'm running an arch64 install.
I'm running Ubuntu 14.04 64bit desktop.
In my case the crash occurs about 60% of the time when I turn on a computer, and every time a chunk with a turned-on computer gets loaded (until I disable persistence).
Here's a small suggestion: enable core dumps using ulimit -c unlimited on the terminal, as the log suggests. Next time it crashes you will get a file named something with core in your ~/.minecraft (for servers, in the server dir). Send that file to @fnuecke. It contains debug information he can use to track down the bug. He will probably send you a debug build without stripped debug symbols in a bit, so wait for that. We already talked about that in IRC.
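Since the limit only applies to the shell it was set in (and to whatever is started from that shell), the order matters. A minimal sketch, assuming the server is launched from the same terminal; the jar name in the last comment is only a placeholder:

```shell
# Raise the core dump size limit for this shell and everything started from it.
# This can fail if the hard limit set by the administrator is lower.
if ulimit -c unlimited 2>/dev/null; then
    core_limit=$(ulimit -c)
else
    core_limit=$(ulimit -c)
    echo "could not raise the limit; it is still: $core_limit" >&2
fi
echo "core file size limit: $core_limit"

# Then start the server from this same shell, e.g. (placeholder jar name):
# java -jar minecraft_server.jar nogui
```

If the printed limit is 0, no core file will be written on the next crash.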
https://mega.co.nz/#!OFcmASKY!XZUvz1j7jeVn8Z-p700LYMrqAwVNG51TDpDCrrsMfeY 275MB of compressed coredump goodness :D
So my core dump is 2.9GB... this might take a while XD edit: so it compressed better than I thought, so hopefully I should get it uploaded... wish my upload speed was better
Awesome, thanks! I'll see what I can glean from that :-)
https://copy.com/idJj28NApPvl sha1sum: 3b4864c45f01b3d11708b901aa4fcd884ce412dd
there is mine, hope that works XD
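With uploads this large, it may be worth checking the download against the posted checksum before unpacking it. A small sketch; verify_sha1 is a made-up helper name and the archive name in the example call is a placeholder, since the actual file name isn't given here:

```shell
# Succeeds only if the SHA-1 of the file ($1) equals the expected hex digest ($2).
verify_sha1() {
    actual=$(sha1sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}

# Example call with the checksum posted above (placeholder file name):
# verify_sha1 coredump.tar.7z 3b4864c45f01b3d11708b901aa4fcd884ce412dd && echo "checksum OK"
```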
Thanks! I'll probably need to set up a VM with arch to make use of @phoxmeh's coredumps, but at least I have the Ubuntu VM at a point where everything "environmental" fits the coredump, from what I can see. So: slow progress, but progress.
I've been running the Yogspack on the Ubuntu VM for ~12h now, with one computer being stopped and then started again every ten seconds, but no crash yet, so I'm afraid I'll have to rely on you for helping me solve this for a little longer.
I'll have to have you both use this custom built version with debug libs to get anything useful out of the dumps, though. Please switch out the OC JAR in the packs with this one and get me a coredump, that should tell me a little bit more then! This was built on the Ubuntu VM, so I'd primarily need @XDjackieXD to get me a coredump with this. @phoxmeh you could at least verify it still happens with that build, you can probably save yourself the time and bandwidth of uploading a coredump until I get an arch VM up and running :-P
Obligatory warning: the custom build is based on the dev branch, so make sure you have a backup of your world, just in case.
Thanks again for your patience and cooperation :-)
https://mega.co.nz/#!OVkVwRxZ!q_phLqlZ7pc__oPCtNIYTF6nAF3yaQTMaaErT5_bbao
Here you have the coredump and error log (which gives a little more detail this time :) ). At least the crash is reproducible over different versions of OC...
I did it anyways :P https://copy.com/4IruU8pIGhgH sha1sum: d67280f8e2403b41856c2a544bec60af94ff63b1
No idea what I changed on my setup (could a kernel update have something to do with it?) but now the crash can only be prevented by enabling the LuaJ fallback...
Hmm, maybe? I had a look at the debug dump, and I'm afraid it's not that helpful :-( The crash itself, from what I can tell, happens inside some magic code that takes care of thread-local variable assignment (it happens on this line, which is really just this; all stack frames after that line are just question marks, even in the debug build).
The main hindrance for me is that I still can't reproduce it, so I can't try to catch the library screwing up red-handed. But I'll keep digging and let you know when I have some new thing to test out.
Actually, one thing you might try, since it seems to be related to thread-local storage at least in some capacity, is to set the number of worker threads to one (from the default four). Yes, that's just a wild guess...
Speaking of many classes, could raising PermGen help?
Very unlikely.
Switching it to 1 worker thread seems to solve the problem after several server restarts and half an hour playing...
@XDjackieXD to clarify, it didn't crash at all across these restarts? Or it didn't crash after the most recent restart? @phoxmeh could you see if this also applies in your case (i.e. setting worker thread count to 1 avoids the crash)?
If this does indeed help, it means a bit of work, but at least I'd have something to try out.
It doesn't crash at all after setting it to 1 (I played about 1h and made a few restarts with 2 computers turned on, and I switched them on and off and nothing crashed).
Yep, absolutely no crash so far. I'll keep it going for a while, some people should get on a bit more later and if it doesn't crash I'll let ya know but so far it's working fine.
Fingers crossed. At least it's a non-catastrophic workaround for now. I'll dig some more through the JNLua code over the weekend and see if I can maybe get rid of the thread-local stuff altogether in a hope of solving this.
All right, got rid of the last use of thread-local variables, had it running over the night with no issues to make sure I didn't mess up something else. @XDjackieXD, @phoxmeh please give this version a try and let me know how it goes. Don't forget to up the threads to 4 again. Thanks!
I'm not at home for 2 weeks, so I won't be able to test until Sunday in 2 weeks. Also, I got a new PC on Friday, so I have a second computer with Ubuntu 64-bit to test with (and a lot faster than my laptop ;D ).
Well it's not crashing now (after remembering to turn off the computers before starting up with the native lib and threads set to 4) but now it's telling me oc:native libraries not available with http://pastie.org/private/qnvbcggte7whxavltmpw in console
edit: ignore that... after a little tinkering it works fine...
@XDjackieXD well, on a different machine it might behave differently. @fnuecke wasn't able to reproduce it on Ubuntu himself afaik
@phoxmeh hmmm. Could you post the full log / some more context around it, just in case?
The main thing I did was remove the computers before restarting the server again. Once I did that it worked just fine. Nothing else was reported in the logs as far as I know, but I'll be more than glad to upload that forge log if you need it.
Oh, OK. Hmm, keep an eye on it, then, please. If after another re-start the above happens again, let me know! (And I'll have another look over the changes I made to see if anything could cause that.)
Well it seems to be failing still @_@ it crashed with this http://pastie.org/private/0k5ipygblrafwghwdkplq and this http://pastie.org/private/xfytvaxg3bbb8j9thyvtxw so far
Darn. Well, it was an expected possibility. If you can, would you mind getting me a coredump for this latest build? I'll finish setting up the arch VM in the evening, hopefully I'll be able to get anywhere with that.
Also, just to make sure, the debug settings mentioned above (disable*) still don't have any influence on whether it's crashing or not?
Ok, gonna try to test it out a bit more, sorry I've been quite busy at work and unable to really do any Minecraft stuff the past couple of days. Currently I'm running the dev version with nothing disabled and threads at 4; no crashes yet except for the library failing to load when I first changed it to the dev version and a small crash when I was shutting it down. So far nothing, gonna try some restarts and play around with it. I'll let ya know how it's running tomorrow.
So I've been testing the server a bit and it has been stable with no crashes, but I just went to restart it and it started crashing again like the last time I had issues (running the dev version you gave me). Tried disabling things and setting threads to 1 but that all failed. Only after restarting with LuaJ forced, removing the computers and restarting twice (because the first time it doesn't load the native lib properly) does it seem to work with everything enabled and threads at 4. The computers are left on when I shut down the server initially, but it doesn't always crash, just usually after the server has been up a couple of days. During the uptime it's perfectly fine, but when I try to start it again it crashes with the same segfault as http://pastie.org/private/0k5ipygblrafwghwdkplq I'll try to reproduce it again and get a core dump (always forget to enable it to get that before I do anything @w@). Lemme know if there's anything else you want me to try and do to figure this out. I'll keep tinkering how I have been to see if I can get that core dump for you.
Thanks so much for taking the time to test all this! I'm afraid I didn't have the time to test as much over the weekend as I hoped I'd be able to, but someone brought up an interesting issue that might be related (garbage collection, which might explain why it only happens for existing / resumed computers). I'll investigate this as soon as possible, probably tomorrow.
Side note: if we can't find a robust solution/workaround for this in the next couple of days I'll probably still push out 1.3.3 with this as a pending known issue, due to all the other fixes that have accumulated...
@phoxmeh when you have the time, could you please give the latest dev build a try? I'm disabling the Lua GC while persisting / unpersisting now, since that reportedly helped with the other issue I mentioned, so that may help with this, but may just as well be unrelated.
Sorry it's taking me so long to work on things, it's been quite the busy week. I got the latest build today, 601 I believe, and it's crashing regardless of settings unless I force LuaJ. I'd removed the computers before doing anything, to make sure they were off and not saved in the world where they could cause any problems. But it's crashing the same as it was in the beginning and at the same address (0x0...99a6). That last dev build you sent me was working flawlessly until I had to do reboots. I'll keep trying to see if I can get it to launch the computers with the current build though. Just lemme know what else you need me to do.
here is the hs_err: http://pastie.org/private/a727keh23unsuwswxnxhpa
No problem, thanks for helping out with this! All right, so it's most likely not related to that other issue. I guess that's kind of good. But it doesn't really get us closer to a solution, either. Hmm. Which version of libc do you have installed? (ldd --version)
GNU libc 2.19
I can reproduce this a lot and it's quite inconsistent sometimes. I'm on linux running the latest java version (also tried previous versions back to 7u55) and I get this seg fault on the server I run. It seems to mainly happen only when I'm running any other minecraft server and/or client on the same machine. As expected everything works when forcing LuaJ but when using the native interpreter it causes an inconsistent crash (works for a few days and then constantly crashes and then stops crashing and works again with the native interpreter) I'm running it with other mods from the CrackPack on the atlauncher. It always reports "SIGSEGV (0xb) at pc=0x00000000000099a6"
Here is a link to the hs_err I got when trying to get it working last time: http://pastie.org/private/w3mgfabfkivvqzcab8bgcg I've tried both openjdk and oracle java, both have the same issue.