Neos-Metaverse / NeosPublic

A public issue/wiki only repository for the NeosVR project

Crash on startup on Linux #2702

Open iamgreaser opened 3 years ago

iamgreaser commented 3 years ago

Describe the bug?

Neos will sometimes randomly crash on startup. Sometimes starting it up afterwards will result in me being logged out, although that didn't happen this time around.

Relevant issues

Nothing with the Crash Report tag seems relevant.

To Reproduce

Start it up on a Linux system, roll some dice, and get unlucky.

Expected behavior

It shouldn't crash.

Log Files

flamethrower - 2021.7.24.632 - 2021-07-25 09_11_31.log Player.log

As an added bonus, here's a Tcl script for those of you who use Linux and want an easy way of grabbing logs post-crash... copycrash-tcl.txt

Screenshots

No response

How often does it happen?

Sometimes

Does the bug persist after restarting Neos?

Yes

Neos Version Number

2021.7.24.632

What Platforms does this occur on?

Linux

Link to Reproduction Item/World

No response

Did this work before?

No

If it worked before, on which build?

No response

Additional context

No response

Reporters

Just me, GreaseMonkey, but it's probably been reported on Discord.

(Before you ask me to join Discord, my account was locked due to bad RNG and I'm not giving them my phone number nor am I about to ban evade, so the answer is "no".)

Frooxius commented 3 years ago

The crash seems to occur in the GC when trying to run a portion of the garbage collection to reclaim memory. I'm not quite sure what could be the cause of that. Can you test your system memory for defects just to be sure?

Unfortunately, this is a bug in the Mono runtime and the GC used by Unity, so it's not something we can fix ourselves; we would have to report it to them.

Are there any crash dumps generated? If you could collect a few that might help as we could pass those over, but we'd probably have to upgrade to the latest LTS first and make sure it doesn't occur with that.

It'd also be useful to know if there are other Linux users affected by this. I haven't seen a report like this before, so it might be something specific to your system too (e.g. a memory defect or something about the interaction with your particular distro).

iamgreaser commented 3 years ago

I'm not sure where the relevant dumps would be or how to actually generate them. I don't see any *.core files under the relevant steamapps dir.
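For anyone else wondering why no *.core files appear: on Linux that usually depends on two kernel settings, the core-size resource limit and /proc/sys/kernel/core_pattern. Here is a rough Python sketch that reports both. The function names `explain` and `current_settings` are made up for illustration, and it only describes the kernel's behaviour. It says nothing about whether Unity or Neos writes its own crash dumps.

```python
import resource
from pathlib import Path


def explain(soft_limit, core_pattern):
    """Summarise what the kernel would do with a core dump."""
    if soft_limit == 0:
        return "core dumps disabled (soft RLIMIT_CORE is 0)"
    if core_pattern and core_pattern.startswith("|"):
        # A leading '|' means the dump is piped to a helper program.
        parts = core_pattern[1:].split()
        handler = parts[0] if parts else "unknown"
        return "core dumps are piped to a handler: " + handler
    return "core dumps written using pattern: " + (core_pattern or "unknown")


def current_settings(pattern_file="/proc/sys/kernel/core_pattern"):
    """Read the soft core-size limit of this process and the kernel pattern."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    path = Path(pattern_file)
    pattern = path.read_text().strip() if path.exists() else None
    return soft, pattern


if __name__ == "__main__":
    print(explain(*current_settings()))
```

The limit is inherited from the parent process, so the result reflects the shell the script is run from, which may differ from what a game launched through Steam actually gets. If the handler turns out to be systemd-coredump, the dumps would normally be listed by `coredumpctl list` rather than appearing as files next to the game.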

I'll have another go at getting memtest86+ running when I get home from work. Last night it wouldn't show anything on the screen, so I suspect I'll have to somehow get it running w/ a legacy BIOS boot.

In the meantime, if it's in something you have control over, it could be a use-after-free issue from a non-GC-managed pointer, e.g. doing this in the Lua C API can result in things breaking:

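/* For a full userdata, lua_touserdata returns a pointer into memory owned by the Lua GC. */
something_t *p = lua_touserdata(L, 1);
lua_pop(L, 1);      /* drops the stack reference; the userdata may now be collected */
p->touched = true;  /* potential use-after-free if the GC has reclaimed it */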

Use-after-free is a pain to debug, of course.

Some specs:

Frooxius commented 3 years ago

I'd check around Neos' temp/cache folder for the crash dump; that's typically where they are saved on Windows. I'm not too sure if they're generated on Linux, though, since the Player.log usually mentions their location at the end.

Neos' code is mostly managed C# code, so it doesn't deal with pointers. We do have a number of native libraries, but we haven't had any signs of them exhibiting issues like that before, so I wouldn't think any of them is causing it; if one were, it would likely show up on other platforms and for other users too.

Seeing if it affects more Linux users would help too, as well as potentially getting more stacktraces to see if they happen in different places.

The crash in the stack trace seems to occur while the GC is marking the heap, so I think it's most likely an issue in that part.

iamgreaser commented 3 years ago

By temp/cache dir do you mean ~/.config/unity3d/Solirax/NeosVR/Assets? Because I don't see anything resembling a crash dump in there.

Enverex commented 3 years ago

Neos uses the system temp folder on Linux, so it would be in /tmp. (Also, if /tmp is a RAM drive on your system, i.e. tmpfs, as on many Linux installations, it's possible Neos is killing itself or your PC, because it will keep storing assets in RAM until you run out of RAM entirely and everything dies.) This default behaviour needs to be changed, but that's another ticket.
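To check whether /tmp really is RAM-backed on a given machine, you can look up its filesystem type in /proc/mounts. Here is a minimal Python sketch of that lookup. The helper name `mount_fstype` is hypothetical, and it assumes the usual whitespace-separated /proc/mounts layout (device, mount point, type, ...).

```python
from pathlib import Path


def mount_fstype(mounts_text, target="/tmp"):
    """Return the filesystem type of the mount containing `target`, or None."""
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if target == mount_point or target.startswith(prefix):
            # Prefer the longest matching mount point; on ties the later
            # entry wins, since a later mount shadows an earlier one.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype)
    return best[1] if best else None


if __name__ == "__main__":
    mounts = Path("/proc/mounts")
    if mounts.exists():
        print("/tmp is on:", mount_fstype(mounts.read_text()))
```

A result of `tmpfs` means files written to /tmp count against RAM (and swap). Anything else, such as `ext4` or `btrfs`, means /tmp is on disk.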

iamgreaser commented 3 years ago

OK, around the time of my last comment I ran memtest86+ 5.31b (one single-core pass, one multi-core pass) and confirmed that, at least individually, the sticks of RAM are most likely good. I set my BIOS up to use the "Enhanced Stability" memory settings, and things were looking pretty good apart from a completely different crash: the sem_wait() or sem_timedwait() failed crash that has appeared in #1320. I have that workaround applied and enjoyed, I think, about 3 days of Neos simply not crashing at all.

Well, first bootup for today and I just got the startup crash again.

I'll post the logs and follow up with a bit more detail but it seems to be crashing in the same place as it did 5 days ago.

flamethrower - 2021.7.25.489 - 2021-07-30 16_50_37.log Player.log

...

Following up: OK, I'll need to gather more of these as they happen. There are assets in common being fetched between the two crashes, and assets different between the two, and the stack trace in today's crash is really not that helpful.
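Comparing which assets show up in two crash logs can be done mechanically. Here is a rough Python sketch. It assumes the assets appear in the log as URI-like strings (anything of the form scheme://...), which is a guess about the log format rather than something confirmed here, and the names `extract_uris` and `compare_uris` are made up.

```python
import re
import sys
from pathlib import Path

# Generic "scheme://rest" matcher; not specific to any Neos log format.
URI_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://[^\s\"'<>]+")


def extract_uris(text):
    """Collect the set of URI-like strings found in a log's text."""
    return {match.rstrip(".,;)") for match in URI_RE.findall(text)}


def compare_uris(first_text, second_text):
    """Return (common, only_in_first, only_in_second) as sorted lists."""
    first, second = extract_uris(first_text), extract_uris(second_text)
    return sorted(first & second), sorted(first - second), sorted(second - first)


if __name__ == "__main__" and len(sys.argv) == 3:
    texts = [Path(p).read_text(errors="replace") for p in sys.argv[1:3]]
    common, only_a, only_b = compare_uris(*texts)
    print(len(common), "in common;", len(only_a), "only in first;",
          len(only_b), "only in second")
    for uri in common:
        print("  ", uri)
```

Run it with the two log paths as arguments. With more crashes collected, the common set should shrink toward whatever, if anything, is actually involved.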

I will add that this time I did need to log in again.