connor-brooks / ecosim

An interactive ecosystem and evolution simulator written in C and OpenGL, for GNU/Linux.
GNU General Public License v2.0

Segmentation fault when using Nvidia drivers #1

Open GOKOP opened 4 years ago

GOKOP commented 4 years ago

I've cloned the repo and built it with make, but when I try to run the executable, it segfaults:

$ ./ecosim
[1]    28758 segmentation fault  ./ecosim

The logging Python script appears to be broken as well: it is unable to find the module logger_data. That doesn't sound like an actual Python module that can be installed (and a DuckDuckGo search agrees), so I assume it's supposed to be part of this software, but it's not here.

$ ./ecosim_with_log.sh 
Starting Ecosim...
./ecosim_with_log.sh: line 4: 30315 Segmentation fault   ./ecosim
Starting logger plot
Traceback (most recent call last):
  File "./logger_plot.py", line 7, in <module>
    import logger_data
ModuleNotFoundError: No module named 'logger_data'

As a side note, I had to change the Python script's shebang to #!/usr/bin/env python3 because python3.5 is not a thing on my system; I imagine I'm not the only one.

Should I split this into two issues? Now that I think about it, these are two separate problems, but it feels kinda weird.

connor-brooks commented 4 years ago

The logging Python script appears to be broken as well: it is unable to find the module logger_data. That doesn't sound like an actual Python module that can be installed (and a DuckDuckGo search agrees), so I assume it's supposed to be part of this software, but it's not here.

The logger_data module is generated by the main simulation whilst running, but because the simulation is segfaulting this file doesn't exist yet.

As a side note, I had to change the Python script's shebang to #!/usr/bin/env python3 because python3.5 is not a thing on my system; I imagine I'm not the only one.

Oops, my bad. I'll make the change right now.

I've cloned the repo and built it with make, but when I try to run the executable, it segfaults:

This is interesting. A few people on HN were having similar issues, mainly people with Nvidia graphics. May I ask what distribution and graphics hardware you have?

Cheers

GOKOP commented 4 years ago

Artix Linux (basically Arch without systemd) and yes, Nvidia

GOKOP commented 4 years ago

When I run it on my ThinkPad (so no Nvidia), the program runs and even displays some output (lines "food added" and "proned"), but the window is all green and logger_data is still missing.

connor-brooks commented 4 years ago

Artix Linux (basically Arch without systemd) and yes, Nvidia

It seems there is some issue with glDrawElements() and Nvidia. I'll need to investigate this further before I fully understand why. Thanks for letting me know about this.

When I run it on my ThinkPad (so no Nvidia), the program runs and even displays some output (lines "food added" and "proned"), but the window is all green and logger_data is still missing.

Which ThinkPad was it? The green screen indicates that the FBO isn't being rendered correctly. The wobbly-jelly graphics work by rendering the whole simulation offscreen to a framebuffer object (FBO), then distorting the result with a shader. The distorted image is then used to texture a rectangle that spans the whole screen. If this shader fails, a plain green fullscreen quad is displayed instead. I believe FBOs were only included in OpenGL 3.0, so this makes sense for any older ThinkPad. Ecosim was developed on a ThinkPad T420 running Devuan (Debian without systemd).
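
The OpenGL 3.0 requirement can be checked up front. A minimal sketch, assuming the version string comes from glGetString(GL_VERSION) on a current context; the helper names parse_gl_version and gl_version_has_core_fbo are hypothetical and not part of Ecosim. It only looks at the core version, so it ignores drivers that expose FBOs through the older EXT_framebuffer_object extension:

```c
#include <assert.h>
#include <stdio.h>

/* Parse the leading "major.minor" of a desktop GL_VERSION-style string,
 * e.g. "2.1 Mesa 20.0.4". Returns 1 on success, 0 if the string is NULL
 * or does not start with two dot-separated integers. */
int parse_gl_version(const char *version, int *major, int *minor)
{
    assert(major != NULL && minor != NULL);
    if (version == NULL)
        return 0;
    return sscanf(version, "%d.%d", major, minor) == 2;
}

/* Hypothetical helper: 1 if the reported version is 3.0 or newer,
 * where framebuffer objects are part of core OpenGL; 0 otherwise. */
int gl_version_has_core_fbo(const char *version)
{
    int major = 0, minor = 0;
    if (!parse_gl_version(version, &major, &minor))
        return 0;
    return major >= 3;
}
```

With a check like this the program could print a clear message on machines below OpenGL 3.0 instead of showing a green window.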

My apologies for these issues. In the near future the whole simulation is going to be ported from GLFW to SDL2, which should make it easier to ensure it works on various machines.

sethalves commented 4 years ago

me too

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff60a4c53 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.435.21
(gdb) bt
#0  0x00007ffff60a4c53 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.435.21
#1  0x00007ffff617d766 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.435.21
#2  0x00007ffff5cf666d in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.435.21
#3  0x0000555555558108 in gfx_agents_draw_cell (av=0x5555558c9810, shader=6, scale=1.66666663, zoom=1) at graphics.c:371
#4  0x000055555555b4fc in main (argc=1, argv=0x7fffffffdd18) at main.c:238

Ubuntu 19, GeForce GTX 1070

GOKOP commented 4 years ago

I believe FBOs were only included in OpenGL 3.0

That would explain it. I've run into issues on that ThinkPad already that made me discover it doesn't support OpenGL 3.0 (it's an X200).

harleypig commented 4 years ago

metoo

Arch Linux (updated last Saturday)

01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)

I tried to attach an strace output log, but kept getting 'something went really wrong' ...

connor-brooks commented 4 years ago

That would explain it. I've run into issues on that ThinkPad already that made me discover it doesn't support OpenGL 3.0 (it's an X200).

In the near future I will add an option in config.h which disables the FBO. You wouldn't get the jelly-like graphics, so it will feel mechanical rather than organic, but it should run okay.
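
A rough sketch of how such a switch might look. The macro name ECOSIM_USE_FBO, the render_path type and both functions are assumptions for illustration; the actual config.h option may be named and wired differently:

```c
#include <assert.h>

/* Hypothetical config.h switch: define as 0 to skip the FBO pass. */
#ifndef ECOSIM_USE_FBO
#define ECOSIM_USE_FBO 1
#endif

typedef enum { RENDER_DIRECT, RENDER_FBO } render_path;

/* Use the FBO path only when it is both enabled and supported;
 * otherwise draw straight to the window without the distortion pass. */
render_path choose_render_path(int fbo_enabled, int fbo_supported)
{
    return (fbo_enabled && fbo_supported) ? RENDER_FBO : RENDER_DIRECT;
}

/* Human-readable name, e.g. for a startup log line. */
const char *render_path_name(render_path path)
{
    switch (path) {
    case RENDER_DIRECT: return "direct";
    case RENDER_FBO:    return "fbo";
    }
    assert(!"unknown render_path");
    return "unknown";
}
```

At startup this would be called as choose_render_path(ECOSIM_USE_FBO, fbo_supported), so a machine without FBO support takes the direct path whatever the config says.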

metoo

Arch Linux (updated last Saturday)

01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)

I'm guessing the problem is in gfx_agents_draw_cell(), which calls glDrawElements(); that call is where the segfault happens. It seems to be a common issue for people using Nvidia graphics. At the moment I'm unable to understand exactly why (I have no access to an Nvidia machine), but I'll investigate.
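
One check that can be run without Nvidia hardware, assuming the index data passed to glDrawElements() is still available on the CPU side, is to confirm that no index points past the end of the vertex data. Out-of-range indices are undefined behaviour in OpenGL, so one driver may tolerate what another crashes on. This is a debugging aid under that assumption, not a diagnosis of this bug, and the helper name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the position of the first index that is >= vertex_count,
 * or -1 if every index refers to an existing vertex. */
long first_out_of_range_index(const unsigned int *indices, size_t index_count,
                              size_t vertex_count)
{
    assert(indices != NULL || index_count == 0);
    for (size_t i = 0; i < index_count; i++) {
        if (indices[i] >= vertex_count)
            return (long)i;
    }
    return -1;
}
```

Calling this just before the draw call and logging any result other than -1 would at least rule this class of bug in or out.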

Thanks for the feedback :)

Muffindrake commented 4 years ago

I segfault with Nvidia's OpenGL implementation. Mesa with Intel Graphics is fine (and I suspect nouveau users would be fine). Easily testable for users with hybrid graphics thanks to NVIDIA's actual Optimus support.

(gdb) bt
#0  0x00007ffff64d9143 in ?? () from /usr/lib64/libnvidia-glcore.so.440.59
#1  0x00007ffff65bd8c6 in ?? () from /usr/lib64/libnvidia-glcore.so.440.59
#2  0x00007ffff61355bd in ?? () from /usr/lib64/libnvidia-glcore.so.440.59
#3  0x00005555555576c7 in gfx_agents_draw_cell ()
#4  0x000055555555a9ef in main ()

Intel(R) HD Graphics 530 (SKL GT2) + NVIDIA Corporation GM107M [GeForce GTX 950M]

connor-brooks commented 4 years ago

I segfault with Nvidia's OpenGL implementation. Mesa with Intel Graphics is fine (and I suspect nouveau users would be fine).

Thanks for helping clarify that @Muffindrake

GOKOP commented 4 years ago

So is this program abandoned?

connor-brooks commented 4 years ago

So is this program abandoned?

Not abandoned.

I've tried getting to the root cause of the bug but haven't managed to. I will be porting the simulation over to SDL2 at some point soon; the segfault will be fixed then.