PMunch / futhark

Automatic wrapping of C headers in Nim
MIT License

SIGSEGV when running opir #11

Closed planetis-m closed 2 years ago

planetis-m commented 2 years ago

I have installed raylib globally as per https://github.com/raysan5/raylib/wiki/Working-on-GNU-Linux#build-raylib-using-make

My Nim code is:

import futhark

# Tell futhark where to find the C libraries you will compile with, and what
# header files you wish to import.
importc:
  absPath "/usr/lib/clang/13.0.0/include/"
  absPath "/usr/local/include"
  "raylib.h"

# Tell Nim how to compile against the library. If you have a dynamic library
# this would simply be a `--passL:"-l<library name>"`
{.passL: "-lraylib -lGL -lm -lpthread -ldl -lrt -lX11 -DPLATFORM_DESKTOP".}

# Use the library just like you would in C!
proc main =
  InitWindow(800, 450, "raylib [core] example - basic window")
  while not WindowShouldClose():
    BeginDrawing()
    ClearBackground(RAYWHITE)
    DrawText("Congrats! You created your first window!", 190, 200, 20, LIGHTGRAY)
    EndDrawing()
  CloseWindow()

main()

The output I am getting when running opir -I/usr/local/include -I/usr/lib/clang/13.0.0/include ~/.cache/nim/makeray_r/futhark-includes.h:

Traceback (most recent call last)
futhark/src/opir.nim(343) opir
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
(segmentation fault) (core dumped)

Any way to debug this further?

PMunch commented 2 years ago

Hmm, not entirely sure. Running the exact same code on my machine works fine, albeit erroring out on something else. Which clang package do you have installed? And I assume you have the right glfw backend installed?

planetis-m commented 2 years ago

I didn't have glfw installed, as it's bundled with raylib. I have now installed my system's glfw (and added /usr/include/GLFW/ to importc), but no change. I have the clang 13.0.0-4 package from Manjaro installed (stable branch).

PMunch commented 2 years ago

I'm also on Manjaro and when I installed raylib it asked me if I wanted to install the glfw dependency as Wayland or X11.

planetis-m commented 2 years ago

Until now I was using a statically built raylib. I will try again with the shared raylib from the Manjaro repo and report back to you.

planetis-m commented 2 years ago

I get the same error. However, one thing I forgot to report is that I don't have nimble installed. I compiled opir and modified futhark to run ./opir in line 458. I have now copied opir into my PATH and removed my modification: no difference.

my nim.cfg in futhark directory

############# begin Atlas config section ##########
--noNimblePath
--path:"../futhark/src"
--path:"../libclang-nim/src/"
--path:"../termstyle"
--path:"../macroutils/src"
############# end Atlas config section   ##########

seems to work fine...

planetis-m commented 2 years ago

I think the problem lies with libclang-nim. My system has libclang.so installed, but the bindings don't use the dynlib pragma. Using passL: "-static -lclang" reveals I don't have a static libclang installed.
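
A quick way to see which libclang an already-built opir resolves at load time is to ask the dynamic loader. This is only a sketch: `libclang_deps` is a made-up helper name, not part of futhark or opir, and it assumes `ldd` and `grep` are available.

```shell
#!/bin/sh
# Hypothetical helper (not part of futhark or opir): list the shared
# libraries a binary depends on whose name contains "clang".
# Prints the matching ldd lines; returns non-zero when nothing matches.
libclang_deps() {
    ldd "$1" 2>/dev/null | grep -i 'clang'
}

# Example use, assuming opir is on the PATH:
#   libclang_deps "$(command -v opir)"
```

If nothing is printed, the binary was either linked statically or does not depend on a libclang shared object at all.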

PMunch commented 2 years ago

Aha, that might explain it, but in that case it's a bit weird that Opir was even able to build. You should be able to just add Opir to your path so you don't have to modify Futhark itself. Shouldn't make a difference though. Which libclang version do you use? I have my own fork where I've fixed some bugs in the official version.

planetis-m commented 2 years ago

> Which libclang version do you use?

The one provided by Manjaro, which is probably a rebuild of https://archlinux.org/packages/extra/x86_64/clang/

planetis-m commented 2 years ago

~these bindings https://github.com/deech/libclang_bindings seem to provide both a static and a shared version~

planetis-m commented 2 years ago

I added dynlib: "libclang.so" to every function in libclang-nim. Still getting the same error...

PMunch commented 2 years ago

Here is the Opir output when I run your code on my machine: http://ix.io/3Lcf

PMunch commented 2 years ago

By the way, these are my packages:

[peter /tmp ] 16779 $ pacman -Qi clang
Name            : clang
Version         : 13.0.0-4
Description     : C language family frontend for LLVM
Architecture    : x86_64
URL             : https://clang.llvm.org/
Licenses        : custom:Apache 2.0 with LLVM Exception
Groups          : None
Provides        : clang-analyzer=13.0.0  clang-tools-extra=13.0.0
Depends On      : llvm-libs  gcc  compiler-rt
Optional Deps   : openmp: OpenMP support in clang with -fopenmp
                  python: for scan-view and git-clang-format [installed]
                  llvm: referenced by some clang headers [installed]
Required By     : lldb  openshadinglanguage  shiboken2-git
Optional For    : kakoune  qmk  qt5-tools
Conflicts With  : clang-analyzer  clang-tools-extra
Replaces        : clang-analyzer  clang-tools-extra
Installed Size  : 207.42 MiB
Packager        : Felix Yan <felixonmars@archlinux.org>
Build Date      : Thu 02 Dec 2021 16:17:59 CET
Install Date    : Tue 04 Jan 2022 09:28:17 CET
Install Reason  : Installed as a dependency for another package
Install Script  : No
Validated By    : Signature

[peter /tmp ] 16780 $ pacman -Qi raylib 
Name            : raylib
Version         : 4.0.0-1
Description     : Simple and easy-to-use game programming library
Architecture    : x86_64
URL             : https://www.raylib.com
Licenses        : ZLIB
Groups          : None
Provides        : None
Depends On      : glfw
Optional Deps   : None
Required By     : None
Optional For    : None
Conflicts With  : None
Replaces        : None
Installed Size  : 1769.90 KiB
Packager        : Alexander Rødseth <rodseth@gmail.com>
Build Date      : Sun 07 Nov 2021 21:40:15 CET
Install Date    : Tue 04 Jan 2022 22:48:19 CET
Install Reason  : Explicitly installed
Install Script  : No
Validated By    : Signature

planetis-m commented 2 years ago

> Here is the Opir output when I run your code on my machine: http://ix.io/3Lcf

This is much better than what I was working with thus far, thank you for making futhark and opir!

PMunch commented 2 years ago

Just too bad that Opir crashes on your machine. These kinds of errors are also super hard for me to debug since I can't reproduce the error.

arkanoid87 commented 2 years ago

Futhark was working nicely with my project, then I agreed to install Ubuntu upgrades on my 20.04 LTS and suddenly I hit the same SIGSEGV: Illegal storage access. (Attempt to read from nil?) on opir execution.

The change should be outside futhark and libclang-nim

Thanks to a stable setup before the upgrade, I recompiled opir in debug mode and traced the problem down to the line:

nimble install futhark@#head

project: https://github.com/arkanoid87/nimmap

cmdline: opir -I/usr/lib/clang/10/include -I/usr/include/gdal /home/jack/.cache/nim/nimmap_d/futhark-includes.h

futhark-includes.h:

#include "gdal.h"
#include "ogr_core.h"
#include "ogr_api.h"

nim output:

Traceback (most recent call last)
/home/jack/.nimble/pkgs/futhark-#head/opir.nim(343) opir
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

https://github.com/PMunch/futhark/blob/875a7f5cf727483bc4c4d394f7d3c23699002aa8/src/opir.nim#L343

(gdb) run -I/usr/lib/clang/10/include -I/usr/include/gdal /home/jack/.cache/nim/nimmap_d/futhark-includes.h

Starting program: /home/jack/.nimble/pkgs/futhark-#head/opir -I/usr/lib/clang/10/include -I/usr/include/gdal /home/jack/.cache/nim/nimmap_d/futhark-includes.h
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff119f700 (LWP 15886)]
[Thread 0x7ffff119f700 (LWP 15886) exited]

Thread 1 "opir" received signal SIGSEGV, Segmentation fault.
0x00007ffff62a81d3 in clang_getTranslationUnitCursor () from /usr/lib/x86_64-linux-gnu/libclang-10.so.1
(gdb) bt
#0  0x00007ffff62a81d3 in clang_getTranslationUnitCursor () from /usr/lib/x86_64-linux-gnu/libclang-10.so.1
#1  0x0000555555592c52 in NimMainModule ()
#2  0x000055555559197e in NimMainInner ()
#3  0x00005555555919be in NimMain ()
#4  0x0000555555591a10 in main ()

seems to happen inside clang_getTranslationUnitCursor

dump repr unit (of var cursor = getTranslationUnitCursor(unit))

repr unit = 0x7f17480036d0

stat /usr/lib/x86_64-linux-gnu/libclang-10.so.1

  File: /usr/lib/x86_64-linux-gnu/libclang-10.so.1
  Size: 34221680        Blocks: 66840      IO Block: 4096   regular file
Device: 815h/2069d      Inode: 8392614     Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-01-12 06:09:06.795501871 +0100
Modify: 2020-04-20 07:12:09.000000000 +0200
Change: 2021-12-06 03:28:40.168317608 +0100

nm -D /usr/lib/x86_64-linux-gnu/libclang-10.so.1 | grep clang_getTranslationUnitCursor

00000000003cf1c0 T clang_getTranslationUnitCursor

debugging var unit

...
import sugar
dump repr index
dump fname
for i in 0..args:
  dump commandLine[i]
dump args
var unit = parseTranslationUnit(index, fname.cstring,
                              commandLine, args.cint, nil, 0, CXTranslationUnit_DetailedPreprocessingRecord.cuint or CXTranslationUnit_SkipFunctionBodies.cuint)
dump unit.getNumDiagnostics
..
repr index = 0x5653344db180
fname = /tmp/imports-20501.h
commandLine[i] = -I/usr/lib/clang/10/include
commandLine[i] = -I/usr/include/gdal
commandLine[i] = /home/jack/.cache/nim/nimmap_d/futhark-includes.h
args = 2
unit.getNumDiagnostics = 0

cat /tmp/imports-20501.h

#include "/home/jack/.cache/nim/nimmap_d/futhark-includes.h"

Not quite sure what's going on. It seems that the output of clang_parseTranslationUnit is correct and goes straight into clang_getTranslationUnitCursor, which then hits the SIGSEGV.

PMunch commented 2 years ago

Yeah, I also got that far when I was debugging this. That's why I was trying to find out whether we had any differences in packages or setup. Not really sure what to do about this. It might be that the Clang wrapper is incorrect?

arkanoid87 commented 2 years ago

Just tried compiling and linking opir against libclang-12 instead of libclang-10, but I got the same results.

I was actually trying to figure out what happened 2 hours ago when I installed Ubuntu upgrades via GNOME apt, but to my surprise there's no info about that operation in /var/log/apt

arkanoid87 commented 2 years ago

In my case, the same problem happens on libclang-10, 11 and 12, so I'll stick with 10, as I've always used it over the past months on different futhark-based projects.

I've installed the debug symbols, so I've a line to blame: #0 clang_getTranslationUnitCursor () at /build/llvm-toolchain-10-yegZYJ/llvm-toolchain-10-10.0.0/clang/tools/libclang/CIndex.cpp:4208

I've grabbed the original file source version

CXCursor clang_getTranslationUnitCursor(CXTranslationUnit TU) {
  if (isNotUsableTU(TU)) {
    LOG_BAD_TU(TU);
    return clang_getNullCursor();
  }

  ASTUnit *CXXUnit = cxtu::getASTUnit(TU);
  return MakeCXCursor(CXXUnit->getASTContext().getTranslationUnitDecl(), TU); // 4208
}

edit: found same ref line on github https://github.com/llvm/llvm-project/blob/ef32c611aa214dea855364efd7ba451ec5ec3f74/clang/tools/libclang/CIndex.cpp#L4208

Considering that the gdb backtrace shows the SIGSEGV inside clang_getTranslationUnitCursor, isNotUsableTU(TU) must have returned false while CXXUnit ended up as an invalid pointer, so the issue might be in cxtu::getASTUnit.

arkanoid87 commented 2 years ago

diving into the issue

edit in index.nim (clang-#head pkg) to reflect the original struct

type
  # CXTranslationUnit* = pointer # CXTranslationUnitImpl
  CXTranslationUnit* = ptr object # CXTranslationUnitImpl
    cIdx*: CXIndex
    theASTUnit*: pointer
    cXStringPool*: pointer
    diagnostic*: pointer
    overridenCursorsPool*: pointer
    commentToXML*: pointer
    parsingOptions*: cuint
    arguments: pointer

checks in opir.nim

var unit = parseTranslationUnit(index, fname.cstring,
                              commandLine, args.cint, nil, 0, CXTranslationUnit_DetailedPreprocessingRecord.cuint or CXTranslationUnit_SkipFunctionBodies.cuint)
deallocCStringArray(commandLine)

assert not unit.isNil
# assert first and last struct fields
assert unit.cIdx == index
assert unit.parsingOptions == (CXTranslationUnit_DetailedPreprocessingRecord.cuint or CXTranslationUnit_SkipFunctionBodies.cuint)
#
assert not unit.theASTUnit.isNil
echo repr unit.theASTUnit

output

nim c -r futhest.nim
Hint: used config file '/home/jack/.choosenim/toolchains/nim-1.6.2/config/nim.cfg' [Conf]
Hint: used config file '/home/jack/.choosenim/toolchains/nim-1.6.2/config/config.nims' [Conf]
........................................................................................................
/home/jack/.nimble/pkgs/futhark-#head/futhark.nim(479, 12) Hint: Running: opir -I/usr/lib/clang/10/include -I/home/jack/nim/futhest/stb /home/jack/.cache/nim/futhest_d/futhark-includes.h [User]
0x7f06d0002b00
Traceback (most recent call last)
/home/jack/.nimble/pkgs/futhark-#head/opir.nim(351) opir
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
/home/jack/.nimble/pkgs/futhark-#head/futhark.nim(486, 8) Hint: Parsing Opir output [User]
stack trace: (most recent call last)
/home/jack/.nimble/pkgs/futhark-#head/futhark.nim(494, 11) importcImpl
/home/jack/nim/futhest/futhest.nim(5, 1) template/generic instantiation of `importc` from here
/home/jack/.nimble/pkgs/futhark-#head/futhark.nim(440, 14) template/generic instantiation of `importcImpl` from here
/home/jack/.nimble/pkgs/futhark-#head/futhark.nim(494, 11) Error: Unable to parse output of opir:
Segmentation fault (core dumped)

at least we know that unit.theASTUnit is not nil

arkanoid87 commented 2 years ago

created minimal example to reproduce the error:

import clang

static:
  # any content will fail, including empty file
  writeFile("futhark-includes.h", """
  #include "gdal.h"
""")

var
  commandLineParams = @[
    "-I/usr/lib/clang/10/include",
    "-I/usr/include/gdal"]
  fname = "futhark-includes.h"

  index = createIndex(0, 0)
  commandLine = allocCStringArray(commandLineParams)

var unit = parseTranslationUnit(index, fname.cstring,
                              commandLine, commandLineParams.len.cint, nil, 0, CXTranslationUnit_DetailedPreprocessingRecord.cuint or CXTranslationUnit_SkipFunctionBodies.cuint)
deallocCStringArray(commandLine)

block: # testing stuff
    assert not unit.isNil
    assert unit.getNumDiagnostics == 0
    # assert first and last struct fields
    assert unit.cIdx == index
    assert unit.parsingOptions == (CXTranslationUnit_DetailedPreprocessingRecord.cuint or CXTranslationUnit_SkipFunctionBodies.cuint)

    assert not unit.theASTUnit.isNil
    echo repr unit.theASTUnit
    echo unit.getTranslationUnitSpelling
    echo repr unit.getCXTUResourceUsage

discard getTranslationUnitCursor(unit) # SIGSEGV
assert false

output

nim c -r futhest.nim
Hint: used config file '/home/jack/.choosenim/toolchains/nim-1.6.2/config/nim.cfg' [Conf]
Hint: used config file '/home/jack/.choosenim/toolchains/nim-1.6.2/config/config.nims' [Conf]
Hint: used config file '/home/jack/nim/futhest/config.nims' [Conf]
Hint: gc: refc; opt: none (DEBUG BUILD, `-d:release` generates faster code)
19404 lines; 0.019s; 16.559MiB peakmem; proj: /home/jack/nim/futhest/futhest.nim; out: /home/jack/nim/futhest/futhest [SuccessX]
Hint: /home/jack/nim/futhest/futhest  [Exec]
0x7fbfcc002b00
(data: ..., private_flags: 1)
[data = 0x55c328a6d700,
numEntries = 12,
entries = ptr 0x55c328a58120 --> [kind = CXTUResourceUsage_AST,
amount = 4947968]]
Traceback (most recent call last)
/home/jack/nim/futhest/futhest.nim(66) futhest
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Segmentation fault (core dumped)
Error: execution of an external program failed: '/home/jack/nim/futhest/futhest '

arkanoid87 commented 2 years ago

I've created a Docker-based nimble package to replicate the issue and rule out a single-machine problem

https://github.com/arkanoid87/futhest

it includes 2 Dockerfiles to replicate the error on Ubuntu and Alpine Linux

I've also just included a working plain C example that does not crash

#include <assert.h>
#include <stdio.h>
#include "clang-c/Index.h"

enum CXChildVisitResult
visitor(CXCursor cursor, CXCursor parent, CXClientData clientData) {
    CXFile file;
    unsigned int line;
    unsigned int column;
    unsigned int offset;

    CXSourceLocation loc = clang_getCursorLocation(cursor);
    clang_getFileLocation(loc, &file, &line, &column, &offset);

    CXString filename = clang_getFileName(file);   
    printf("%s [%d:%d, %d]\n", clang_getCString(filename), line, column, offset);
    clang_disposeString(filename);

    return CXChildVisit_Continue;
}

int main (void) { 
    CXIndex Idx = clang_createIndex(0, 0);

    CXTranslationUnit TU = clang_parseTranslationUnit(Idx, 
        "src/header.h", NULL, 0, NULL, 0, 0);

    assert(TU != NULL);

    CXCursor cursor = clang_getTranslationUnitCursor(TU);

    clang_visitChildren(cursor, visitor, NULL);

    clang_disposeTranslationUnit(TU);

    printf("END");
}

arkanoid87 commented 2 years ago

Relevant forum post: https://forum.nim-lang.org/t/8796

EDIT: solution found, it is a regression in the Nim compiler

remember to clean your ~/.cache/nim/project_* folder after switching Nim versions

New issue: https://github.com/nim-lang/Nim/issues/19378

PMunch commented 2 years ago

Aha! My machine was still running Nim 1.6.0 which was why I didn't see the error. Hopefully this will be fixed quickly!

arkanoid87 commented 2 years ago

I'm not finding "check" suffix in saem vscode extension, but I see it calling "nim check", so it might be that.

PMunch commented 2 years ago

Might be, NimLSP (which is what I'm using for Vim) embeds the Nim compiler and calls it directly in the same way that nimsuggest does.

ringabout commented 2 years ago

related issue https://github.com/nim-lang/Nim/issues/19342

see also https://github.com/nim-lang/Nim/pull/19385

ringabout commented 2 years ago

Does nim-lang/Nim#19385 help?

arkanoid87 commented 2 years ago

I confirm bug is gone in latest devel

choosenim update devel
nimble uninstall futhark # opir rebuild required
<clear cache>
nimble build

success

thanks!

PhilippMDoerner commented 2 years ago

Having just hit the same issue, I can confirm his confirmation. For me, "clear cache" pretty much meant "delete /home/<youruser>/.cache", in case that is a useful addition.

arkanoid87 commented 2 years ago

Deleting ~/.cache is overkill! You delete the cache for all user programs doing that.

The right cache dir for your Nim project is ~/.cache/nim/<projectname>_<[d|r]> or whatever you set as cache
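
A minimal sketch of clearing only one project's cache, assuming the default cache location described above. The function names `nim_cache_dir` and `clear_nim_cache` are made up for illustration, and honouring `XDG_CACHE_HOME` is my assumption about how the default directory is chosen; a custom `--nimcache` setting is not handled.

```shell
#!/bin/sh
# Hypothetical helper: print the default Nim cache directory for a project.
# $1 = project name, $2 = "d" (debug build) or "r" (release build).
# Assumption: the cache lives under $XDG_CACHE_HOME, falling back to ~/.cache.
nim_cache_dir() {
    printf '%s/nim/%s_%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}" "$1" "$2"
}

# Hypothetical helper: remove only that project's debug and release caches,
# leaving every other program's cache untouched.
clear_nim_cache() {
    rm -rf -- "$(nim_cache_dir "$1" d)" "$(nim_cache_dir "$1" r)"
}

# Example use for the nimmap project mentioned earlier in the thread:
#   clear_nim_cache nimmap
```

Printing the path with `nim_cache_dir nimmap d` before deleting anything is a cheap way to double-check what would be removed.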

PMunch commented 2 years ago

With 1.6.4 released this seems to have been fixed upstream. If this still occurs please re-open the issue.
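
For anyone landing here later, a rough way to check whether the installed compiler is the one release this thread ties the crash to (1.6.2; 1.6.0 and 1.6.4 were reported as fine). This is a sketch only: both function names are made up, and the banner format is assumed to look like "Nim Compiler Version 1.6.2 [Linux: amd64]".

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given version string is the release
# the crash was reported on in this thread (1.6.2).
is_affected_nim_version() {
    [ "$1" = "1.6.2" ]
}

# Hypothetical helper: extract the version number from the first line of
# the `nim --version` banner, passed in as $1.
# Assumed banner format: "Nim Compiler Version X.Y.Z [OS: arch]".
nim_version_from_banner() {
    printf '%s\n' "$1" | sed -n '1s/^Nim Compiler Version \([0-9][0-9.]*\).*/\1/p'
}

# Example use:
#   v="$(nim_version_from_banner "$(nim --version)")"
#   if is_affected_nim_version "$v"; then echo "Nim $v: consider upgrading"; fi
```

After changing compiler versions, opir still has to be rebuilt and the project cache cleared, as described in the comments above.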