bonede / tree-sitter-ng

Next generation Tree Sitter Java binding.
MIT License

Receiving SIGILL while parsing C code #6

Status: Closed (openrefactorymunawar closed this issue 6 months ago)

openrefactorymunawar commented 6 months ago

I have been trying to use the Java binding to parse the C code of a large application, Zephyr (https://github.com/zephyrproject-rtos/zephyr), which has more than 23,000 C files. I parse the files one at a time in a loop.

The application halts abruptly, and the JVM writes the following crash report.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x0000000109ed3e55, pid=87215, tid=25859
#
# JRE version: OpenJDK Runtime Environment (11.0.9+11) (build 11.0.9+11)
# Java VM: OpenJDK 64-Bit Server VM (11.0.9+11, mixed mode, tiered, compressed oops, g1 gc, bsd-amd64)
# Problematic frame:
# C  [x86_64-macos-tree-sitter.dylib+0x6e55]  ts_language_symbol_metadata+0x85
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  S U M M A R Y ------------

Command Line: -Dfile.encoding=UTF-8 -Dstdout.encoding=UTF-8 -Dstderr.encoding=UTF-8 com.openrefactory.internal.core.protocol.ORDaemon

Host: MacBookPro11,5 x86_64 2800 MHz, 8 cores, 16G, Darwin 20.6.0
Time: Wed Mar  6 00:45:28 2024 PST elapsed time: 55.288547 seconds (0d 0h 0m 55s)

---------------  T H R E A D  ---------------

Current thread (0x00007fd44e072000):  JavaThread "pool-1-thread-1" [_thread_in_native, id=25859, stack(0x00007000088cc000,0x00007000089cc000)]

Stack: [0x00007000088cc000,0x00007000089cc000],  sp=0x00007000089c9f00,  free space=1015k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [x86_64-macos-tree-sitter.dylib+0x6e55]  ts_language_symbol_metadata+0x85
C  [x86_64-macos-tree-sitter.dylib+0x59d51]  ts_subtree_new_node+0x41
C  [x86_64-macos-tree-sitter.dylib+0x798e2]  ts_parser__reduce+0x5f2
C  [x86_64-macos-tree-sitter.dylib+0x203fd]  ts_parser__advance+0x13ed
C  [x86_64-macos-tree-sitter.dylib+0x1cbc9]  ts_parser_parse+0x1e49
C  [x86_64-macos-tree-sitter.dylib+0xa1915]  Java_org_treesitter_TSParser_ts_1parser_1parse+0x245
J 1401  org.treesitter.TSParser.ts_parser_parse(J[BJLorg/treesitter/TSReader;I)J (0 bytes) @ 0x000000011ea53d3d [0x000000011ea53c40+0x00000000000000fd]
J 1400 c1 org.treesitter.TSParser.parse([BLorg/treesitter/TSTree;Lorg/treesitter/TSReader;Lorg/treesitter/TSInputEncoding;)Lorg/treesitter/TSTree; (51 bytes) @ 0x00000001176dd4fc [0x00000001176dd3e0+0x000000000000011c]
J 1364 c1 com.openrefactory.internal.core.transformation.c.vpg.CVPG.parseWithTreeSitter(Ljava/lang/String;Lorg/openrefactory/core/model/IModelFileElement;)Lorg/treesitter/TSTree; (82 bytes) @ 0x00000001176cebf4 [0x00000001176ce080+0x0000000000000b74]
J 1363 c1 com.openrefactory.internal.core.transformation.c.vpg.CVPG.parse(Ljava/lang/String;)Lorg/treesitter/TSTree; (95 bytes) @ 0x00000001176cd06c [0x00000001176cce20+0x000000000000024c]
J 1351 c1 com.openrefactory.internal.core.transformation.c.vpg.CVPG.parse(Ljava/lang/String;)Ljava/lang/Object; (6 bytes) @ 0x00000001176c9434 [0x00000001176c93c0+0x0000000000000074]
J 1349 c1 com.eclipse.rephraserengine.core.vpg.ASTRepository.acquireTransientAST(Ljava/lang/String;ZLcom/eclipse/rephraserengine/core/vpg/VPG;)Ljava/lang/Object; (171 bytes) @ 0x00000001176c8354 [0x00000001176c7600+0x0000000000000d54]
J 1309 c1 com.eclipse.rephraserengine.core.vpg.VPG.forceRecomputationOfEdgesAndAnnotations(Ljava/lang/String;)V (22 bytes) @ 0x00000001176b727c [0x00000001176b6c00+0x000000000000067c]
j  com.openrefactory.internal.core.transformation.c.vpg.CVPG.ensureVPGIsUpToDate(Lorg/openrefactory/core/transformation/IProgressReporter;)V+201
j  com.openrefactory.internal.protocol.commands.RefactoringPassCommand$RefactoringPass.runRefactoringPass()V+7
j  com.openrefactory.internal.protocol.commands.RefactoringPassCommand$RefactoringPass.run()V+5
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+92 java.base@11.0.9
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@11.0.9
j  java.lang.Thread.run()V+11 java.base@11.0.9
v  ~StubRoutines::call_stub
V  [libjvm.dylib+0x3a9a86]  _ZN9JavaCalls11call_helperEP9JavaValueRK12methodHandleP17JavaCallArgumentsP6Thread+0x21a
V  [libjvm.dylib+0x3a8eec]  _ZN9JavaCalls12call_virtualEP9JavaValueP5KlassP6SymbolS5_P17JavaCallArgumentsP6Thread+0xee
V  [libjvm.dylib+0x3a8fa8]  _ZN9JavaCalls12call_virtualEP9JavaValue6HandleP5KlassP6SymbolS6_P6Thread+0x62
V  [libjvm.dylib+0x42d6c8]  _ZL12thread_entryP10JavaThreadP6Thread+0x78
V  [libjvm.dylib+0x70b42a]  _ZN10JavaThread17thread_main_innerEv+0x82
V  [libjvm.dylib+0x70b274]  _ZN10JavaThread3runEv+0x174
V  [libjvm.dylib+0x709150]  _ZN6Thread8call_runEv+0x68
V  [libjvm.dylib+0x60f213]  _ZL19thread_native_entryP6Thread+0x139
C  [libsystem_pthread.dylib+0x68fc]  _pthread_start+0xe0
C  [libsystem_pthread.dylib+0x2443]  thread_start+0xf

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 1401  org.treesitter.TSParser.ts_parser_parse(J[BJLorg/treesitter/TSReader;I)J (0 bytes) @ 0x000000011ea53cc0 [0x000000011ea53c40+0x0000000000000080]
J 1400 c1 org.treesitter.TSParser.parse([BLorg/treesitter/TSTree;Lorg/treesitter/TSReader;Lorg/treesitter/TSInputEncoding;)Lorg/treesitter/TSTree; (51 bytes) @ 0x00000001176dd4fc [0x00000001176dd3e0+0x000000000000011c]
J 1364 c1 com.openrefactory.internal.core.transformation.c.vpg.CVPG.parseWithTreeSitter(Ljava/lang/String;Lorg/openrefactory/core/model/IModelFileElement;)Lorg/treesitter/TSTree; (82 bytes) @ 0x00000001176cebf4 [0x00000001176ce080+0x0000000000000b74]
J 1363 c1 com.openrefactory.internal.core.transformation.c.vpg.CVPG.parse(Ljava/lang/String;)Lorg/treesitter/TSTree; (95 bytes) @ 0x00000001176cd06c [0x00000001176cce20+0x000000000000024c]
J 1351 c1 com.openrefactory.internal.core.transformation.c.vpg.CVPG.parse(Ljava/lang/String;)Ljava/lang/Object; (6 bytes) @ 0x00000001176c9434 [0x00000001176c93c0+0x0000000000000074]
J 1349 c1 com.eclipse.rephraserengine.core.vpg.ASTRepository.acquireTransientAST(Ljava/lang/String;ZLcom/eclipse/rephraserengine/core/vpg/VPG;)Ljava/lang/Object; (171 bytes) @ 0x00000001176c8354 [0x00000001176c7600+0x0000000000000d54]
J 1309 c1 com.eclipse.rephraserengine.core.vpg.VPG.forceRecomputationOfEdgesAndAnnotations(Ljava/lang/String;)V (22 bytes) @ 0x00000001176b727c [0x00000001176b6c00+0x000000000000067c]
j  com.openrefactory.internal.core.transformation.c.vpg.CVPG.ensureVPGIsUpToDate(Lorg/openrefactory/core/transformation/IProgressReporter;)V+201
j  com.openrefactory.internal.protocol.commands.RefactoringPassCommand$RefactoringPass.runRefactoringPass()V+7
j  com.openrefactory.internal.protocol.commands.RefactoringPassCommand$RefactoringPass.run()V+5
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+92 java.base@11.0.9
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@11.0.9
j  java.lang.Thread.run()V+11 java.base@11.0.9
v  ~StubRoutines::call_stub

<snipped>

I ran it twice, and the crash occurred at a different point in the execution each time.

openrefactorymunawar commented 6 months ago

I used the following code:

private TSTree parseWithTreeSitter(String fileName, IModelFileElement file) throws IOException {
    TSParser parser = new TSParser();
    TSLanguage c = new TreeSitterC();
    parser.setLanguage(c);
    String code = new String(Files.readAllBytes(Paths.get(fileName)));
    byte[] buffer = new byte[code.length() + 50];
    TSReader reader = (buf, offset, position) -> {
        if (offset >= code.length()) {
            return 0;
        }
        ByteBuffer charBuffer = ByteBuffer.wrap(buf);
        charBuffer.put(code.getBytes());
        return code.length();
    };
    TSTree tree = parser.parse(buffer, null, reader, TSInputEncoding.TSInputEncodingUTF8);
    return tree;
}

I call this method on every .c file in the project, one after another.

bonede commented 6 months ago

Hi, please use byte[] instead of wrapping it in String, and make sure the return value of TSReader#read won't overflow the buffer.
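
To illustrate the first suggestion, here is a small standalone sketch of why sizes taken from a String can disagree with the bytes actually copied. String.length() counts UTF-16 code units, while getBytes() yields encoded bytes, so the two differ as soon as a file contains non-ASCII text (for example in a comment). The class and method names (LengthMismatchDemo, charCount, utf8ByteCount) are made up for this example and are not part of tree-sitter-ng. It only demonstrates the mismatch and does not claim to be the confirmed cause of the SIGILL.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of tree-sitter-ng.
final class LengthMismatchDemo {

    /** Number of UTF-16 code units, which is what String.length() reports. */
    static int charCount(String s) {
        return s.length();
    }

    /** Number of bytes the same text occupies once encoded as UTF-8. */
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "int x = 0;";
        // Non-ASCII characters in a comment: each takes more than one byte in UTF-8.
        String withComment = "/* caf\u00e9 \u00b5s */ int x = 0;";

        System.out.println(charCount(ascii) + " chars, "
                + utf8ByteCount(ascii) + " bytes");
        System.out.println(charCount(withComment) + " chars, "
                + utf8ByteCount(withComment) + " bytes");
    }
}
```

Keeping the file content as the byte[] returned by Files.readAllBytes avoids the question entirely, because every length and offset is then a byte count.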

If you prefer to read the entire file content into memory, please consider Parser#parseStringEncoding.

Hope this helps.
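
The advice about the return value of TSReader#read can be sketched without the native library. The class below (ChunkedSource, a name invented for this example) holds the file as a byte[] and copies at most buf.length bytes per call, starting at the requested offset, so the reported count can never exceed what fits in the buffer. It assumes the offset handed to the callback is a byte offset into the source; check that against the binding before relying on it. The body of read could serve as the body of a TSReader lambda like the one in the snippet above, with the extra position argument ignored.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; not part of tree-sitter-ng.
final class ChunkedSource {
    private final byte[] source;

    ChunkedSource(byte[] source) {
        this.source = source;
    }

    /**
     * Copies the next chunk of the source into buf.
     * Assumption: byteOffset is a byte offset into the source.
     * Returns the number of bytes copied, or 0 once the offset is
     * at or past the end of the source.
     */
    int read(byte[] buf, int byteOffset) {
        if (byteOffset < 0 || byteOffset >= source.length) {
            return 0;
        }
        // Never copy (or report) more than the buffer can hold.
        int n = Math.min(buf.length, source.length - byteOffset);
        System.arraycopy(source, byteOffset, buf, 0, n);
        return n;
    }

    public static void main(String[] args) {
        byte[] src = "int main(void) { return 0; }\n".getBytes(StandardCharsets.UTF_8);
        ChunkedSource reader = new ChunkedSource(src);

        // A deliberately small buffer to force several read calls.
        byte[] buf = new byte[8];
        ByteArrayOutputStream seen = new ByteArrayOutputStream();
        int offset = 0;
        int n;
        while ((n = reader.read(buf, offset)) > 0) {
            seen.write(buf, 0, n);
            offset += n;
        }
        System.out.println("round trip ok: " + Arrays.equals(src, seen.toByteArray()));
    }
}
```

The design choice here is to let the buffer size, not the file size, bound each copy, which is why the buffer no longer needs to be sized from the file at all.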