bonede / tree-sitter-ng

Next generation Tree Sitter Java binding.
MIT License

Core dump errors #22

Closed: marusic1514 closed this issue 3 months ago

marusic1514 commented 5 months ago

Observing core dump errors like this one:

java: D:\projects\tree-sitter-ng\tree-sitter\build\tree-sitter\tree-sitter-0.22.6\lib\src/./stack.c:96: void stack_node_release(StackNode *, StackNodeArray *, SubtreePool *): Assertion `self->ref_count != 0' failed.
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000079da5dfb5898, pid=1, tid=141
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.3+9 (21.0.3+9) (build 21.0.3+9-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.3+9 (21.0.3+9-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libc.so.6+0x28898]  abort+0x178
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /hubapp-0.11.0/bin/core.1)
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log
[69658.180s][warning][os] Loading hsdis library failed
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

I am using the Tree-sitter library with several different parsers (Kotlin, Java, TypeScript, C#, etc.), and any of them can eventually trigger the core dump above.

Do you have a solution?


This is the code, for reference:

    fun chunkText(fileName: String, fileContents: String): List<String> {
        // Strip non-ASCII characters so that tree-sitter's UTF-8 byte offsets
        // coincide with the String indices used in chunkRest.
        val input = fileContents.replace("[^\\x00-\\x7F]".toRegex(), "")
        return try {
            val parser: TSParser? = pickLanguage(fileName)?.let { language ->
                TSParser().apply { setLanguage(language) }
            }
            if (parser == null) emptyList()
            else {
                val tree = parser.parseString(null, input)
                chunkRest(tree.rootNode, input)
            }
        } catch (expected: IncompatibleClassChangeError) {
            logger.error("Could not chunk file: $fileName exn=${expected.message}")
            emptyList() // the catch branch must also yield a List<String>
        }
    }

where pickLanguage picks a grammar based on the file extension, like so (I use a large variety of parsers, and this error seems to happen with any of them):

    private fun pickLanguage(fileName: String): TSLanguage? =
        when {
            fileName.endsWith(".kt") -> TreeSitterKotlin()
            ...
            fileName.endsWith(".java") -> TreeSitterJava()
            else -> null
        }

and chunkRest does the following (essentially, it recursively keeps adding children until it runs out of MAX_CHARS):

    // Splits the text under `node` into chunks of roughly MAX_CHARS characters.
    // NOTE: startByte/endByte are UTF-8 byte offsets; using them as String indices
    // only works because chunkText strips all non-ASCII characters first.
    private fun chunkRest(node: TSNode, text: String, lastEnd: Int = 0): List<String> {
        val chunks = mutableListOf<String>()
        var currentChunk = ""
        var cursor = lastEnd // end offset of the previously visited child
        for (i in 0 until node.childCount) {
            val child = node.getChild(i)
            if (!FORBIDDEN_GRAMMAR.contains(child.grammarType)) {
                if (child.endByte - child.startByte > MAX_CHARS) {
                    // Child is too large on its own: chunk it recursively and merge the pieces.
                    chunkRest(child, text, cursor).forEach { childChunk ->
                        if (currentChunk.length + childChunk.length < MAX_CHARS)
                            currentChunk += childChunk
                        else {
                            if (currentChunk.isNotEmpty()) chunks.add(currentChunk)
                            currentChunk = childChunk
                        }
                    }
                    if (currentChunk.isNotEmpty()) chunks.add(currentChunk)
                    currentChunk = ""
                } else if (currentChunk.length + child.endByte - child.startByte > MAX_CHARS) {
                    // Adding this child would overflow: flush and start a new chunk.
                    if (currentChunk.isNotEmpty()) chunks.add(currentChunk)
                    currentChunk = text.substring(cursor, child.endByte)
                } else {
                    currentChunk += text.substring(cursor, child.endByte)
                }
            }
            cursor = child.endByte
        }

        if (currentChunk.isNotEmpty()) chunks.add(currentChunk)
        return chunks
    }
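Tree-sitter reports node positions (startByte/endByte) as offsets into the UTF-8 encoding of the source, while Kotlin's String.substring takes UTF-16 character indices, so the two only agree for pure-ASCII input. For example, "é = x" is 5 characters but 6 bytes ("é" takes 2), so the identifier x sits at bytes 5..6 and "é = x".substring(5, 6) would throw. A minimal sketch of slicing by byte offsets instead, assuming the bytes are produced from exactly the same text that was passed to parseString; sliceUtf8 is a hypothetical helper, not part of tree-sitter-ng:

```kotlin
// Hypothetical helper (not part of tree-sitter-ng): decodes the text between
// two UTF-8 byte offsets, e.g. the startByte/endByte of a node.
// Assumes `source` is the UTF-8 encoding of the exact text that was parsed.
fun sliceUtf8(source: ByteArray, startByte: Int, endByte: Int): String =
    String(source, startByte, endByte - startByte, Charsets.UTF_8)
```

In chunkRest, text.substring(cursor, child.endByte) would become sliceUtf8(bytes, cursor, child.endByte), with bytes = input.toByteArray(Charsets.UTF_8) computed once per file, so stripping non-ASCII characters would no longer be needed just to keep offsets aligned. Note that currentChunk.length would still count characters, not bytes.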
bonede commented 5 months ago

Hi there,

My recommendations are:

  1. Keep the parsers and languages in global variables instead of creating new ones on every call. Also, use TSParser#reset to reset the parser state.

  2. Since child.startByte and child.endByte return offsets in raw UTF-8 bytes, it's better to work with String#getBytes("utf-8").
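The first recommendation amounts to building each language and TSParser once and reusing it, instead of constructing them in every chunkText/pickLanguage call. A minimal sketch, assuming chunkText may run on several threads (a tree-sitter parser should not be used by two threads at once, so this cache keeps one instance per thread and per file extension); ParserCache is a hypothetical name, and the factory is whatever builds a ready-to-use parser for a file name:

```kotlin
// Hypothetical helper (not part of tree-sitter-ng): builds at most one
// instance per file extension per thread and reuses it afterwards.
class ParserCache<T : Any>(private val factory: (fileName: String) -> T?) {
    // One map per thread, so a cached parser is never shared between threads.
    private val perThread: ThreadLocal<HashMap<String, T>> =
        ThreadLocal.withInitial { HashMap<String, T>() }

    fun get(fileName: String): T? {
        // Cache key is the extension without the dot ("" if there is none).
        val key = fileName.substringAfterLast('.', missingDelimiterValue = "")
        val cache = perThread.get()
        cache[key]?.let { return it }
        // Only successful lookups are cached; unsupported files return null.
        val created = factory(fileName) ?: return null
        cache[key] = created
        return created
    }
}
```

With tree-sitter-ng, the factory could be `{ name -> pickLanguage(name)?.let { lang -> TSParser().apply { setLanguage(lang) } } }`, with `reset()` called on a parser before reusing it, as suggested above. If everything runs on a single thread, a plain global map is enough.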

tpodg commented 4 weeks ago

I have the same issue. Version 0.22.5 works fine, but with any newer version the application crashes randomly at some point.

This is from the logs:

# JRE version: OpenJDK Runtime Environment (Red_Hat-21.0.5.0.10-1) (21.0.5+10) (build 21.0.5+10-LTS)
# Java VM: OpenJDK 64-Bit Server VM (Red_Hat-21.0.5.0.10-1) (21.0.5+10-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [x86_64-linux-gnu-tree-sitter.so+0x2bf03]  ts_subtree_child_count+0x43