Kotlin / dokka

API documentation engine for Kotlin
https://kotl.in/dokka
Apache License 2.0
Concurrency failure in jsoup? #3321

Open owengray-google opened 10 months ago

owengray-google commented 10 months ago

Describe the bug

We have recently been seeing some timeouts when running dackka. We managed to capture the JVM thread stacks during a timeout kill:

I think the dackka process was probably:

4556 org.jetbrains.dokka.MainKt /buildbot/dist_dirs/aosp-androidx-main-linux-androidx/11066813/dackkaArgs-docs-public.json -loggingLevel WARN -Dfile.encoding=UTF-8 -Duser.country=US -Duser.language=en -Duser.variant

The full stack trace for that process is pretty long, but I notice that most threads have a stack trace that looks like this:

"DefaultDispatcher-worker-1" #28 daemon prio=5 os_prio=0 cpu=243.46ms elapsed=3478.27s tid=0x00007f99a09cc450 nid=0x122d in Object.wait()  [0x00007f986edeb000]
   java.lang.Thread.State: RUNNABLE
    at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt$parseHtmlEncodedWithNormalisedSpaces$1.invoke(parseWithNormalisedSpaces.kt:25)
    - waiting on the Class initialization monitor for org.jsoup.nodes.Entities
    at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt$parseHtmlEncodedWithNormalisedSpaces$1.invoke(parseWithNormalisedSpaces.kt)
    at org.intellij.markdown.lexer.Compat.forEachCodePoint(Compat.kt:14)
    at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt.parseHtmlEncodedWithNormalisedSpaces(parseWithNormalisedSpaces.kt:19)
    at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt.parseWithNormalisedSpaces(parseWithNormalisedSpaces.kt:49)
    at org.jetbrains.dokka.base.parsers.factories.DocTagsFromIElementFactory.getInstance(DocTagsFromIElementFactory.kt:46)
    at org.jetbrains.dokka.base.parsers.factories.DocTagsFromIElementFactory.getInstance$default(DocTagsFromIElementFactory.kt:16)
    at org.jetbrains.dokka.base.parsers.MarkdownParser.textHandler(MarkdownParser.kt:243)
    at org.jetbrains.dokka.base.parsers.MarkdownParser.visitNode(MarkdownParser.kt:391)
    at org.jetbrains.dokka.base.parsers.MarkdownParser.evaluateChildren(MarkdownParser.kt:408)
    at org.jetbrains.dokka.base.parsers.MarkdownParser.evaluateChildren$default(MarkdownParser.kt:407)
    at org.jetbrains.dokka.base.parsers.MarkdownParser.defaultHandler(MarkdownParser.kt:355)
    at org.jetbrains.dokka.base.parsers.MarkdownParser.visitNode(MarkdownParser.kt:398)
    at org.jetbrains.dokka.base.parsers.MarkdownParser.evaluateChildren(MarkdownParser.kt:408)
    at org.jetbrains.dokka.base.parsers.MarkdownParser.evaluateChildren$default(MarkdownParser.kt:407)
    at org.jetbrains.dokka.base.parsers.MarkdownParser.markdownFileHandler(MarkdownParser.kt:201)
    at org.jetbrains.dokka.base.parsers.MarkdownParser.visitNode(MarkdownParser.kt:392)
    at org.jetbrains.dokka.base.parsers.MarkdownParser.visitNode$default(MarkdownParser.kt:358)
    at org.jetbrains.dokka.base.parsers.MarkdownParser.parseStringToDocNode(MarkdownParser.kt:40)
    at org.jetbrains.dokka.base.parsers.MarkdownParser$Companion$parseFromKDocTag$1.invoke(MarkdownParser.kt:514)
    at org.jetbrains.dokka.base.parsers.MarkdownParser$Companion.parseFromKDocTag(MarkdownParser.kt:526)
    at org.jetbrains.dokka.base.parsers.MarkdownParser$Companion.parseFromKDocTag$default(MarkdownParser.kt:508)
    at org.jetbrains.dokka.base.translators.descriptors.DokkaDescriptorVisitor.getDocumentation(DefaultDescriptorToDocumentableTranslator.kt:1029)
    at org.jetbrains.dokka.base.translators.descriptors.DokkaDescriptorVisitor.resolveDescriptorData(DefaultDescriptorToDocumentableTranslator.kt:904)
    at org.jetbrains.dokka.base.translators.descriptors.DokkaDescriptorVisitor.access$resolveDescriptorData(DefaultDescriptorToDocumentableTranslator.kt:135)
    at org.jetbrains.dokka.base.translators.descriptors.DokkaDescriptorVisitor$resolveClassDescriptionData$2.invokeSuspend(DefaultDescriptorToDocumentableTranslator.kt:939)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.internal.ScopeCoroutine.afterResume(Scopes.kt:33)
    at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:102)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
    at kotlinx.coroutines.internal.ScopeCoroutine.afterResume(Scopes.kt:33)
    at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:102)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:749)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

I also notice one thread whose stack looks like this:

"DefaultDispatcher-worker-15" #42 daemon prio=5 os_prio=0 cpu=697.27ms elapsed=3478.23s tid=0x00007f9824023400 nid=0x123b in Object.wait()  [0x00007f986d6f0000]
   java.lang.Thread.State: RUNNABLE
    at org.jsoup.nodes.Document$OutputSettings.<init>(Document.java:416)
    - waiting on the Class initialization monitor for org.jsoup.nodes.Entities$EscapeMode
    at org.jsoup.nodes.Document.<init>(Document.java:26)
    at org.jsoup.nodes.Document.createShell(Document.java:52)
    at org.jsoup.parser.Parser.parseBodyFragment(Parser.java:218)
    at org.jsoup.Jsoup.parseBodyFragment(Jsoup.java:241)
    at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser$Parse.invoke(JavadocParser.kt:465)
    at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.convertJavadocElements(JavadocParser.kt:474)
    at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.convertJavadocElements$default(JavadocParser.kt:471)
    at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.getDescription(JavadocParser.kt:207)
    at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.parseDocComment$base(JavadocParser.kt:59)
    at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.parseDocumentation(JavadocParser.kt:52)
    at org.jetbrains.dokka.base.translators.psi.DefaultPsiToDocumentableTranslator$DokkaPsiParser$parseClasslike$2.invokeSuspend(DefaultPsiToDocumentableTranslator.kt:243)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:749)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)
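
Both traces above end in a line of the form "waiting on the Class initialization monitor for ...", which is what the JVM reports when a thread needs a class that another thread is still initializing. As a rough illustration only, and not a confirmed diagnosis of jsoup, the sketch below shows how two classes whose static initializers depend on each other can leave two threads waiting forever. The classes `A` and `B`, the latches, and `demonstrate()` are all hypothetical names invented for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: A and B only mimic two classes whose static
// initializers need each other. They are not jsoup classes.
public class ClassInitCycleSketch {
    // Latches force the unlucky interleaving so the demo is repeatable.
    static final CountDownLatch A_STARTED = new CountDownLatch(1);
    static final CountDownLatch B_STARTED = new CountDownLatch(1);

    static void await(CountDownLatch latch) {
        try {
            latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static class A {
        static {
            A_STARTED.countDown();
            await(B_STARTED); // wait until another thread is inside B's initializer
            B.touch();        // needs B initialized, so this thread blocks
        }
        static void touch() { }
    }

    static class B {
        static {
            B_STARTED.countDown();
            await(A_STARTED); // wait until another thread is inside A's initializer
            A.touch();        // needs A initialized, so this thread blocks
        }
        static void touch() { }
    }

    // Returns true if both threads are still stuck after the join timeout.
    static boolean demonstrate() {
        Thread t1 = new Thread(() -> A.touch(), "init-A");
        Thread t2 = new Thread(() -> B.touch(), "init-B");
        // Daemon threads, so the stuck threads do not keep the JVM alive.
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            t1.join(1000);
            t2.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t1.isAlive() && t2.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both threads stuck: " + demonstrate());
    }
}
```

Each thread holds the initialization lock of one class while waiting for the other, and the JVM does not detect or break this kind of cycle. Whether `Entities` and `Entities$EscapeMode` actually form such a cycle is an assumption that would need to be checked against the jsoup source.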

I believe the underlying issue we're running into is in org.jsoup.nodes.Entities, where
private static final HashMap<String, String> multipoints = new HashMap<>(); // name -> multiple character references
is static but not thread-safe, and that this issue would be fixed by using ConcurrentHashMap there.
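
For reference, a minimal sketch of the change proposed here, assuming the shared table is written and read from several threads. `SharedLookupSketch`, `register`, `lookup`, and `fillConcurrently` are hypothetical names; this is not jsoup's actual `Entities` code, and whether such a change resolves the hang is not established in this thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a class with a shared static lookup table.
public class SharedLookupSketch {
    // ConcurrentHashMap supports concurrent put/get without external locking,
    // unlike a plain HashMap that is mutated while other threads use it.
    private static final Map<String, String> MULTIPOINTS = new ConcurrentHashMap<>();

    static void register(String name, String value) {
        MULTIPOINTS.put(name, value);
    }

    static String lookup(String name) {
        return MULTIPOINTS.get(name);
    }

    // Writes entries from several threads at once and waits for them to finish.
    static void fillConcurrently(int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    register("e" + id + "_" + i, "v" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        fillConcurrently(4, 500);
        System.out.println(lookup("e3_7"));
    }
}
```

The field type is widened to `Map` so callers do not depend on the concrete implementation; only the initializer changes.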

I have filed this on jsoup as https://github.com/jhy/jsoup/issues/2042. If that is indeed the cause, then the fix for this bug should just be to update the jsoup version once it is fixed upstream.

vmishenev commented 9 months ago

If it is indeed a concurrency issue, #3151 can fix it.