MohamedRejeb / Ksoup

Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML, extracting HTML tags, attributes, and text, and encoding and decoding HTML entities.
Apache License 2.0

KsoupHtmlParser throws IndexOutOfBounds #19

Closed vanniktech closed 1 year ago

vanniktech commented 1 year ago

The following test case fails:

@Test fun indexOutOfBounds() {
    handler = Builder().build()
java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 0
    at java.base/jdk.internal.util.Preconditions.outOfBounds(
    at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(
    at java.base/jdk.internal.util.Preconditions.checkIndex(
    at java.base/java.util.Objects.checkIndex(
    at java.base/java.util.ArrayList.get(
    at com.mohamedrejeb.ksoup.html.parser.KsoupHtmlParser.closeCurrentTag(KsoupHtmlParser.kt:187)
    at com.mohamedrejeb.ksoup.html.parser.KsoupHtmlParser.onSelfClosingTag(KsoupHtmlParser.kt:172)
    at com.mohamedrejeb.ksoup.html.tokenizer.KsoupTokenizer.stateInSelfClosingTag(KsoupTokenizer.kt:354)
    at com.mohamedrejeb.ksoup.html.tokenizer.KsoupTokenizer.parse(KsoupTokenizer.kt:613)
    at com.mohamedrejeb.ksoup.html.tokenizer.KsoupTokenizer.write(KsoupTokenizer.kt:58)
    at com.mohamedrejeb.ksoup.html.parser.KsoupHtmlParser.write(KsoupHtmlParser.kt:378)
    at com.mohamedrejeb.ksoup.html.parser.KsoupHtmlParser.end(KsoupHtmlParser.kt:394)
    at com.mohamedrejeb.ksoup.html.parser.KsoupHtmlParser.parseComplete(KsoupHtmlParser.kt:335)
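
The trace shows `ArrayList.get` being called with index -1 on an empty list inside `closeCurrentTag`, reached from `onSelfClosingTag`. A minimal sketch of that kind of failure on the JVM follows. `TagStack` and its methods are hypothetical names made up for illustration, and this is not Ksoup's actual implementation or its fix:

```kotlin
// Hypothetical stand-in for a parser's stack of open tags (not Ksoup's real code).
class TagStack {
    private val openTags = ArrayList<String>()

    fun open(name: String) {
        openTags.add(name)
    }

    // Unguarded: on an empty list lastIndex is -1, so the indexed read
    // throws IndexOutOfBoundsException on the JVM, as in the trace above.
    fun closeUnsafe(): String {
        val name = openTags[openTags.lastIndex]
        openTags.removeAt(openTags.lastIndex)
        return name
    }

    // Guarded: returns null when there is no open tag to close.
    fun closeSafe(): String? {
        if (openTags.isEmpty()) return null
        return openTags.removeAt(openTags.lastIndex)
    }
}
```

With a guard like `closeSafe`, a self-closing tag that arrives while no tag is open would be ignored instead of crashing the parser. Whether the library resolves it this way is an assumption.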

Note that this is a real-world scenario that occurs when parsing the content description of a feed to look for images:

MohamedRejeb commented 1 year ago

Thanks for reporting these issues. I've been quite busy recently. A new release containing the fixes will be published in the next few hours.

vanniktech commented 1 year ago

Amazing work @MohamedRejeb. From my initial testing, it seems like I can ditch both jsoup on Android and HTMLString on iOS.