Tickaroo / tikxml

Modern XML Parser for Android
Apache License 2.0

java.lang.ArrayIndexOutOfBoundsException size=1 offset=1 byteCount=1 #128

Closed maxlord closed 4 years ago

maxlord commented 5 years ago

When parsing content, I receive:

java.lang.ArrayIndexOutOfBoundsException
size=1 offset=1 byteCount=1
at okio.Util.checkOffsetAndCount(Util.java:30) 
at okio.Buffer.getByte(Buffer.java:302)
...

sockeqwe commented 5 years ago

Hello, could you please provide more information? What is the XML that you are trying to parse? Ca

maxlord commented 5 years ago

The problem is intermittent. It seems to arise when there is a lot of data.

Parsing runs on Schedulers.io().

Example xml: https://pastebin.com/JtVzq8rv

Mapping: https://pastebin.com/Gsfrg0eV

The error occurs while parsing the "categoryId" element in the last item.

At the moment no error occurs. It appears at random times.

maxlord commented 5 years ago

Is there any news on my problem?

sockeqwe commented 5 years ago

I'm sorry, I haven't had time to look into it yet.

Which version of TikXml are you using? Which version of okio are you using (i.e. does retrofit / okhttp set a newer version of okio than tikxml)?

maxlord commented 5 years ago

tikxml 0.8.15, okio 2.2.2, okhttp 3.11.0

I tried including tikxml both as a library dependency and as source code.

nicolai-shape commented 5 years ago

We're also running into this issue. It seems to happen for larger XML strings.

sockeqwe commented 5 years ago

Any chance you could share the xml to reproduce it?

Are you sure the xml string is not empty?

nicolai-shape commented 5 years ago

@sockeqwe here https://pastebin.com/hQq1DjBi

We're using these dependencies:

    implementation("com.tickaroo.tikxml:core:0.8.16-SNAPSHOT")
    implementation("com.tickaroo.tikxml:annotation:0.8.16-SNAPSHOT")
    implementation("com.tickaroo.tikxml:retrofit-converter:0.8.16-SNAPSHOT")

    kapt("com.tickaroo.tikxml:processor:0.8.16-SNAPSHOT")
    kapt("com.tickaroo.tikxml:processor-common:0.8.16-SNAPSHOT")
    kapt("com.github.Tickaroo.tikxml:processor:0.8.13")
    kapt("com.github.Tickaroo.tikxml:processor-common:0.8.13")

mandrachek commented 5 years ago

I just got my model straight, and ran into this immediately. I'm using:

    implementation "com.squareup.okhttp3:okhttp:4.0.1"
    implementation "com.squareup.okhttp3:logging-interceptor:4.0.1"

    implementation 'com.github.Tickaroo.tikxml:annotation:0.8.15'
    implementation 'com.github.Tickaroo.tikxml:core:0.8.15'
    kapt 'com.github.Tickaroo.tikxml:processor-common:0.8.15'
    kapt 'com.github.Tickaroo.tikxml:processor:0.8.15'

    implementation 'com.squareup.retrofit2:retrofit:2.6.0'
    implementation "com.github.Tickaroo.tikxml:retrofit-converter:0.8.15"
    implementation 'com.squareup.retrofit2:adapter-rxjava2:2.6.0'

I will see if I can come up with a test case - if parsing the same xml works from the file system, I would imagine the issue would be in the retrofit converter.

mandrachek commented 5 years ago

I was able to download my XML file, and I get the same error when reading it from disk as I do from the web, so it's not a retrofit/retrofit-converter issue.

I found that after removing the last node from my document, it could be read just fine.

crashing document: 3,529 lines, 115,950 characters
non-crashing document: 3,487 lines, 114,643 characters

Upon further examination, any single node can be removed, which allows the document to be parsed. Most of my nodes are ~1k in size, doesn't seem to matter which one I remove. I've tried adding nodes as well, and it crashes all the time.

Stack trace follows:

size=1 offset=1 byteCount=1
java.lang.ArrayIndexOutOfBoundsException: size=1 offset=1 byteCount=1
    at okio.-Util.checkOffsetAndCount(-Util.kt:23)
    at okio.Buffer.getByte(Buffer.kt:286)
    at com.tickaroo.tikxml.XmlReader.nextNonWhitespace(XmlReader.java:806)
    at com.tickaroo.tikxml.XmlReader.nextNonWhitespace(XmlReader.java:773)
    at com.tickaroo.tikxml.XmlReader.doPeek(XmlReader.java:187)
    at com.tickaroo.tikxml.XmlReader.hasAttribute(XmlReader.java:386)
    at Entry$$TypeAdapter.fromXml(Entry$$TypeAdapter.java:372)
    at Entry$$TypeAdapter.fromXml(Entry$$TypeAdapter.java:12)
    at XmlNode$$TypeAdapter$1.fromXml(XmlNode$$TypeAdapter.java:19)
    at XmlNode$$TypeAdapter$1.fromXml(XmlNode$$TypeAdapter.java:16)
    at XmlNode$$TypeAdapter.fromXml(XmlNode$$TypeAdapter.java:40)
    at XmlNode$$TypeAdapter.fromXml(XmlNode$$TypeAdapter.java:12)
    at ArrayOfNodes$$TypeAdapter$1.fromXml(ArrayOfNodes$$TypeAdapter.java:25)
    at ArrayOfNodes$$TypeAdapter$1.fromXml(ArrayOfNodes$$TypeAdapter.java:18)
    at ArrayOfNodes$$TypeAdapter.fromXml(ArrayOfNodes$$TypeAdapter.java:46)
    at ArrayOfXmlNode$$TypeAdapter.fromXml(ArrayOfXmlNode$$TypeAdapter.java:14)
    at com.tickaroo.tikxml.TikXml.read(TikXml.java:113)

Code:

class XmlTest {
    @Test
    fun testXml() {
        val tikXml = TikXml.Builder().build()

        val data = tikXml.read(
            javaClass.getResourceAsStream("data.xml").source().buffer(),
            ArrayOfNodes::class.java
        ) as ArrayOfNodes
    }
}

Unfortunately I can't share the model and XML publicly. :(

sockeqwe commented 5 years ago

I'm not sure whether that's an Okio issue or a TikXml issue. Could you please try the latest Okio version and read the file again?

https://github.com/square/okio/blob/master/README.md

compile 'com.squareup.okio:okio:2.2.2'

mandrachek commented 5 years ago

Tried that; same result.

mandrachek commented 5 years ago

I was able to find an XML file that reproduces the issue: Hamlet.

The model I used (Kotlin) is:

import com.tickaroo.tikxml.annotation.*

@Xml(name="PLAY")
data class Play (
    @field:Element var title: Title? = null,
    @field:Element var fm: FM? = null,
    @field:Element var personae: Personae? = null,
    @field:PropertyElement(name="PLAYSUBT") var subTitle: String? = null,
    @field:Element var characters: List<Persona>? = null,
    @field:Element var characterGroup: List<PersonaGroup>? = null,
    @field:Element var sceneDescripton: List<SceneDescripton>? = null,
    @field:Element var acts: List<Act>? = null

)

@Xml(name="TITLE")
class Title (
    @field:Attribute(name="AUTHOR") var author: String? = null,
    @field:TextContent var title: String? = null
)

@Xml(name="FM")
data class FM (@field:Element var items: List<P>? = null)

@Xml(name="P")
data class P (@field:TextContent var text: String? = null)

@Xml(name="PERSONAE")
data class Personae(
    @field:Element var title: Title? = null,
    @field:Element var persona: List<Persona>? = null,
    @field:Element var groups: List<PersonaGroup>? = null
)

@Xml(name="PERSONA")
data class Persona(@field:TextContent var text: String? = null)

@Xml(name="PGROUP")
data class PersonaGroup(
    @field:Element var characters: List<Persona>? = null,
    @field:Element var description: GroupDescription? = null
)

@Xml(name="GRPDESCR")
data class GroupDescription(@field:TextContent var text: String? = null)

@Xml(name="SCNDESCR")
data class SceneDescripton(@field:TextContent var text: String? = null)

@Xml(name="ACT")
data class Act(
    @field:PropertyElement(name="TITLE") var title: String? = null,
    @field:Element var scenes: Scene? = null
)

@Xml(name="SCENE")
data class Scene(
    @field:PropertyElement(name="TITLE") var title: String? = null,
    @field:Element var stageDirections: List<StageDirection>? = null,
    @field:Element var speeches: List<Speech>? = null
)

@Xml(name="STAGEDIR")
data class StageDirection(@field:TextContent var text: String? = null)

@Xml(name="SPEECH")
data class Speech(
    @field:PropertyElement(name="SPEAKER") var speaker: String? = null,
    @field:Element var lines: List<Line>? = null,
    @field:Element var stageDirections: List<StageDirection>? = null
)

@Xml(name="LINE")
data class Line(
    @field:TextContent var text: String? = null,
    @field:Element var stageDirections: List<StageDirection>? = null
)

(Updated with a working model.)

WeaponMan commented 5 years ago

@mandrachek If you really want to use this library without fixing it yourself, just use an older version that works with your Hamlet example.

def tikxmlVersion = '0.8.13'
implementation "com.tickaroo.tikxml:core:$tikxmlVersion"
implementation "com.tickaroo.tikxml:annotation:$tikxmlVersion"
kapt "com.tickaroo.tikxml:processor:$tikxmlVersion"

Globegitter commented 5 years ago

Yep, downgrading to 0.8.13 also fixes the issue for me, even while still forcing the use of okio 2.2.2. @sockeqwe, that suggests to me that the issue was introduced in this library somewhere along the line.

Globegitter commented 5 years ago

But now I'm running into https://github.com/Tickaroo/tikxml/issues/80, and the order is no longer being preserved (that was fixed in 0.8.15), so I'm stuck in a loop.

sockeqwe commented 5 years ago

I will work on it next weekend. I apologise for any inconvenience.

rayhiker commented 5 years ago

I am seeing the same exception. I found out how it happens -- for me it's caused by a file larger than one Okio buffer (8192 bytes) and the last byte of the first buffer happens to be "<". When I do a call to XmlReader#hasElement at this point, it calls doPeek, which calls nextNonWhitespace.

The issue is XmlReader.java line 806: buffer.getByte() is called without a matching fillBuffer() call. This leads to checkOffsetAndCount getting called with size=1, offset=1, and byteCount=1, which causes the exception.

I see a related problem: the call to isCDATA() in line 804 will always fail since the next buffer is not loaded yet.
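
To illustrate the failure mode described above, here is a minimal, self-contained Java sketch. It uses neither Okio nor TikXml: `ChunkedSource`, `request`, `unsafePeek`, and `safePeek` are hypothetical stand-ins that only mimic a buffer holding the bytes loaded so far, and the 1-byte chunk size is chosen purely to force a boundary right after "<". It is an assumption-laden model of the bug, not the library's actual code or fix.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PeekBoundaryDemo {

    // Hypothetical stand-in for a segment-based buffered source (NOT Okio's API).
    static final class ChunkedSource {
        private final InputStream in;
        private final int chunkSize;
        private byte[] loaded = new byte[0]; // bytes pulled from the stream so far

        ChunkedSource(byte[] data, int chunkSize) {
            this.in = new ByteArrayInputStream(data);
            this.chunkSize = chunkSize;
        }

        // Loads chunks until at least byteCount bytes are buffered.
        // Returns false if the stream ends first.
        boolean request(int byteCount) {
            try {
                while (loaded.length < byteCount) {
                    byte[] chunk = new byte[chunkSize];
                    int n = in.read(chunk);
                    if (n <= 0) return false;
                    int oldLength = loaded.length;
                    loaded = Arrays.copyOf(loaded, oldLength + n);
                    System.arraycopy(chunk, 0, loaded, oldLength, n);
                }
                return true;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Reads an already-buffered byte; throws if pos has not been loaded yet.
        byte getByte(int pos) {
            if (pos < 0 || pos >= loaded.length) {
                throw new ArrayIndexOutOfBoundsException(
                    "size=" + loaded.length + " offset=" + pos + " byteCount=1");
            }
            return loaded[pos];
        }
    }

    // Mirrors the reported pattern: look one byte ahead without loading it first.
    static byte unsafePeek(ChunkedSource source, int pos) {
        return source.getByte(pos);
    }

    // Guarded variant: ensure the byte is buffered, or report end of input as -1.
    static int safePeek(ChunkedSource source, int pos) {
        return source.request(pos + 1) ? source.getByte(pos) : -1;
    }

    public static void main(String[] args) {
        byte[] xml = "<a/>".getBytes(StandardCharsets.US_ASCII);
        ChunkedSource source = new ChunkedSource(xml, 1);
        source.request(1); // only '<' is buffered at this point
        try {
            unsafePeek(source, 1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unsafe: " + e.getMessage());
        }
        System.out.println("safe: " + (char) safePeek(source, 1));
    }
}
```

In this model the unguarded peek fails with the same size=1 offset=1 byteCount=1 shape as the report, while the guarded peek loads the next chunk before reading. Whether a single fill is sufficient in the real XmlReader is a separate question that this sketch does not answer.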

rayhiker commented 5 years ago

I found something that also contributed to the problem: there was a CR-LF combination (ASCII 13-10) that somehow replaced the original LF after the first line of the XML file. But I think the underlying problem is what I wrote above.

reline commented 5 years ago

@sockeqwe I've attempted a fix for this in #133. I'm not entirely familiar with the library yet, so let me know if filling the buffer once is not enough. It passes my test case: I added a GPX file that failed beforehand, since it was difficult for me to produce a failing XML file by hand. The fix does not cover the case of checking CDATA properly.
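
Since a failing file is hard to write by hand, one option is to generate it. The Java sketch below pads a document so that the "<" of an element lands on the last byte of the first buffer. It assumes the 8192-byte buffer size mentioned earlier in this thread. `BoundaryXmlFixture` and the element names `root`, `pad`, and `item` are hypothetical. Whether the resulting document actually triggers the exception depends on the TikXml version and the model used to parse it, so treat it as a starting point rather than a guaranteed reproduction.

```java
import java.nio.charset.StandardCharsets;

final class BoundaryXmlFixture {

    // Builds an ASCII XML document in which the '<' of <item/> is the
    // last byte of the first segment (index segmentSize - 1).
    static String build(int segmentSize) {
        String head = "<root><pad>";
        String close = "</pad>";
        String tail = "<item/></root>";
        // Padding needed so that head + padding + close is segmentSize - 1 bytes long.
        int padding = (segmentSize - 1) - head.length() - close.length();
        if (padding < 0) {
            throw new IllegalArgumentException("segmentSize too small: " + segmentSize);
        }
        StringBuilder xml = new StringBuilder(head);
        for (int i = 0; i < padding; i++) {
            xml.append('x');
        }
        return xml.append(close).append(tail).toString();
    }

    public static void main(String[] args) {
        // 8192 is the buffer size reported in this thread (an assumption here).
        byte[] bytes = build(8192).getBytes(StandardCharsets.US_ASCII);
        System.out.println("byte 8191 = " + (char) bytes[8191]);
        System.out.println("total bytes = " + bytes.length);
    }
}
```

The document is pure ASCII, so character offsets and byte offsets coincide. A file with multi-byte characters or CR-LF line endings would need the padding computed on the encoded bytes instead.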

chubecode commented 5 years ago

Any update? I have the same issue.

DinuNcl commented 4 years ago

I had the same issue. I tried v0.8.13, which got rid of the problem, but then ran into another one: java.io.IOException: Unterminated comment at path /

sockeqwe commented 4 years ago

@DinuNcl it's fixed in the latest snapshot builds, 0.9.0_9-SNAPSHOT (see README)

newmanw commented 1 year ago

Is a non-snapshot 0.9.x release planned at some point?