BenWoodworth / knbt

Kotlin NBT library for kotlinx.serialization
GNU Lesser General Public License v3.0
72 stars 2 forks source link

Extra bytes are read when decoding Nbt from stream #37

Open WinX64 opened 10 months ago

WinX64 commented 10 months ago

When decoding Nbt from a stream, the decoder will read past the point where the actual Nbt data is. This is generally not an issue if you're solely reading the Nbt data, or if it's the last in a sequence of reads, but becomes a problem when you need to perform subsequent reads after decoding the Nbt.

Here's an example to reproduce the problem:

val nbt = Nbt {
    variant = NbtVariant.JavaNetwork(764)
    compression = NbtCompression.None
}

//A stream of data simulating the structure of a packet in the protocol
//NbtCompound ({text:"test"}), followed by an int (3) and a byte (5)
val array = byteArrayOf(10, 8, 0, 4, 116, 101, 120, 116, 0, 4, 116, 101, 115, 116, 0, 0, 0, 0, 3, 5)
val input = ByteArrayInputStream(array)

val tag: NbtTag = nbt.decodeFromStream(input)

println(tag)
println(input.available())

The expected output would be the following, since only the tag was read:

{text:"test"}
5

But I'm getting:

{text:"test"}
0
BenWoodworth commented 10 months ago

Good catch! I do have a test for this:

https://github.com/BenWoodworth/knbt/blob/d288924186a991ccfa6c0b61556d3a49a9f92fd0/src/commonTest/kotlin/internal/BinaryNbtReaderTest.kt#L33-L39

But it looks like my test Source check for whether it has read past the end is flawed: https://github.com/BenWoodworth/knbt/blob/d288924186a991ccfa6c0b61556d3a49a9f92fd0/src/commonTest/kotlin/TestIo.kt#L19-L22

(The it == -1L check. it is the number of bytes actually read, or -1 if it's already exhausted.)

But this is only being called once, and with 8192 bytes requested... so the source gets immediately read more than is needed but doesn't trip my check

As for why so much is getting read, knbt will peek at the first byte before decoding to check that the compression matches (and fail with a nice message if it doesn't). But it looks like okio sees the 1 byte request and actually pulls in 8192: https://github.com/square/okio/blob/parent-3.3.0/okio/src/commonMain/kotlin/okio/internal/-RealBufferedSource.kt#L56-L63

image

BenWoodworth commented 10 months ago

So I'll update my test so it actually catches this, and see what I can do to fix it.

Everything gets wrapped into okio since that's what knbt leans on under the hood, and that's why this is also happening with Java streams since they get wrapped by okio. Also not sure if this is a bug with okio, but it should be something I can fix on my end even if it is

BenWoodworth commented 10 months ago

Okay, so as far as using the okio APIs goes, it's okay that deserializing from a passed in okioBufferedSource reads in the 8192 bytes from an underlying okioSource. Since the library user holds the reference to it still, and can still pull the extra bytes (3, 5 in this example) after deserializing.

So the problem here, is when we decode from a javaInputStream, it gets wrapped in an okioBufferedSource that pulls in the extra bytes, but once deserializing is done, the lib user doesn't have access to that okioBufferedSource or the bytes that it pulled in.

I think the proper solution here is to not lean on okio as an abstraction, and instead introduce this library's own InternalKnbtReader abstraction that behaves how I want. Similar to the way kotlinx-serialization-json extracted their IO logic last year:

https://github.com/Kotlin/kotlinx.serialization/blob/cf71e0881b284ce8b2d3cf67218869fec4e37d82/formats/json/jvmMain/src/kotlinx/serialization/json/internal/JvmJsonStreams.kt#L256-L267

Or perhaps leaning on the non-buffered okio Sink/Source internally to avoid too much re-work? Though I'm going to end up doing this refactor soon enough anyway since an initial kotlinx-io release is on the roadmap now

And for a short-term quick fix, I should be able to wrap the Java InputStream into an Okio Source that only reads one byte at a time, which should prevent the buffer that it is wrapped in from pulling in the 8192 bytes.

BenWoodworth commented 10 months ago

Just pushed the quick fix to the JavaNetwork branch if you want to check that out and let me know :)