avro-kotlin / avro4k

Avro format support for Kotlin
Apache License 2.0

encodeToByteArray() non deterministic? #83

geomagilles closed this issue 4 years ago

geomagilles commented 4 years ago

While ser/de works, the ByteArray returned by encodeToByteArray seems to be non-deterministic.

For example:

@Serializable
data class TaskOptions(
    val runningTimeout: Float? = null
)

fun main() {
    val m = TaskOptions(2F)
    val b1 = Avro.default.encodeToByteArray(TaskOptions.serializer(), m)
    val b2 = Avro.default.encodeToByteArray(TaskOptions.serializer(), m)
    println(b1.contentEquals(b2))
}

will print "false".

Besides being strange, it's an issue for some unit tests.

thake commented 4 years ago

Hey @geomagilles,

the default encoding used by encodeToByteArray is AvroEncodeFormat.Data(), which encodes both the schema and the written data into the byte array as a so-called Object Container File. The structure of an Object Container File contains a random sync marker that can be used to easily split byte streams at these markers in order to allow fast parallel processing. Thus the output from encodeToByteArray will contain random data that differs from call to call.
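To make the effect concrete, here is a minimal sketch that uses only the JDK and no Avro library at all. It is a simplified stand-in, not the real writer: `encodeBody` and `wrapInContainer` are hypothetical helper names, the container layout leaves out the schema metadata and block counts that real Object Container Files carry, and it assumes the nullable field is written as a `["null","float"]` union.

```kotlin
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.security.SecureRandom

// Raw Avro binary body for a record with a single ["null","float"] union field
// (assumed schema): union branch index 1 as a zigzag varint (0x02), followed by
// the float as 4 bytes of little-endian IEEE 754.
fun encodeBody(value: Float): ByteArray =
    ByteBuffer.allocate(5).order(ByteOrder.LITTLE_ENDIAN)
        .put(0x02.toByte())
        .putFloat(value)
        .array()

// Simplified container: magic bytes, a fresh random 16-byte sync marker,
// one data block, then the same sync marker again to close the block.
// Real container files also store metadata and block counts (omitted here).
fun wrapInContainer(body: ByteArray): ByteArray {
    val sync = ByteArray(16).also { SecureRandom().nextBytes(it) }
    val out = ByteArrayOutputStream()
    out.write("Obj".toByteArray(Charsets.US_ASCII))
    out.write(1)
    out.write(sync)
    out.write(body)
    out.write(sync)
    return out.toByteArray()
}

fun main() {
    val body1 = encodeBody(2F)
    val body2 = encodeBody(2F)

    // The encoded data itself is identical on every call.
    println(body1.contentEquals(body2))

    // The wrapped outputs differ, because each call draws a new sync marker.
    println(wrapInContainer(body1).contentEquals(wrapInContainer(body2)))
}
```

The point of the sketch is that the record data is stable and only the marker varies, so a unit test can compare decoded values, or a marker-free binary encoding, rather than the container bytes.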

One could make the random sync marker a property of the AvroEncodeFormat.Data class that can be set if it should be more deterministic. Do you have a use case for this?

geomagilles commented 4 years ago

Thx for your quick reply @thake - I stumbled upon this while writing some unit tests. It was more of a surprise to me than a real issue, as it was not that difficult to find a workaround. I raised the issue in case this was the symptom of an underlying problem. If that's not the case, and especially if it comes from the Avro Java lib itself, I guess you can mark this as wontfix.