avro-kotlin / avro4k

Avro format support for Kotlin
Apache License 2.0
188 stars 36 forks source link

properties with list of values #152

Closed potfur closed 10 months ago

potfur commented 11 months ago

Avro AVSC allows for custom annotations as shown here.

Same functionality can be partially achieved by using @AvroProp annotation, but only for single value annotations.

For cases - like the linked issue, where a list of values is needed, existing annotation is not enough.

Therefore I'm suggesting adding @AvropProps("name", "valueA", "valueB", "etc")

franckrasolo commented 10 months ago

Hey @thake & @sksamuel 👋

Is there any chance this could be reviewed/approved at some point?

Chuckame commented 10 months ago

Hey, I'll take a look

franckrasolo commented 10 months ago

Thanks @Chuckame, much appreciated!

thake commented 10 months ago

@potfur, thank you very much for the PR and sorry for the delay.

Wouldn't it be enough to just change @AvroProp to be @Repeatable? This would raise the minimal kotlin level to 1.6 which is acceptable. Code:

@SerialInfo
@Target(AnnotationTarget.PROPERTY, AnnotationTarget.CLASS)
@Repeatable
annotation class AvroProp(val key: String, val value: String)

//Example usage
@Serializable
@AvroProp("cold", "play")
@AvroProp("some", "thing")
data class TypeAnnotated(val str: String)

Another option which does not require kotlin 1.6 is to define the new annotation @AvroProp like this:

@SerialInfo
@Target(AnnotationTarget.PROPERTY, AnnotationTarget.CLASS)
annotation class AvroProps(vararg val props: AvroProp)

//Example usage
@Serializable
@AvroProps(
  AvroProp("cold", "play"),
  AvroProp("some", "thing")
)
data class TypeAnnotated(val str: String)

I'm in favor of the first solution as this is a classic example of why repeatable annotations have been introduced. But I would like to hear your opinion on it.

potfur commented 10 months ago

The problem that I'm trying to solve is to be able to define

"tags": ["val1", "val2"]

From what I know, repeating annotation will result in something like:

"tag1": "val1"
"tag2": "val2"

So, it's about being able to have list of values for key instead multiple keys with single value.

Chuckame commented 10 months ago

With the comment of @thake it would result to something like:

@AvroProp("cold", "play")
@AvroProp("cold", "some")
@AvroProp("cold", "thing")
data class TypeAnnotated(val str: String)

Personally I don't really like it as it feels really verbose.

Chuckame commented 10 months ago

Or, we could take into the middle: Just modify AvroProp to accept multiple values using vararg or array. (breaking change, but cleaner)

thake commented 10 months ago

@potfur, thanks for fixing my confusion ;)

I'm totally with @Chuckame, the breaking change seems to be legit. @potfur can you also add the repeatable annotation?

@SerialInfo
@Target(AnnotationTarget.PROPERTY, AnnotationTarget.CLASS)
@Repeatable
annotation class AvroProp(val key: String, vararg val values: String)
potfur commented 10 months ago

If breaking change is way to go - even better.

Chuckame commented 10 months ago

But an near invisible breaking change since it's not changing the type but only the name (I think it's not common to call this annotation with class' property names 🤞 ) so users will continue to call like this AvroProp("key", "value").

But let's make a major increase when releasing this PR.

Chuckame commented 10 months ago

Oh we missed something important @thake , that prevent us to have like a common AvroProp with single value and list value: We can have additional props like this "key": "value" or "key": ["value"]. How we can differentiate list and non-list added props ?

So new idea, we keep AvroProp having a single value (no vararg so it remains untouched) and another AvroListProp (same as the previously AvroProps) :

annotation class AvroListProp(val key: String, values: Array<String>)
potfur commented 10 months ago

Something like this (below) will work nicely, but has it's own limitations.

@SerialInfo
@Target(AnnotationTarget.PROPERTY, AnnotationTarget.CLASS)
annotation class AvroProp(val key: String, val value: String = "", val values: Array<String> = [])

Mainly non-blank value must take precendence over values, otherwise it will end up in situation where same prop is declared twice and AvroRuntimeException: Can't overwrite property: cold is being thrown. Also is slightly error prone, as the annotation allows for putting both value and values at once without any complaint.

So maybe, let's go back to my initial proposal with minor improvements.

@SerialInfo
@Target(AnnotationTarget.PROPERTY, AnnotationTarget.CLASS)
annotation class AvroProp(val key: String, val value: String)

@SerialInfo
@Target(AnnotationTarget.PROPERTY, AnnotationTarget.CLASS)
annotation class AvroProps(val key: String, val values: Array<String>)

As for AvroListProp vs AvroProps, I think AvroProps is consistent with already existing annotations - AvroAlias & AvroAliases

Chuckame commented 10 months ago

Let's go to your initial proposal yes. Just the name, I would prefer AvroListProp against AvroProps, since AvroProps is ambiguous, like "is it a list of props". Aliases is different where we should fix it and remove it prior to a AvroAlias with a vararg value. I'll create an issue to track it !

thake commented 10 months ago

I'll have a look into it tomorrow.

Chuckame commented 10 months ago

@thake it has been changed to AvroListProp 👌

thake commented 10 months ago

@thake, @potfur thanks for your patience. I rechecked the avro spec on this topic and while the current proposal would solve the discrepancy for array values, the specification explicitly allows additional properties of any type (i.e., boolean, number, object, array, string, null). Maybe we should consider this for this PR.

We have a similar situation with the @AvroDefault annotation. Maybe we can apply the same solution so that the API feels coherent.

Proposal:

@SerialInfo
@Target(AnnotationTarget.PROPERTY, AnnotationTarget.CLASS)
@Repeatable
annotation class AvroJsonProp(
    val key: String,
    val jsonValue: String)

//Usage
data class TypeAnnotated(
    @AvroJsonProp("array", """["play","blub"]""")
    @AvroJsonProp("obj", """{"javascript":"object"}""")
    val str: String
)
Chuckame commented 10 months ago

@thake I'm not really a fan of this, since json in strings are source of errors and ambiguous, and regarding performances it requires parsing the value each time with ser/deser. We will have to make a cache etc...

Why not just having a supplier class, to have a typesafe implementation ? See the following :

annotation class AvroGenericProp(
    val key: String,
    val valueSupplier: KClass<out AvroPropSupplier<*>>,
)

interface AvroPropSupplier<T> {
    val serializer: SerializationStrategy<T>
    fun getValue(): T
}

//usage
data class TypeAnnotated(
    @AvroGenericProp("array", CustomPropSupplier::class)
    val str: String
)

@Serializable
data class CustomObject(val field1: String)

object CustomPropSupplier : AvroPropSupplier<CustomObject> {
    override val serializer = CustomObject.serializer()
    override fun getValue() = CustomObject("value")
}

That way we are sure that we can put whatever we want as inside the field value, it can be static or we can pass some context to the getValue (like the annotated field's value, the parent class, etc...). And it's not so complex to implement, what do you think ?

potfur commented 10 months ago

I followed @thake suggestion, as it is more as Avro actually suggests with handling this case, and it also brings bit more flexibility.

About @Repeatable and type safety - I think for those two separate PR would be better, as those would affect also other annotations - just to be consistent, otherwise it would be weird.

Chuckame commented 10 months ago

I don't understand, avro only suggest to handle all the types in additional props. Why we continue using json while we have pure kotlin to write objects ? 🤔

potfur commented 10 months ago

@Chuckame I get your point, and for sure it would be better to use proper types.

On the other hand, the AvroDefault annotation already uses JSON strings, so AvroJsonProp is aligned and consistent what's already there (I like being consistent, even when it is suboptimal - gives some predictability and less cognitive load).

IMHO, doing improvements on types, and adding Repeatable, would be separate work - especially that forcing types would be a breaking change. So far, all is still backwards compatible - introduces new capability while preserving present.

Chuckame commented 10 months ago

I have exactly the same concern about AvroDefault btw 🥲

Let's do AvroJsonProp, but it won't be that performant.

I'll create an issue to discuss about "type-safe" AvroDefault, AvroJsonProp and other jsons annotations.

franckrasolo commented 10 months ago

Given the backward-compatible nature of these changes, could we get a new release out once @thake has approved this PR?

thake commented 10 months ago

I have exactly the same concern about AvroDefault btw 🥲

Let's do AvroJsonProp, but it won't be that performant.

I'll create an issue to discuss about "type-safe" AvroDefault, AvroJsonProp and other jsons annotations.

It is a kind of trade-off between "ease of use" in combination with pre-existing schemas and type safety. Using JSON in the annotations makes it easy to model your kotlin code so that it matches an existing schema; you just copy and paste the value from the asvc schema file.

Performance-wise, it shouldn't impact serialization and deserialization. The annotations' JSON is only interpreted whenever a schema is derived from the kotlin class. The schema is usually only derived once for every serialized class (responsibility of the user) and not on every serialization/deserialization.

Chuckame commented 10 months ago

@thake Since we'll have the @Language annotation to validate the content, it's an acceptable trade-off 🚀

For the performances, If we use the Avro method toRecord(serializer, obj): GenericRecord then we regenerate the schema each time we serialize. Maybe we could add some doc to advice the use of toRecord(serializer, schema, obj) in an intensive serialization of the same schema.

Chuckame commented 10 months ago

FYI, I'm playing with @Repeatable with kotlinx serialization, and it's only taking the latest annotated value, even if it should be already fixed. See https://github.com/Kotlin/kotlinx.serialization/issues/2099, it works only with kotlin 1.9 I don't know why

franckrasolo commented 10 months ago

@Chuckame / @thake Alas @Language("json") applied to @AvroJsonProp appears to only take effect in IntelliJ IDEA within the avro-kotlin/avro4k project itself, not when it's consumed from another project.

There are a number of open issues such as KTIJ-16874 and IDEA-173477 related to the processing of @Language as a meta-annotation. You may also want to vote them up.

Chuckame commented 10 months ago

@franckrasolo as I can read, both are related not to our case: we are directly using this language annotation on the value parameter. Those issues are related to meta annotations, implying that annotations (like @Language) are annotating an annotation A, where this annotation A is used on a parameter, so all the annotations on A will be applied to the parameter.

Also tested AvroJsonProp on my computer and the json language injection is not working. After a quick check, tldr: replace @Language("json") by @Language("JSON"). Longer story: language injection is done by intellij, and we can find all the languages in Parameters > Editor > Language injections. We should set a value found inside the Language column, where we can only find JSON and not json, maybe it's case sensitive! I'll do the pr

franckrasolo commented 10 months ago

Yes, not an exact match, but the number of issues makes me wonder how stable the processing of @Language is across a wide range of use cases: host languages x injected languages x IDE flavours, and so forth.