Chuckame commented 7 months ago

What is your use-case and why do you need this feature?

There is no official way of reaching the descriptor tree.

In schema-based formats such as protobuf or Avro, we need to read the full descriptor tree from the root serializer to generate the corresponding schemas. The kotlinx.serialization library could provide this logic itself, giving an easy way to reach the descriptors.

Describe the solution you'd like

Here is my current implementation. It follows the same concepts as the encoders and decoders, and is customisable.

All we need to do is implement the interfaces that correspond to each descriptor kind or key concept:

SerialDescriptorValueVisitor to visit a generic value. It is also the entry point for all the other visitors
SerialDescriptorMapVisitor when kind is StructureKind.MAP, to visit its key and value descriptors
SerialDescriptorListVisitor when kind is StructureKind.LIST, to visit its item descriptor
SerialDescriptorPolymorphicVisitor when kind is PolymorphicKind, to visit the descriptors of its implementation(s)
SerialDescriptorClassVisitor when kind is StructureKind.CLASS, to visit its field descriptors
SerialDescriptorInlineClassVisitor when descriptor.isInline is true (same workflow as Encoder.encodeInline)

Note that all the interfaces could be implemented by the same class, as each method has a different name.

All the methods follow the same logic:

When a value is a scalar (primitive, enum and object kinds), then the visit method returns Unit as we do not need to visit deeper.
When a value is something else (contextual, structure and polymorphic kinds), then the visit method returns the related interface or null if we want to stop the visit.

Here is an image showing all the interfaces and their methods:

And the code:

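// Opt-ins and imports needed to compile this snippet against kotlinx-serialization-core.
@file:OptIn(ExperimentalSerializationApi::class, InternalSerializationApi::class)

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.InternalSerializationApi
import kotlinx.serialization.descriptors.*
import kotlinx.serialization.modules.SerializersModule
import kotlinx.serialization.serializerOrNull

interface SerialDescriptorValueVisitor {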
    val serializersModule: SerializersModule

    /**
     * Called when the [descriptor]'s kind is a [PrimitiveKind].
     */
    fun visitPrimitive(
        descriptor: SerialDescriptor,
        kind: PrimitiveKind,
    )

    /**
     * Called when the [descriptor]'s kind is [SerialKind.ENUM].
     */
    fun visitEnum(descriptor: SerialDescriptor)

    /**
     * Called when the [descriptor]'s kind is [StructureKind.OBJECT].
     */
    fun visitObject(descriptor: SerialDescriptor)

    /**
     * Called when the [descriptor]'s kind is a [PolymorphicKind].
     * @return null if we don't want to visit the polymorphic type
     */
    fun visitPolymorphic(
        descriptor: SerialDescriptor,
        kind: PolymorphicKind,
    ): SerialDescriptorPolymorphicVisitor?

    /**
     * Called when the [descriptor]'s kind is a [StructureKind.CLASS].
     * Note that when the [descriptor] is an inline class, [visitInlineClass] is called instead.
     * @return null if we don't want to visit the class
     */
    fun visitClass(descriptor: SerialDescriptor): SerialDescriptorClassVisitor?

    /**
     * Called when the [descriptor]'s kind is a [StructureKind.LIST].
     * @return null if we don't want to visit the list
     */
    fun visitList(descriptor: SerialDescriptor): SerialDescriptorListVisitor?

    /**
     * Called when the [descriptor]'s kind is a [StructureKind.MAP].
     * @return null if we don't want to visit the map
     */
    fun visitMap(descriptor: SerialDescriptor): SerialDescriptorMapVisitor?

    /**
     * Called when the [descriptor] describes a value class (i.e. its kind is [StructureKind.CLASS] and [SerialDescriptor.isInline] is true).
     * @return null if we don't want to visit the inline class
     */
    fun visitInlineClass(descriptor: SerialDescriptor): SerialDescriptorInlineClassVisitor?

    fun visitValue(descriptor: SerialDescriptor) {
        if (descriptor.isInline) {
            visitInlineClass(descriptor)?.apply {
                visitInlineClassElement(descriptor, 0)?.visitValue(descriptor.getElementDescriptor(0))
            }
        } else {
            when (descriptor.kind) {
                is PrimitiveKind -> visitPrimitive(descriptor, descriptor.kind as PrimitiveKind)
                SerialKind.ENUM -> visitEnum(descriptor)
                SerialKind.CONTEXTUAL -> visitValue(descriptor.getNonNullContextualDescriptor(serializersModule))
                StructureKind.CLASS ->
                    visitClass(descriptor)?.apply {
                        for (elementIndex in (0 until descriptor.elementsCount)) {
                            visitClassElement(descriptor, elementIndex)?.visitValue(descriptor.getElementDescriptor(elementIndex))
                        }
                    }?.endClassVisit(descriptor)

                StructureKind.LIST ->
                    visitList(descriptor)?.apply {
                        visitListItem(descriptor, 0)?.visitValue(descriptor.getElementDescriptor(0))
                    }?.endListVisit(descriptor)

                StructureKind.MAP ->
                    visitMap(descriptor)?.apply {
                        visitMapKey(descriptor, 0)?.visitValue(descriptor.getElementDescriptor(0))
                        visitMapValue(descriptor, 1)?.visitValue(descriptor.getElementDescriptor(1))
                    }?.endMapVisit(descriptor)

                is PolymorphicKind ->
                    visitPolymorphic(descriptor, descriptor.kind as PolymorphicKind)?.apply {
                        descriptor.possibleSerializationSubclasses(serializersModule).sortedBy { it.serialName }.forEach { implementationDescriptor ->
                            visitPolymorphicFoundDescriptor(implementationDescriptor)?.visitValue(implementationDescriptor)
                        }
                    }?.endPolymorphicVisit(descriptor)

                StructureKind.OBJECT -> visitObject(descriptor)
            }
        }
    }
}

interface SerialDescriptorMapVisitor {
    /**
     * @return null if we don't want to visit the map key
     */
    fun visitMapKey(
        mapDescriptor: SerialDescriptor,
        keyElementIndex: Int,
    ): SerialDescriptorValueVisitor?

    /**
     * @return null if we don't want to visit the map value
     */
    fun visitMapValue(
        mapDescriptor: SerialDescriptor,
        valueElementIndex: Int,
    ): SerialDescriptorValueVisitor?

    fun endMapVisit(descriptor: SerialDescriptor)
}

interface SerialDescriptorListVisitor {
    /**
     * @return null if we don't want to visit the list item
     */
    fun visitListItem(
        listDescriptor: SerialDescriptor,
        itemElementIndex: Int,
    ): SerialDescriptorValueVisitor?

    fun endListVisit(descriptor: SerialDescriptor)
}

interface SerialDescriptorPolymorphicVisitor {
    /**
     * @return null if we don't want to visit the found polymorphic descriptor
     */
    fun visitPolymorphicFoundDescriptor(descriptor: SerialDescriptor): SerialDescriptorValueVisitor?

    fun endPolymorphicVisit(descriptor: SerialDescriptor)
}

interface SerialDescriptorClassVisitor {
    /**
     * @return null if we don't want to visit the class element
     */
    fun visitClassElement(
        descriptor: SerialDescriptor,
        elementIndex: Int,
    ): SerialDescriptorValueVisitor?

    fun endClassVisit(descriptor: SerialDescriptor)
}

interface SerialDescriptorInlineClassVisitor {
    /**
     * @return null if we don't want to visit the inline class element
     */
    fun visitInlineClassElement(
        inlineClassDescriptor: SerialDescriptor,
        inlineElementIndex: Int,
    ): SerialDescriptorValueVisitor?
}

private fun SerialDescriptor.getNonNullContextualDescriptor(serializersModule: SerializersModule) =
    requireNotNull(serializersModule.getContextualDescriptor(this) ?: this.capturedKClass?.serializerOrNull()?.descriptor) {
        "No descriptor found in serialization context for $this"
    }

private fun SerialDescriptor.possibleSerializationSubclasses(serializersModule: SerializersModule): Sequence<SerialDescriptor> {
    return when (this.kind) {
        PolymorphicKind.SEALED ->
            elementDescriptors.asSequence()
                .filter { it.kind == SerialKind.CONTEXTUAL }
                .flatMap { it.elementDescriptors }
                .flatMap { it.possibleSerializationSubclasses(serializersModule) }

        PolymorphicKind.OPEN ->
            serializersModule.getPolymorphicDescriptors(this@possibleSerializationSubclasses).asSequence()
                .flatMap { it.possibleSerializationSubclasses(serializersModule) }

        SerialKind.CONTEXTUAL -> sequenceOf(getNonNullContextualDescriptor(serializersModule))

        else -> sequenceOf(this)
    }
}

What do you think? I can open a PR if needed.

sandwwraith commented 7 months ago

Do you have any particular reason to use the Visitor pattern specifically? There are existing APIs for simply iterating over sub-descriptors (e.g., public val SerialDescriptor.elementDescriptors: Iterable<SerialDescriptor>). In my [personal] opinion, the Visitor pattern is outdated now and should be replaced with FP operations on collections and iterables, such as map or filter, plus when over subtypes or kinds where necessary. This gives more concise and compact code with the same meaning: there is no need to override a bunch of different functions, the code can be read top-down without additional navigation, etc. Your own SerialDescriptor.possibleSerializationSubclasses is a good example of that. See also a similar ticket in kotlin-metadata-jvm: https://youtrack.jetbrains.com/issue/KT-59442
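As a rough illustration of that style (a sketch only, not code from this thread or from the library), a descriptor tree can be walked with a single recursive function that combines when over the kind with elementNames and elementDescriptors. The names describe and demoUserDescriptor are hypothetical, and the demo descriptors are built by hand so the snippet does not depend on the compiler plugin. Contextual and polymorphic descriptors are deliberately left unresolved here, since resolving them needs a SerializersModule.

```kotlin
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.descriptors.PolymorphicKind
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.descriptors.SerialKind
import kotlinx.serialization.descriptors.StructureKind
import kotlinx.serialization.descriptors.buildClassSerialDescriptor
import kotlinx.serialization.descriptors.elementDescriptors
import kotlinx.serialization.descriptors.elementNames
import kotlinx.serialization.descriptors.listSerialDescriptor

// Hypothetical helper: renders a descriptor tree as a one-line string.
// It does not guard against recursive types, which would recurse forever.
@OptIn(ExperimentalSerializationApi::class)
fun describe(descriptor: SerialDescriptor): String =
    when (descriptor.kind) {
        // Scalars: nothing deeper to walk.
        is PrimitiveKind, SerialKind.ENUM, StructureKind.OBJECT -> descriptor.serialName
        // Lists expose their item descriptor at index 0.
        StructureKind.LIST -> "list<${describe(descriptor.getElementDescriptor(0))}>"
        // Maps expose the key at index 0 and the value at index 1.
        StructureKind.MAP ->
            "map<${describe(descriptor.getElementDescriptor(0))}, ${describe(descriptor.getElementDescriptor(1))}>"
        // Classes: pair each element name with its descriptor and recurse.
        StructureKind.CLASS ->
            descriptor.elementNames.zip(descriptor.elementDescriptors)
                .joinToString(prefix = "${descriptor.serialName} { ", postfix = " }") { (name, child) ->
                    "$name: ${describe(child)}"
                }
        // Assumption of this sketch: these need a SerializersModule, so they are not resolved.
        is PolymorphicKind, SerialKind.CONTEXTUAL -> "${descriptor.serialName} (unresolved)"
    }

// Hypothetical demo descriptor with made-up serial names.
@OptIn(ExperimentalSerializationApi::class)
fun demoUserDescriptor(): SerialDescriptor {
    val name = PrimitiveSerialDescriptor("demo.Name", PrimitiveKind.STRING)
    return buildClassSerialDescriptor("demo.User") {
        element("name", name)
        element("aliases", listSerialDescriptor(name))
    }
}
```

Calling println(describe(demoUserDescriptor())) prints the flattened tree. The whole traversal reads top-down in one function, which is the trade-off being argued for here; the cost is that each new concern (stopping early, handling recursion) has to be threaded through that one function.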

sandwwraith commented 7 months ago

In any case, this seems like something that can be implemented on top of the kotlinx-serialization-core and even published as an additional utility library when necessary. So it is unlikely that such functionality will be added to the core, but it can be maintained by the community if there's a demand for that.

Chuckame commented 7 months ago

I don't have a strong reason for using the visitor pattern. I just wanted something similar to encoders and decoders, to be sure to not forget anything, and to not .

The main idea is not to have a visitor pattern as such, but to have a standard way of going through the descriptors without a value (to make schemas, debug, generate reports, ...).

The visitor pattern just came naturally (I'm not that old! 😄). To be honest, at the beginning I just copy/pasted the Decoder interfaces and mainly removed the return types. After that, I reverse-engineered the plugin-generated serializers to properly understand their workflow.

Btw, the Jackson library does this visitor stuff internally, which lets users implement new formats easily without having to think about how a map, a list, a class, etc. is described.

sandwwraith commented 7 months ago

> to have a standard way of going through the descriptors without a value (to make schemas, debug, generate reports, ...).

You can take a look at the protobuf schema generator, which uses SerialDescriptor.elementDescriptors/getElementDescriptor: https://github.com/Kotlin/kotlinx.serialization/blob/master/formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/schema/ProtoBufSchemaGenerator.kt#L142

This API is indeed not showcased anywhere, though. We should probably add a section to https://github.com/Kotlin/kotlinx.serialization/blob/master/docs/formats.md#custom-formats-experimental explaining how to write a schema generator for a custom format.
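To give a feel for what such a section might cover, here is a toy sketch of an index-based schema generator built on elementsCount, getElementName and getElementDescriptor. It is not how ProtoBufSchemaGenerator works; the function toySchema, the demoPersonDescriptor helper, the output layout, and the choice to use serial names as type names are all assumptions made for illustration.

```kotlin
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.descriptors.StructureKind
import kotlinx.serialization.descriptors.buildClassSerialDescriptor
import kotlinx.serialization.descriptors.listSerialDescriptor

// Hypothetical toy generator: one "message" block for a flat class descriptor.
@OptIn(ExperimentalSerializationApi::class)
fun toySchema(descriptor: SerialDescriptor): String {
    require(descriptor.kind == StructureKind.CLASS) { "This sketch only handles class descriptors" }
    val fields = (0 until descriptor.elementsCount).map { index ->
        val element = descriptor.getElementDescriptor(index)
        // Lists become "repeated <item type>"; everything else uses its serial name as-is.
        val type = when (element.kind) {
            StructureKind.LIST -> "repeated ${element.getElementDescriptor(0).serialName}"
            else -> element.serialName
        }
        // Field numbers are simply the 1-based element index (an assumption of this sketch).
        "  $type ${descriptor.getElementName(index)} = ${index + 1};"
    }
    return (listOf("message ${descriptor.serialName} {") + fields + "}").joinToString("\n")
}

// Hypothetical demo descriptor with made-up serial names.
@OptIn(ExperimentalSerializationApi::class)
fun demoPersonDescriptor(): SerialDescriptor {
    val text = PrimitiveSerialDescriptor("demo.Text", PrimitiveKind.STRING)
    return buildClassSerialDescriptor("demo.Person") {
        element("name", text)
        element("aliases", listSerialDescriptor(text))
    }
}
```

A real generator would also have to handle nested classes, maps, nullability, polymorphism and recursive types, which is where most of the code volume comes from.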

sandwwraith commented 7 months ago

#2643

Chuckame commented 7 months ago

We are currently using nearly all of the same APIs to generate schemas. In my opinion, a 500-line class that generates a schema is less readable than a well-structured visitor, and it is hard to see quickly what is generated where for a given kind or descriptor.

Kotlin / kotlinx.serialization

Enable visiting descriptors-tree #2632
