charleskorn / kaml

YAML support for kotlinx.serialization
Apache License 2.0
517 stars 50 forks source link

YamlScalar loses type information #303

Closed pschichtel closed 2 years ago

pschichtel commented 2 years ago

Describe the bug

I've written a tiny function to convert YamlNode's into JsonElement's (the stuff I'm reading is always JSON compatible). As part of that I have to convert YamlScalar to JsonPrimitive.

I generally I don't have an issue with kaml's approach to scalars, however it is a little unfortunate that it even loses whether the scalar was a string (as-in: the value was quoted). I'm not sure how much type information snakeyaml profiles, but it would be great to retain as much as possible to make format conversions easier.

My current converter tries to convert the string value from the scalar into the various number types until one of them work. If everything fails I assume it to be a string. However that will lead to scalars like '1' to be interpreted as a number on the JSON side, even through it was originally explicitly quoted.

Reproduction repo

private inline fun <T : Any> tryConversion(conversion: () -> T): T? {
    return try {
        conversion()
    } catch (e: YamlScalarFormatException) {
        null
    }
}

private fun yamlNodeToJsonElement(yamlNode: YamlNode): JsonElement {
    return when (yamlNode) {
        is YamlList -> JsonArray(yamlNode.items.map(::yamlNodeToJsonElement))
        is YamlMap -> JsonObject(yamlNode.entries.map { (key, value) -> Pair(key.content, yamlNodeToJsonElement(value)) }.toMap())
        is YamlNull -> JsonNull
        is YamlScalar -> {
            tryConversion { JsonPrimitive(yamlNode.toBoolean()) }
                ?: tryConversion { JsonPrimitive(yamlNode.toByte()) }
                ?: tryConversion { JsonPrimitive(yamlNode.toShort()) }
                ?: tryConversion { JsonPrimitive(yamlNode.toInt()) }
                ?: tryConversion { JsonPrimitive(yamlNode.toLong()) }
                ?: tryConversion { JsonPrimitive(yamlNode.toFloat()) }
                ?: tryConversion { JsonPrimitive(yamlNode.toDouble()) }
                ?: JsonPrimitive(yamlNode.content)
        }
        is YamlTaggedNode -> yamlNodeToJsonElement(yamlNode.innerNode)
    }
}

Steps to reproduce

use a document like

some-string: '1'

parse it into a YamlNode and convert it to a JsonElement of the same type.

Expected behaviour

Some why to extract type information from the original yaml document.

Actual behaviour

All type information is lost.

Version information

0.46.0

Any other information

No response

charleskorn commented 2 years ago

kaml is not intended to be a general-purpose YAML parsing library: its job is to provide support for YAML in kotlinx.serialization. If you need detailed information about the structure or format of the source YAML document, I'd encourage you to use a YAML parsing library like SnakeYAML directly.

pschichtel commented 2 years ago

ah ok, so the argument is that since kotlinx.serialization is focused on object mapping, type info would be taken from the object being deserialized. But I wonder if this would make it hard to write e.g. an Any deserializer. JsonPrimitive is essentially similar to YamlScalar, but they have a isString flag that can be supplied to indicate that it really was a string (and a field to check it). all other types have (mostly) disjoint representations.

pschichtel commented 2 years ago

straight from JsonPrimitive's code:

public sealed class JsonPrimitive : JsonElement() {

    /**
     * Indicates whether the primitive was explicitly constructed from [String] and
     * whether it should be serialized as one. E.g. `JsonPrimitive("42")` is represented
     * by a string, while `JsonPrimitive(42)` is not.
     * These primitives will be serialized as `42` and `"42"` respectively.
     */
    public abstract val isString: Boolean
charleskorn commented 2 years ago

If you'd like to add a PR for isString to YamlScalar, I'd be open to that.

However, I suspect you'll run into other similar issues for other scenarios, so something like SnakeYAML will likely be better suited to your use case.

pschichtel commented 2 years ago

Yeah I definitely agree that directly using SnakeYAML is probably the better approach for my use-case, but the improvement might still be nice.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had any activity in the last 60 days. It will automatically be closed if no further activity occurs in the next seven days to enable maintainers to focus on the most important issues. If this issue is still affecting you, please comment below within the next seven days. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically closed because it has not had any recent activity.

OptimumCode commented 1 month ago

Hi, @charleskorn. I was investigating the possibility of adding the isString property to YamlScalar (as you mentioned in the comment https://github.com/charleskorn/kaml/issues/303#issuecomment-1176094242 you would not mind adding it).

However, I don't see a good way to add it without breaking backward compatibility. YamlScalar is a data class meaning any new property added to the constructor might cause problems. For example, when I add the property like this

public data class YamlScalar(
    val content: String,
    override val path: YamlPath,
    /**
     * Returns `true` if the [YamlScalar] **definitely** represents a string.
     * Conditions are:
     * * content was quoted (single-quoted or double-quoted)
     * * content was defined using literal or folded block style
     *
     * In all other cases returns `false`.
     */
    val isString: Boolean = true,
) : YamlNode(path) {}

27 tests fail. When I use false as the default value - 34 tests fail.

I can correct the tests but if somebody is constructing YamlNodes manually they might have the same problem...

Could you please take a look at the problem when you have time - maybe you have a better solution? Any option I have in mind breaks the compatibility (because of the data modifier for the class)

pschichtel commented 1 month ago

I've never actually looked into it. I switched to snakeyaml.

OptimumCode commented 1 month ago

I've never actually looked into it. I switched to snakeyaml.

This is the only option I see at the moment too

Vampire commented 1 month ago

Well, it probably is expected that this is a breaking change.

Besides that data classes in API are always a bad idea according to https://kotlinlang.org/docs/api-guidelines-backward-compatibility.html#avoid-using-data-classes-in-your-api and for modifications need much additional boilerplate that void the benefits of the data class to stay backwards compatible, in this case even if no data class was used, I'd expect this to be a breaking change.

Right now the YamlScalar looses the information whether the content is a string or not, so right now things that should not be equal are equal. By adding the missing information, I'd say it is expected that things that were considered equal before are not equal anymore. But while technically breaking, this is actually fixing this bug here. :-)