Open Vampire opened 1 month ago
Hi, @Vampire. I appreciate your interest in the library.
I had a similar use case where I had YAML files and JSON schema and needed to validate those YAML files (and it was a multiplatform project). I ended up converting YamlNode
to JsonElement
(will leave the code here just for reference)
I need to think about how to make this possible without breaking the API much and without ending up with two different paths for JsonElement
and YamlNode
.
And probably, it also should be possible to load JSON schema defined in YAML format - I saw some cases like that
Actually I tried to do something similar but hit the problem that YamlScalar
lost type information, so you for example don't know whether you had 1
or '1'
in the YAML. :-(
Maybe one viable option could be to slightly broaden the scope of this lib and support validating Kotlin object tree of Map, List, and so on.
What I do right now is using https://github.com/krzema12/snakeyaml-engine-kmp (which kaml uses under the hood too) to parse/load the file, then transform the result to JsonElement
and give it to your lib.
But would be awesome if I could give the load result directly even if it is not coming from KxS.
You could either transform to JsonElement
and use a common code path, or transform JsonElement
to the stdlib classes and use those in the common code path. That would also allow to make the JSON dependency optional / a feature variant.
the problem that YamlScalar lost type information, so you for example don't know whether you had 1 or '1' in the YAML
I think the problem might be partially solved if the kaml (and probably SnakeKMP - I think it already does) would expose the information about ScalarStyle and block style. In this case, there would be a way to identify if the scalar is 100% a string when it has the ScalarStyle other than Plain or has a block style (am I missing something here?).
But I have some doubts about block style. For example, is it a valid number?
key: >
10
However, it will also require providing some API to ask the element: can you be a number? can you be a boolean?
This part should be implemented here, inside the validation library, - currently, it relies on isString
property for the JsonPrimitive
class to check whether it can be a number/boolean and only after that tries to parse the value as the corresponding type.
If my understanding is correct, for a YAML scalar it would be like:
@Vampire what do you think of the above? Am I missing something? @krzema12 I would much appreciate you could also take a look at the above - you should have more experience in YAML than I do
I'm not an expert in YAML (although I tend to work on projects touching YAML 😅). From my experience, when I was trying to fix various YAML-related issues, I played with various modes of emitting the values in kaml/snakeyaml-engine. That's why I'm afraid certain assumptions may be valid for most cases, but invalid for some edge cases.
Unfortunately I don't have enough capacity to dive deeper into this problem now, sorry! Maybe @aSemy has some thoughts as he ported a much bigger portion of snakeyaml-engine to snakeyaml-engine-kmp.
kaml would probably need isString
exposed on YamlScalar
, which it could get from snakeyaml-engine-kmp
already.
kaml's maintainer would be open to a PR adding that according to https://github.com/charleskorn/kaml/issues/303#issuecomment-1176094242.
Then the YamlScalar
could probably be transformed correctly to JsonPrimitive
or the stdlib types.
For now I switched to using snakeyaml-engine-kmp
directly to parse the YAML and then transforming the result to JsonElement
, though it would of course be nice to just give in the parse result as mentioned above. :-)
The parsing is done using Load().loadOne(types).toJsonElement()
with toJsonElement
being
private fun Any?.toJsonElement(): JsonElement {
return when (this) {
is Map<*, *> -> JsonObject(entries.associate { (key, value) -> "$key" to value.toJsonElement() })
is List<*> -> JsonArray(map { it.toJsonElement() })
is Set<*> -> JsonArray(map { it.toJsonElement() })
is Boolean -> JsonPrimitive(this)
is Number -> JsonPrimitive(this)
is String -> JsonPrimitive(this)
is ByteArray -> JsonPrimitive(encodeBase64())
null -> JsonNull
else -> error("Unexpected type: ${this::class.qualifiedName}")
}
}
I've sent the whole yaml-test-suite
through this and all YAML files that did not produce a parse error were transformed to JsonElement
without "Unexpected type" error.
Coming from snakeyaml-engine-kmp
, the Number
instances are only either Int
or Double
.
The only thing in YAML it seems that has no match in JSON is the byte array which in JSON then just is Base64 encoded String according to the yaml-test-suite
.
So if you would indeed go the "slightly broaden scope" approach mentioned above, the types from my when
should be the ones you would check instead of the Json...
types and what you would transform the Json...
types and Yaml...
types to for the common codepath. :-)
UPD: corrected the links
Hi, @Vampire. I would much appreciate it if you had a chance to look at a few files in the PR from an end-user perspective - I am not the one who uses the library and might be missing something:
I think that should be enough to support YAML (once kaml exposes everything needed) and also to validate the result of SnakeKMP parser (from your example above).
To be honest, I don't see any actual way to support the Kaml at the moment - an attempt to add isString
property to scalar breaks backward compatibility (I did not figure out an option that will not break any code...)
What I can do is create a new module in this repo with its own YamlNode
hierarchy - does not look like a lot of code if I use SnakeKMP as a parser. In this case, all required information will be provided and validation will be possible.
Alternative way (the only one I can see right now) - rework the YamlNode
in Kaml project:
This will definitely break someone else's code but after that, it should be much simpler to make changes to the YamlNode
s
The abstraction interface will be available in 0.3.0-SNAPSHOT
version
Hi, @Vampire. I would much appreciate it if you had a chance to look at a few files in the PR from an end-user perspective - I am not the one who uses the library and might be missing something:
I've had a cursory look.
I'm not a Kotlin expert, so maybe I just don't have the right perspective how you build good and idiomatic API in Kotlin.
Actually, to me it seems strange that null
is represented by PrimitiveElement
.
Also, that you have val number: Number?
but then semantically restricting it to only return a Long
or Double
or null
.
Also why at all is there content
and isNull
and contentOrNull
? If you want to represent null
with PrimitiveElement
, why not simply making content
nullable and removing the other two?
I also wonder why ObjectElement
does not simply extend Map<String, AbstractElement>
, but you might have your reasons.
Or why ArrayElement
is not extending Array<AbstractElement>
or Iterable<AbstractElement>
or whatever is appropriate.
But again, I'm not a Kotlin expert and probably just don't know how you idiomatically design an API in Kotlin. Maybe @krzema12 has better feedback. 🤷♂️
To be honest, I don't see any actual way to support the Kaml at the moment - an attempt to add isString property to scalar breaks backward compatibility (I did not figure out an option that will not break any code...)
Well, yeah, that's the price you have to pay when you put data classes into your API.
But as the maintainer was open to have isString
added, maybe he is ok with it being a breaking change?
Actually, you usually can do it in a non-breaking way, it is just not really convenient.
You need to add it as last constructor argument so that the componentX
methods do not break behaviourally or binary, add a secondary constructor with the signature of the old constructor, and add a copy method with the old signature.
All this voids the benefits you get from a data class and thus it is a bad idea to have it in your API.
rework the YamlNode in Kaml project
Well, you have to talk to kaml maintainer what he would prefer. :-) Maybe he also simply is fine with a breaking change, who knows.
Thank you for taking the time to check the changes @Vampire.
Also, that you have val number: Number? but then semantically restricting it to only return a Long or Double or null
The more I look at this property the more I agree with you. Looks a bit strange. The original idea was to have longOrNull
and doubleOrNull
properties and use them internally to achieve the same behaviour as provided by number
property. Probably, I should return to that original idea (while the interface has experimental annotation) - at least it is not as confusing as the current option.
Also why at all is there content and isNull and contentOrNull? If you want to represent null with PrimitiveElement, why not simply making content nullable and removing the other two?
Did not think about it. Maybe you are right and it will look better (1 property instead of 2)
Actually, to me it seems strange that null is represented by PrimitiveElement.
The idea was to make it similar to JsonElement
hierarchy - it consists of 3 main classes JsonObject, JsonArray and JsonPrimiteve. JsonPrimitive consists of JsonNull and JsonLiteral but this additional hierarchy seemed to be a bit redundant here - having the additional class requires additional checks in most of the JSON schema assertions. So, I decided to hide null
and primitive values behind a single type
I also wonder why ObjectElement does not simply extend Map<String, AbstractElement>, but you might have your reasons.
The only reason for that is Map
interface has 8 methods/properties that should be implemented (3 of them are not needed at all).
Or why ArrayElement is not extending Array
or Iterable or whatever is appropriate.
Unfortunately, the Array
class is final. Iterable
interface is exactly the same as Sequence
interface (at least at the moment). I chose the newer one but ArrayElement
can probably extend both interfaces - it shouldn't be a problem
With regards to the Kaml - the main problem is the functional compatibility. By adding a new property to the data class you also change the equals
and hashCode
methods. It means that if two YamlNodes were equal before they might not be equal after that change (in Kaml itself 27 tests fail if I use a default value true
or 34 tests fail if I use false
- https://github.com/charleskorn/kaml/issues/303#issuecomment-2322885579)
Ah, I see.
Well yeah, before the scalar node did not know whether something originally was string or not, so naturally things were equal that are not equal anymore as the type does no longer loose that information.
I guess that is and has to be accepted when adding isString
property which the maintainer was open to. 🤷♂️
We'll see what the owner answers.
Please, be aware that the number
property was split into two properties: longOrNull
and doubleOrNull
Oh, I just found another potential sub-task / integration. :-)
SnakeYaml-KMP can not only give List
, Set
, and so on.
It also has the class Compose
with which you can instead parse into a node hierarchy similar to KAMLs nodes, but with proper information I think, at least this suggests it: https://groovyconsole.dev/?g=groovy_4_0&codez=eNqVjLsOwjAMRfd8RdTFjdRaPDbEgMTAwsYIDAEsFJo4VZqC4OtJ1YLKiBff5Pjc1SboUw4mYhVe5Aw3lVk0rCt6amdL4qthKitXl7e7W8xxgjNQwrjahyh_LPxa2FuYLNS1Qesflu5kce2T19Cf-tbry45iNHxthGB6yKEnHxM8tcZeKOSqT7lSeB7uAEDINKWcDjubZn0a3txa-0FdzkRyFEa_iyG1p9JAtdXn1LWHQsL-wN01jP6LZQeKHizH5NiBAx9BvQFWXXOk :-)
Thank you for sharing this with me @Vampire. I think this should be a part of this sub-task: https://github.com/OptimumCode/json-schema-validator/issues/195
Is there an existing issue for this?
Enhancement description
Like the built-in JSON format, there is https://github.com/charleskorn/kaml for parsing YAML using KxS. Like the
JsonElement
of the JSON format, there isYamlNode
there. Would be very nice if you could support validating those too.Sub-tasks
YamlScalar
node to identify whether it is a string or not (currently impossible without breaking the API)