Open kubukoz opened 2 years ago
I believe this has something to do with recursion in the schema, ~potentially~ within the `Property` type. I tried to minimize this further, but it looks like some of the pieces interact in some bizarre way 😅
Currently, `setMark()`/`rollbackToMark()` cannot be called recursively, because jsoniter-scala holds only one mark index. Usually they are used to scan for hint keys before the main parsing in codecs derived by macros, or to detect numeric value types in schema-less codecs.
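To make the limitation concrete, here is a minimal toy sketch of a reader that stores a single mark index. It is not jsoniter-scala's actual code: `SingleMarkReader`, `skip` and `position` are hypothetical names, and the assumption that a rollback clears the mark is part of the toy model.

```scala
// Toy model of a reader that can remember only ONE mark (assumption:
// this is NOT jsoniter-scala's real implementation).
final class SingleMarkReader(input: String) {
  private var pos = 0
  private var mark = -1 // -1 means "no mark set"

  def position: Int = pos
  def skip(n: Int): Unit = pos = math.min(pos + n, input.length)

  // A second call silently overwrites the previous mark.
  def setMark(): Unit = mark = pos

  def rollbackToMark(): Unit = {
    require(mark >= 0, "no mark set")
    pos = mark
    mark = -1
  }
}

object SingleMarkDemo {
  // Returns the position after the inner rollback and whether the
  // outer rollback failed because its mark was lost.
  def run(): (Int, Boolean) = {
    val r = new SingleMarkReader("0123456789")
    r.setMark()        // outer mark at 0
    r.skip(3)
    r.setMark()        // inner mark at 3 overwrites the outer one
    r.skip(2)
    r.rollbackToMark() // back to 3, mark is now cleared
    val afterInner = r.position
    val outerFailed =
      try { r.rollbackToMark(); false }
      catch { case _: IllegalArgumentException => true }
    (afterInner, outerFailed)
  }
}
```

In this toy, the outer caller can no longer return to position 0 once a nested caller has set its own mark, which is the situation a recursive definition creates.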
> Currently, `setMark()`/`rollbackToMark()` cannot be called recursively,

Yeah, I hadn't foreseen the usage of untagged unions (which require marking/rollback) with recursive definitions. Is there any existing primitive that could be leveraged to handle the marking on our side? (I presume not, as it would potentially be unsafe with regard to the reader's state machine.)
@kubukoz, regardless of the issue (which is very much a bug), why are you using `untagged` here? It feels like you should be using `discriminated`.
> > Currently, `setMark()`/`rollbackToMark()` cannot be called recursively,
>
> Yeah, I hadn't foreseen the usage of untagged unions (which require marking/rollback) with recursive definitions. Is there any existing primitive that could be leveraged to handle the marking on our side? (I presume not, as it would potentially be unsafe with regard to the reader's state machine.)

I can add a pair of methods like `def setAndGetMark(): Int` and `def rollbackToMark(mark: Int)`, but it would be too inefficient for recursive usage. That could introduce a DoS vulnerability for systems with untrusted input.
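For illustration only, the proposed pair of methods could behave like this on a toy in-memory reader. The two method names come from the comment above; everything else (`MultiMarkReader`, `skip`, `position`, the validity check) is hypothetical, and this in-memory toy does not model the efficiency concern raised above.

```scala
// Toy model (assumption: not jsoniter-scala code). The caller keeps the
// mark, so nested callers each hold their own saved position.
final class MultiMarkReader(input: String) {
  private var pos = 0

  def position: Int = pos
  def skip(n: Int): Unit = pos = math.min(pos + n, input.length)

  // Hand the current position to the caller instead of storing it.
  def setAndGetMark(): Int = pos

  // Return to a position previously obtained from setAndGetMark().
  def rollbackToMark(mark: Int): Unit = {
    require(mark >= 0 && mark <= pos, "invalid mark")
    pos = mark
  }
}

object MultiMarkDemo {
  // Returns the positions after the inner and the outer rollback.
  def run(): (Int, Int) = {
    val r = new MultiMarkReader("0123456789")
    val outer = r.setAndGetMark() // 0
    r.skip(3)
    val inner = r.setAndGetMark() // 3
    r.skip(2)
    r.rollbackToMark(inner)
    val afterInner = r.position   // 3
    r.rollbackToMark(outer)
    (afterInner, r.position)      // (3, 0)
  }
}
```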
For more context, this problem is bound to happen only for unions annotated with the `untagged` trait. This trait is there to offer compatibility with existing APIs and patterns, and it should be discouraged for new APIs. Tagged unions are the default in smithy and have my preference because they don't require rollback during parsing.
Say you have this smithy union:
    @untagged
    union IntOrString {
        int: Integer,
        string: String
    }
The expected json would look like this, as indicated by the presence of `untagged`:
    123
In other words, `untagged` indicates that there is strictly no way of discriminating between the `Integer` and `String` shapes ahead of time. Therefore, the only way I'm aware of to generically process an untagged union is to attempt parsing against the first member (here `Integer`) and, if that fails, roll back and attempt parsing against the second member (`String`).
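The try-then-fall-back strategy described above can be sketched as follows. This is a simplified illustration, not smithy4s code: the ADT and the `UntaggedSketch` decoders are hypothetical, and the "rollback" is modelled by simply re-reading the same input string for the second attempt.

```scala
import scala.util.Try

// Hypothetical Scala encoding of the IntOrString union.
sealed trait IntOrString
final case class IntCase(value: Int) extends IntOrString
final case class StringCase(value: String) extends IntOrString

object UntaggedSketch {
  // Hypothetical member decoders working on the raw JSON text.
  private def decodeInt(json: String): Try[Int] = Try(json.trim.toInt)

  private def decodeString(json: String): Try[String] = Try {
    val t = json.trim
    require(t.length >= 2 && t.startsWith("\"") && t.endsWith("\""), "not a JSON string")
    t.substring(1, t.length - 1) // escape sequences are not handled in this sketch
  }

  // Try the first member; if it fails, start over from the same input
  // (the "rollback") and try the second member.
  def decode(json: String): Try[IntOrString] = {
    val asInt: Try[IntOrString] = decodeInt(json).map(IntCase(_))
    asInt.orElse(decodeString(json).map(StringCase(_)))
  }
}
```

With a streaming reader there is no string to re-read, which is why the real decoder needs a mark to return to before trying the next member.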
This is certainly not great, but due to the existence of APIs that use this pattern, supporting it is somewhat required. Now the problem is: how do we make it work with recursive unions? Never mind how unrealistic the following example is, but something like this:
    structure IntCell {
        @required
        head: Integer,
        @required
        tail: IntList
    }

    @untagged
    union IntList {
        some: IntCell,
        none: Unit
    }
Valid json payloads for this definition would look like:
    { "head": 1, "tail": { "head": 2, "tail": {} } }
So what I think is this: if jsoniter can expose a mechanism to support several marks, then that's cool and we use it to fix the bug, BUT we also add a warning in smithy4s to warn the user against using `untagged` with recursive unions.
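As a rough idea of what such a warning could check, here is a sketch that flags untagged unions that can reach themselves through their members. The shape model (`Shape`, a map from shape name to member targets) and the warning text are hypothetical simplifications, not smithy4s or Smithy validator APIs.

```scala
object RecursiveUntaggedCheck {
  // Hypothetical, simplified shape model: whether the shape is an
  // untagged union, and the names of the shapes its members target.
  final case class Shape(untaggedUnion: Boolean, targets: List[String])

  // True if `start` can reach itself by following member targets.
  def isRecursive(shapes: Map[String, Shape], start: String): Boolean = {
    @annotation.tailrec
    def loop(todo: List[String], seen: Set[String]): Boolean = todo match {
      case Nil                        => false
      case `start` :: _               => true
      case name :: rest if seen(name) => loop(rest, seen)
      case name :: rest =>
        // Unknown names (e.g. prelude shapes) have no targets to follow.
        val next = shapes.get(name).map(_.targets).getOrElse(Nil)
        loop(next ::: rest, seen + name)
    }
    loop(shapes.get(start).map(_.targets).getOrElse(Nil), Set.empty)
  }

  // One warning per untagged union that is part of a cycle.
  def warnings(shapes: Map[String, Shape]): List[String] =
    shapes.toList.collect {
      case (name, shape) if shape.untaggedUnion && isRecursive(shapes, name) =>
        s"untagged union $name is recursive"
    }.sorted
}
```

Applied to the shapes discussed above, `IntList` would be flagged because it reaches itself through `IntCell`, while `IntOrString` would not.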
If you can think of a way to support this kind of case generically without using marks/rollbacks, I'm also interested 😄
> @kubukoz, regardless of the issue (which is very much a bug), why are you using `untagged` here? It feels like you should be using `discriminated`.

@Baccata you don't wanna know :joy:
Given this schema:
and this JSON:
Trying to decode the contents of the JSON into a `Schema`
...fails with: