disneystreaming / smithy4s

https://disneystreaming.github.io/smithy4s/
Other
347 stars 69 forks source link

IllegalStateException in json decoding #222

Open kubukoz opened 2 years ago

kubukoz commented 2 years ago

Given this schema:

namespace jsonschema

use smithy4s.api#untagged

structure Schema {
  definitions: Definitions,
  properties: Properties,
  type: SchemaType,
}

map Definitions {
  key: String,
  value: Schema
}

@untagged
union SchemaType {
  array: SchemaArray,
}

structure SchemaArray {
  member: Schema
}

map Properties {
  key: String,
  value: Property
}

@untagged
union Property {
  schema: Schema,
}

and this JSON:

{
  "definitions": {
    "AndType": {
      "properties": {
        "items": {
          "type": "array"
        }
      }
    }
  }
}

Trying to decode the contents of the JSON into a Schema...

val capi = smithy4s.http.json.codecs()
val codec = capi.compileCodec(jsonschema.Schema.schema)
val schema = capi.decodeFromByteArray(codec, bytesOfFile).toTry.get

fails with:

[error] java.lang.IllegalStateException: expected preceding call of 'setMark()'
[error]         at com.github.plokhotnyuk.jsoniter_scala.core.JsonReader.missingSetMarkOperation(JsonReader.scala:826)
[error]         at com.github.plokhotnyuk.jsoniter_scala.core.JsonReader.rollbackToMark(JsonReader.scala:74)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$20.decodeValue(SchematicJCodec.scala:565)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$17.decodeValue(SchematicJCodec.scala:453)
[error]         at smithy4s.http.json.Cursor.decode(Cursor.scala:31)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$16.$anonfun$decodeValue$2(SchematicJCodec.scala:393)
[error]         at smithy4s.http.json.Cursor.under(Cursor.scala:37)
[error]         at smithy4s.http.json.Cursor.under(Cursor.scala:45)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$16.decodeValue(SchematicJCodec.scala:393)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$16.decodeValue(SchematicJCodec.scala:374)
[error]         at smithy4s.http.json.Cursor.decode(Cursor.scala:31)
[error]         at smithy4s.http.json.SchematicJCodec.$anonfun$fieldHandler$4(SchematicJCodec.scala:737)
[error]         at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[error]         at smithy4s.http.json.Cursor.under(Cursor.scala:37)
[error]         at smithy4s.http.json.Cursor.under(Cursor.scala:43)
[error]         at smithy4s.http.json.SchematicJCodec.$anonfun$fieldHandler$3(SchematicJCodec.scala:733)
[error]         at smithy4s.http.json.SchematicJCodec.$anonfun$fieldHandler$3$adapted(SchematicJCodec.scala:731)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$26.decodeValue_(SchematicJCodec.scala:822)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$26.decodeValue(SchematicJCodec.scala:803)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$17.decodeValue(SchematicJCodec.scala:453)
[error]         at smithy4s.http.json.Cursor.decode(Cursor.scala:31)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$16.$anonfun$decodeValue$2(SchematicJCodec.scala:393)
[error]         at smithy4s.http.json.Cursor.under(Cursor.scala:37)
[error]         at smithy4s.http.json.Cursor.under(Cursor.scala:45)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$16.decodeValue(SchematicJCodec.scala:393)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$16.decodeValue(SchematicJCodec.scala:374)
[error]         at smithy4s.http.json.Cursor.decode(Cursor.scala:31)
[error]         at smithy4s.http.json.SchematicJCodec.$anonfun$fieldHandler$4(SchematicJCodec.scala:737)
[error]         at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[error]         at smithy4s.http.json.Cursor.under(Cursor.scala:37)
[error]         at smithy4s.http.json.Cursor.under(Cursor.scala:43)
[error]         at smithy4s.http.json.SchematicJCodec.$anonfun$fieldHandler$3(SchematicJCodec.scala:733)
[error]         at smithy4s.http.json.SchematicJCodec.$anonfun$fieldHandler$3$adapted(SchematicJCodec.scala:731)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$26.decodeValue_(SchematicJCodec.scala:822)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$26.decodeValue(SchematicJCodec.scala:803)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$17.decodeValue(SchematicJCodec.scala:453)
[error]         at smithy4s.http.json.JCodec.$anonfun$decodeMessage$1(JCodec.scala:68)
[error]         at smithy4s.http.json.Cursor$.withCursor(Cursor.scala:75)
[error]         at smithy4s.http.json.JCodec.decodeMessage(JCodec.scala:67)
[error]         at smithy4s.http.json.JCodec.decodeMessage$(JCodec.scala:66)
[error]         at smithy4s.http.json.SchematicJCodec$$anon$17.decodeMessage(SchematicJCodec.scala:447)
[error]         at smithy4s.http.json.JCodec$$anon$1.decodeValue(JCodec.scala:57)
[error]         at smithy4s.http.json.JCodec$$anon$1.decodeValue(JCodec.scala:52)
[error]         at com.github.plokhotnyuk.jsoniter_scala.core.JsonReader.read(JsonReader.scala:580)
[error]         at com.github.plokhotnyuk.jsoniter_scala.core.package$.readFromArray(package.scala:99)
[error]         at smithy4s.http.json.JsonCodecAPI.decodeFromByteArrayPartial(JsonCodecAPI.scala:50)
[error]         at smithy4s.http.json.JsonCodecAPI.decodeFromByteArrayPartial(JsonCodecAPI.scala:28)
[error]         at smithy4s.http.CodecAPI.decodeFromByteArray(CodecAPI.scala:62)
[error]         at smithy4s.http.CodecAPI.decodeFromByteArray$(CodecAPI.scala:58)
[error]         at smithy4s.http.json.JsonCodecAPI.decodeFromByteArray(JsonCodecAPI.scala:28)
[error]         at playground.cli.WIP$.exec(WIP.scala:16)
[error]         at playground.cli.WIP$.delayedEndpoint$playground$cli$WIP$1(WIP.scala:25)
[error]         at playground.cli.WIP$delayedInit$body.apply(WIP.scala)
[error]         at scala.Function0.apply$mcV$sp(Function0.scala:39)
[error]         at scala.Function0.apply$mcV$sp$(Function0.scala:39)
[error]         at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
[error]         at scala.App.$anonfun$main$1(App.scala:76)
[error]         at scala.App.$anonfun$main$1$adapted(App.scala:76)
[error]         at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
[error]         at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
[error]         at scala.collection.AbstractIterable.foreach(Iterable.scala:926)
[error]         at scala.App.main(App.scala:76)
[error]         at scala.App.main$(App.scala:74)
[error]         at playground.cli.WIP$.main(WIP.scala)
[error]         at playground.cli.WIP.main(WIP.scala)
kubukoz commented 2 years ago

I believe this has something to do with recursion in the schema, ~potentially~ within the Property type. Tried to minimize this further but it looks like some of the pieces interact in some bizarre way 😅

plokhotnyuk commented 2 years ago

Currently, setMark()/rollbackToMark() cannot be called recursively, because jsoniter-scala holds only one mark index. Usually they are used for scanning for some hint keys before main parsing in codecs derived by macros or for detection of numeric value types for schema less codecs.

Baccata commented 2 years ago

Currently, setMark()/rollbackToMark() cannot be called recursively,

yeah I hadn't foreseen the usage of untagged unions (which requires marking/rollback) with recursive definitions. Is there any existing primitive that could be leveraged to handle the marking on our side ? (I presume it's not the case as it would potentially be unsafe with regards to the reader's state machine)

Baccata commented 2 years ago

@kubukoz, regardless of the issue (which is very much a bug), why are you using untagged here ? It feels like you should be using discriminated

plokhotnyuk commented 2 years ago

Currently, setMark()/rollbackToMark() cannot be called recursively,

yeah I hadn't foreseen the usage of untagged unions (which requires marking/rollback) with recursive definitions. Is there any existing primitive that could be leveraged to handle the marking on our side ? (I presume it's not the case as it would potentially be unsafe with regards to the reader's state machine)

I can add a pair of methods like def setAndGetMark(): Int and def rollbackToMark(mark: Int) but it would be too inefficient for recursive usage. That can introduce a DoS vulnerability for systems with an untrusted input.

Baccata commented 2 years ago

For more context, this problem is bound to happen only for unions annotated with the untagged trait. This trait is there to offer compatibility with existing APIs and patterns and is something that should be discouraged for new APIs. Tagged unions are default in smithy and have my preference because of how they don't require rollback during parsing.

Say you have this smithy union :

@untagged 
union IntOrString {
  int: Integer, 
  string: String 
}

The expected json would look like this, as indicated by the presence of untagged :

In other words, untagged indicates that there is strictly no way of discriminating between the Integer and String shapes ahead of time. Therefore, the only way I'm aware of to generically process an untagged union is to attempt parsing against the first member (here Integer), and if that fails, rollback and attempt parsing against the second member (String).

This is certainly not great, but due to the existence of APIs that use this pattern, supporting it is somewhat required. Now the problem is : how do we make it work with recursive unions ? Disregard the unrealism of the following example, but something like this :

structure IntCell {
   @required 
   head: Int, 
   @required
   tail: IntList 
}

@untagged 
untagged IntList {
   some: IntCell 
   none: Unit 
} 

valid json payloads for this definition would look like :

{ "head" : 1, "tail": { "head" : 2, "tail" : {}} 

So what I think is this : if jsoniter can expose a mechanism to support several marks, then that's cool and we use it to fix the bug. BUT we add a warning in smithy4s to warn the user against using untagged with recursive unions.

If you can think of a way to support this kind of case generically without using marks/rollbacks, I'm also interested 😄

kubukoz commented 2 years ago

@kubukoz, regardless of the issue (which is very much a bug), why are you using untagged here ? It feels like you should be using discriminated

@Baccata you don't wanna know :joy: