Fatal1ty / mashumaro

Fast and well tested serialization library
Apache License 2.0

[BUG] Can't generate expected JSON Schema when using `dict` #235

Closed: Future-Outlier closed this issue 2 months ago

Future-Outlier commented 3 months ago

Description

I am a flytekit maintainer working on the dataclass transformer. When I generate a JSON schema for the class Bar below with build_json_schema from mashumaro.jsonschema, I get an error later, when flytekit tries to turn that schema back into a dataclass.


What I Did

import typing
from flytekit import task
from flytekit.core.type_engine import TypeEngine
from dataclasses import dataclass
from flytekit.tools.translator import get_serializable
import flytekit
import flytekit.configuration
from flytekit.configuration import Image, ImageConfig
from collections import OrderedDict
# from dataclasses_json import DataClassJsonMixin

@dataclass
# class Foo(DataClassJsonMixin):
class Foo:
    x: int
    y: str
    z: typing.Dict[int, str]

@dataclass
# class Bar(DataClassJsonMixin):
class Bar:
    x: int
    y: dict  # loose annotation: ends up as {"type": "object"} with no title in the schema
    # y: typing.Dict[str, str]
    z: Foo

# Hypothetical: the report does not show t2; assumed here to be a task that returns Bar.
@task
def t2() -> Bar:
    return Bar(x=1, y={"a": "b"}, z=Foo(x=1, y="a", z={1: "a"}))

serialization_settings = flytekit.configuration.SerializationSettings(
    project="proj",
    domain="dom",
    version="123",
    image_config=ImageConfig(Image(name="name", fqn="asdf/fdsa", tag="123")),
    env={},
)

# Serialize the task, then ask the type engine to guess Python types for its outputs.
task_spec = get_serializable(OrderedDict(), serialization_settings, t2)
# print("@@@ task_spec.template.interface.outputs:", task_spec.template.interface.outputs)
pt_map = TypeEngine.guess_python_types(task_spec.template.interface.outputs)

Error Message

(DEV) future@outlier ~ % python PR/dataclass/dataclass.py
╭─────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────╮
│ /Users/future-outlier/code/dev/PR/dataclass/dataclass.py:119 in <module>                                 │
│                                                                                                          │
│ ❱ 119 pt_map = TypeEngine.guess_python_types(task_spec.template.interface.outputs)                       │
│                                                                                                          │
│ /Users/future-outlier/code/dev/flytekit/flytekit/core/type_engine.py:1331 in guess_python_types          │
│                                                                                                          │
│ ❱ 1331 │   │   │   python_types[k] = cls.guess_python_type(v.type)                                       │
│                                                                                                          │
│ /Users/future-outlier/code/dev/flytekit/flytekit/core/type_engine.py:1348 in guess_python_type           │
│                                                                                                          │
│ ❱ 1348 │   │   │   return cls._DATACLASS_TRANSFORMER.guess_python_type(literal_type=flyte_type)          │
│                                                                                                          │
│ /Users/future-outlier/code/dev/flytekit/flytekit/core/type_engine.py:791 in guess_python_type            │
│                                                                                                          │
│ ❱  791 │   │   │   │   │   return convert_mashumaro_json_schema_to_python_class(literal_type.me          │
│                                                                                                          │
│ /Users/future-outlier/code/dev/flytekit/flytekit/core/type_engine.py:1975 in                             │
│ convert_mashumaro_json_schema_to_python_class                                                            │
│                                                                                                          │
│ ❱ 1975 │   attribute_list = generate_attribute_list_from_dataclass_json_mixin(schema, schema_na          │
│                                                                                                          │
│ /Users/future-outlier/code/dev/flytekit/flytekit/core/type_engine.py:923 in                              │
│ generate_attribute_list_from_dataclass_json_mixin                                                        │
│                                                                                                          │
│ ❱  923 │   │   │   │   sub_schemea_name = property_val["title"]                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'title'

The schema I generated with build_json_schema(Bar).to_dict():


{
   "type":"object",
   "title":"Bar",
   "properties":{
      "x":{
         "type":"integer"
      },
      "y":{
         "type":"object"
      },
      "z":{
         "type":"object",
         "title":"Foo",
         "properties":{
            "x":{
               "type":"integer"
            },
            "y":{
               "type":"string"
            },
            "z":{
               "type":"object",
               "additionalProperties":{
                  "type":"string"
               },
               "propertyNames":{
                  "type":"string"
               }
            }
         },
         "additionalProperties":false,
         "required":[
            "x",
            "y",
            "z"
         ]
      }
   },
   "additionalProperties":false,
   "required":[
      "x",
      "y",
      "z"
   ]
}
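The failing line in the traceback indexes `property_val["title"]` directly. The sketch below is not flytekit's code; it is a minimal stand-alone walk over a trimmed copy of the schema above, showing why only `y` triggers the `KeyError`: the nested dataclass `Foo` carries a `title`, while the plain `dict` field does not. The helper name `nested_titles` is hypothetical.

```python
# Trimmed copy of the schema printed above (only the keys needed here).
schema = {
    "type": "object",
    "title": "Bar",
    "properties": {
        "x": {"type": "integer"},
        "y": {"type": "object"},
        "z": {"type": "object", "title": "Foo", "properties": {}},
    },
}


def nested_titles(schema: dict) -> dict:
    """Hypothetical helper: map each object-typed property to its title, or None if absent."""
    titles = {}
    for name, prop in schema["properties"].items():
        if prop.get("type") == "object":
            # .get() instead of prop["title"]: an untitled object (e.g. `y: dict`) yields None
            titles[name] = prop.get("title")
    return titles


print(nested_titles(schema))  # {'y': None, 'z': 'Foo'}

# Direct indexing, as in the traceback, fails on the untitled property.
try:
    schema["properties"]["y"]["title"]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'title'
```

Guarding the lookup stops the crash, but it does not tell the caller what type `y` should become; that still has to come from somewhere else.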
Future-Outlier commented 3 months ago

For more detail, this is how we generate a dataclass from a marshmallow JSON schema: https://github.com/flyteorg/flytekit/blob/master/flytekit/core/type_engine.py#L1898-L1919

We want to do the same with the mashumaro JSON schema here, but it doesn't work yet: https://github.com/flyteorg/flytekit/blob/master/flytekit/core/type_engine.py#L1922-L1932
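As a rough illustration of doing "the same thing" for a mashumaro schema, here is a minimal sketch, assuming only the JSON Schema keys visible in the output above, that rebuilds dataclasses with the standard library's `dataclasses.make_dataclass`. It is not flytekit's implementation; `schema_to_type`, `schema_to_dataclass`, and the `PRIMITIVES` table are hypothetical. Titled objects with `properties` become nested dataclasses, objects with an `additionalProperties` schema become `Dict[str, T]`, and an object with neither, like `y`, can only become a bare `dict`.

```python
import dataclasses
import typing

# Hypothetical table mapping JSON Schema primitive types to Python types.
PRIMITIVES = {"integer": int, "number": float, "string": str, "boolean": bool}


def schema_to_type(prop: dict) -> typing.Any:
    """Translate one property schema into a Python type annotation."""
    if prop.get("type") == "object":
        if "title" in prop and "properties" in prop:
            # Titled object with properties: rebuild it as a nested dataclass (e.g. Foo).
            return schema_to_dataclass(prop)
        value_schema = prop.get("additionalProperties")
        if isinstance(value_schema, dict):
            # JSON object keys are always strings, so only the value type is recoverable.
            return typing.Dict[str, schema_to_type(value_schema)]
        # No title, no properties, no value schema (e.g. `y: dict`): a bare dict is all we know.
        return dict
    return PRIMITIVES.get(prop.get("type"), typing.Any)


def schema_to_dataclass(schema: dict) -> type:
    """Build a dataclass named after the schema's title, one field per property."""
    fields = [(name, schema_to_type(prop)) for name, prop in schema["properties"].items()]
    return dataclasses.make_dataclass(schema["title"], fields)


# Trimmed copy of the Bar schema from the report.
bar_schema = {
    "type": "object",
    "title": "Bar",
    "properties": {
        "x": {"type": "integer"},
        "y": {"type": "object"},
        "z": {
            "type": "object",
            "title": "Foo",
            "properties": {
                "x": {"type": "integer"},
                "y": {"type": "string"},
                "z": {
                    "type": "object",
                    "additionalProperties": {"type": "string"},
                    "propertyNames": {"type": "string"},
                },
            },
        },
    },
}

Bar = schema_to_dataclass(bar_schema)
print({f.name: f.type for f in dataclasses.fields(Bar)})
```

In this sketch `y` comes back as a bare `dict`, and `Foo.z` comes back as `Dict[str, str]` rather than the original `Dict[int, str]`, because the schema records the keys only as strings. The reconstructed type can only be as precise as the schema it is built from.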

Future-Outlier commented 3 months ago

In the example above, I found that "y": {"type": "object"} doesn't provide enough information for flytekit to convert the JSON schema into the attribute list the dataclass transformer needs.

Fatal1ty commented 2 months ago

I'm sure this is because your type is too loose:

y: dict
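To make the "too loose" point concrete: a bare `dict` annotation carries no key or value types, so a schema generator has nothing to put under `additionalProperties`, whereas `typing.Dict[str, str]` (the alternative commented out in the reproduction above) does. The snippet below only inspects the two annotations with the standard library `typing` module; it does not call mashumaro.

```python
import typing

loose = dict
tight = typing.Dict[str, str]

# A bare `dict` has no type arguments: keys and values are unconstrained.
print(typing.get_args(loose))    # ()
# The parameterized form records both the key type and the value type.
print(typing.get_args(tight))    # (<class 'str'>, <class 'str'>)
print(typing.get_origin(tight))  # <class 'dict'>
```

With the tighter annotation, the `y` property would be expected to gain an `additionalProperties` entry similar to the one shown for `Foo.z` in the schema above.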
Future-Outlier commented 2 months ago

No problem, thank you

Fatal1ty commented 2 months ago

Can we close this issue or do you still have questions?

Future-Outlier commented 2 months ago

Nope, thank you so much!