#92656 (native JSON types) does not support custom objects

neth392 commented 2 months ago

Tested versions

4.4-dev2

System information

n/a

Issue description

Per the request of @fire from PR #92656

The new JSON methods from_native and to_native do not appear to work with Objects of user created classes. The code I used to test this is in the steps to reproduce.

I could be wrong, but I believe the underlying issue is that the entire script is being serialized, and when it is loaded it is trying to override the existing script thus causing a class name conflict. I do not see any reason a developer would want to serialize the entire source code of an object's script into JSON. Many developers use JSON to avoid this security issue that exists in Resources currently.

It could make more sense to simply serialize the script's path, and load it to create a new instance of the object. But I don't know if that's the best solution as projects get refactored and saves need to be preserved. I handled it a bit differently for that reason in my WIP JSON addon, by allowing devs to define "configurations" for each object type which contain the script/scene that should be instantiated & a unique ID for that configuration which is what is stored in the JSON.

But besides that thanks to those who put their time in for this PR, definitely great for many devs when it comes to the rest of native types

Steps to reproduce

MyClass extends Resource and has 1 simple string var.

var my_class: MyClass = MyClass.new()
var json: String = JSON.stringify(JSON.from_native(my_class, true, false))
# Prints: # {"__gdtype":"Object","properties":{"resource_local_to_scene":false,"resource_name":"","serialize_this":"test"},"type":"Resource"}
print(json, "\n")
var deserialized: Variant = JSON.to_native(JSON.parse_string(json), true, false)
# Prints: false
print(deserialized is MyClass)

# Testing with the script param as true
var json_with_script: String = JSON.stringify(JSON.from_native(my_class, true, true))
# Prints:
# {"__gdtype":"Object","properties":{"resource_local_to_scene":false,"resource_name":"","script":{"__gdtype":"Object","properties":{"resource_local_to_scene":false,"resource_name":"","script/source":"class_name MyClass extends Resource\n\t\n@export var serialize_this: String = \"test\"\n"},"type":"GDScript"},"serialize_this":"test"},"type":"Resource"}
print(json_with_script, "\n")
# Below throws: Parser Error: Class "MyClass" hides a global script class.
var deserialized_with_script: Variant = JSON.to_native(JSON.parse_string(json_with_script), true, true)

Minimal reproduction project (MRP)

n/a see steps to reproduce

Gnumaru commented 2 months ago

I'm providing a MRP for convenience

JsonDesserializeScriptBug.zip

Note that, when allowing scripts (the third parameter in JSON.from_native) the generated dictionary contains the script source itself, wich does not seem correct at all for non embeded scripts (in the MPR I have used a separate script file, not an embedded script).

The error "MyClass hides a global script class" probably happens because when desserializing with JSON.to_native the newly created script, wich is not related to the previously existing script at all, tries to "steal" the class_name from it by declaring an already existing class_name.

I don't know what is the intended behavior when serializing/deserializing scripts (if there is one), but at least when desserializing by the same application that serialized the object, I think that the user expected behavior is to have the desserialized object to reference the previously existing script, and not for a new script, not tied to the previous one, to be created. Maybe there should be a special case for scripts where besides serializing the object itself, a reference to the resource path should be added so that, when unserializing in another application, it would create a new script, but when serializing in an environment where the resource path exists and is a script, the serialized script data is discarded (except for exported variables) and the existing resource in the refered resource path is used. In the example bellow notice the added resource_path property referencing res://MyClass.gd

{
  "__gdtype": "Object",
  "properties": {
    "resource_local_to_scene": false,
    "resource_name": "",
    "script": {
      "__gdtype": "Object",
      "properties": {
        "resource_local_to_scene": false,
        "resource_name": "",
        "script/source": "class_name MyClass extends Resource\n\t\n@export var serialize_this: String = \"test\"\n"
      },
      "type": "GDScript",
      "resource_path": "res://MyClass.gd"
    },
    "serialize_this": "test"
  },
  "type": "Resource"
}

fire commented 2 months ago

I didn't want to hard code special processing since I'm fetching the property as if it was a string and putting it into the json.

I feel like special processing would be a separate system? maybe a callback or a better design.

However, this is bad in the sense that the surprising behavior of serializing an object to string has an unexpected result and there's no natural way of fixing it.

We could remove class_name though, but I'm not an expert at gdscript internals.

neth392 commented 2 months ago

For the script property I think the security issue definitely warrants a special case to omit it.

I faced this issue in my project and thought on it way too long. Not sure if it helps, but here's a breakdown of what I went through

The problems:

When developers serialize to JSON, they don't want all properties included. script being one of them. There are many others such as process_mode which don't need to be stored in game save files. 99% of the time you only need to save data relevant to player's game save. Not the configuration of your scene tree; that's done in the editor by the developer. So how do we determine which properties are to be serialized?
Detecting what class a serialized object is. Is it native? Is it based on a script?
After the above is solved, how do we determine how to create an instance of this object? What if it's based on a PackedScene and not just a script?
Another consideration, what if this is a dynamic Array or Dictionary containing objects? Such as an array of "Enemy" nodes that all extend from "Enemy" but are their own classes. That eliminates the possibility of a user passing an instance of an object into JSON.to_native() to deserialize into that object.
Property names, class names, and script paths all change throughout the development cycle. What's the best way to help a developer preserve their saves without too much of a headache?

Below is what I did to tackle this issue. Not saying it is perfect to build into the engine but could maybe spark up some ideas.

My solution:

A global instance that manages "json specification" resources on how to serialize a specific class. It contains an id which the developer sets & does not change to preserve saved data.

Next the developer sets what class this specification is for. Can be set by script or by inputting the class name (for native object support). Then they set how to construct instances of the object. I used a "json instantiator" resource which I extended into a few implementations; Native (constructs instance using ClassDB & class name), Script (constructs instance from a gdscript), and PackedScene (constructs instance from a scene file). This feature prevents class name & script path changes from breaking saves.

Finally, the user selects which properties are to be serialized, each property is its own resource within the object's specification resource. I made it easy using PROPERTY_HINT_ENUM_SUGGESTION and setting that hint as a list of all of the properties of the class & its parents. For the actual engine this would be better off using the property selector UI (like MultiplayerSynchronizer has). Also, each property gets its own "json key" string that is used to store the property in JSON, this protects against the property name changing as long as the dev updates the property resource after changing the property name.

Now any supported object can be serialized whether it's a standalone instance, a property of another object, or in an array/dictionary.

The only downside is that it's a bit of legwork to configure it for all of your objects. But the benefits of no boilerplate code, nice clean save files w/o excess properties, and seamless Object -> JSON -> Object functionality are well worth the trade off.

fire commented 2 months ago

~~Why would a local instance that manages "json specification" resources not work?~~

You could pass in parameter array that contains all the instructions on how to serialize a specific class to the JSON to native methods.

Thoughts?

neth392 commented 2 months ago

That could work, but now what if that class has properties that are instances of other classes? And those other classes have other object properties... etc

The nesting would get a bit deep there for larger objects, which is why I used a global "specification manager" instance; define the specification once for a class and done. A specification manager like this wouldn't have to be a global instance if built into the engine, it could just be a node which any dev could add to their autoloads if needed. Maybe that node could be passed as the parameter to JSON.to_native so that objects w/ properties that are other objects can all derive how to be serialized from that specification manager node.

fire commented 2 months ago

I don't know if a node would be correct.

The defining difference between RefCounted objects and Node objects is that Nodes have _process and _physics process (internal versions too).

Does your manager need ticking?

neth392 commented 2 months ago

No not all, it is set & forget (until you need it). Can always be a Resource/RefCounted, I just suggested a node so its easier to make it globally accessible which I think would usually be the case. But easy enough to do that with a resource or refcounted as well.

Edit: If it'd be similar to the system I made then it would have to be a Resource so that it could be configured in the editor. A RefCounted would require every specification be registered/added at runtime for every game launch, not the worst possibility though

fire commented 2 months ago

Do you think you're able to make a pull request implemnting this? I can also port gdscript versions of this.

I think keep it local to a Resource is the best approach for now. Like manually pass a resource to the JSON to native from native.

neth392 commented 2 months ago

Possibly

I don't know C++, just enough to (kind of) read it, and don't have the time to learn it so it'd have to be in gdscript. I am also a bit clueless on where I'd put these gdscript files & how this whole process would work with me writing it in gdscript in a fork, unless you want me to just write this system separately & not in a forked repo

fire commented 2 months ago

Write this system separately if that's easiest.

fire commented 2 months ago

@neth392

Want me to just write this system separately & not in a forked repo

Yeah. Then once we've narrowed the design we can port to c++.

godotengine / godot