godotengine / godot-proposals

Godot Improvement Proposals (GIPs)
MIT License
1.16k stars 97 forks source link

Add compact and customizable complex Struct types to Variant with a strong, but lightweight type system #2816

Closed willnationsdev closed 1 year ago

willnationsdev commented 3 years ago

Describe the project you are working on

A simulation game with many small custom data structures for web/mobile.

Describe the problem or limitation you are having in your project

With script code, you either have only Variant-compatible stuff or have access to non-Variant data structures (C++/C#) that must eventually map to Variant anyway (exporting variables, serialization, etc.). All of the user-friendly options in the scripting API have poor control over allocation size for complex data types. For example, a NamedPoint type:

class NamedPoint {
    float x;
    float y;
    StringName name;
};

Assuming Variant is 20 bytes (4 type enum, 16 data), representing a NamedPoint value has the following options:

Ideally, we want to get as close to 20 bytes as possible with automatic serialization, type safety, editor integration, and named labels for members.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

  1. Add a TYPE_STRUCT state to Variant.
  2. Reduce byte allocation of Variant's Type type member from 4 to 2.
    • Not sure if this is even possible/realistic, but I don't think it needs all that memory.
  3. Add to Variant a uint16_t struct_type_id member to fill in the lost 2 bytes.
  4. Define a StructServer singleton that maintains a config-based array of StructTypeInfo objects.
    • struct_type_id is an index to fetch a specific StructTypeInfo.
    • StructTypeInfo can be defined multiple ways:
      • ProjectSettings editor tab.
      • A global script class that flags itself as a struct (scanned by EditorFileSystem. Could be incorporated into ScriptLanguage::get_global_class_name(path)). For example, a GDScript with the @struct class annotation. A compatible script would be subject to limitations:
        • Only 1, 2, or 3/4 properties. Godot allocates a preset max byte size to each of them: 16, 8, or 4 bytes respectively. If 3 properties, the last 4 bytes are ignored.
        • These scripts are purely used as configuration, so all details that define/configure ScriptInstance data is either ignored or explicitly reported as an error for better feedback (since it's unnecessary).
        • Cannot instantiate script.
        • Cannot participate in intra-language script inheritance.
        • Inherited class is ignored.
        • Property definitions are read directly from the Script with Script.get_script_property_list(), so instance properties from _get_property_list() have no meaning.
        • No signals.
        • Constants are allowed and should be accessible to scripting languages from references to the Struct's type name.
        • Can only define static methods.
          • By convention, a language implementation should forbid executing a static method from a Struct Script reference directly. Instead, the Variant call API that handles a Struct will access them.
          • The first parameter, if any, is prefilled with a Struct. These methods act as extension methods for instances of the Struct Variant.
          • Magic static extension methods allow for structs to implement operator overloading. Or perhaps they are virtual methods on a faux, uninstantiable Struct class that all struct scripts must extend? I'm not too keen on the details for cross-language operator overloading or whether that's even desirable.
    • Struct config works like script classes. If the "config" is updated (ProjectSettings save, script file modified), scripts are re-parsed, their data is synced to ProjectSettings, and the StructServer rebuilds from the result, affecting all subsequent evaluated type-checking logic.

All of the above script limitations would have to be separately implemented and maintained by each respective scripting language that has added support for defining struct types. That is, the parsing/compiling logic of each language would need custom logic for verifying that the parsed/compiled class definition abides by the policies.

You would also separately need to teach each language and the editor how to handle Struct Variants.

By default, the assignment operator would copy-by-value, creating a new Variant with the same _mem and type/struct_type_id. Other operators would result in an error until a legitimate means of overloading them is devised.

In practice, Structs would allow script users to define small complex data types that are created with much smaller memory footprints (exactly 20 bytes) and without sacrificing named member lookup, type safety, or automatic editor integration.

You'd get full type safety in GDScript/VisualScript, and configuration-based "safety" with runtime checks and casting in compiled static languages like C++ and C#. We'd need to expose a StructServer API of some sort to provide a high-level API for handling the actual type/byte-segment lookup operations that GDScript and VisualScript could handle for users out-of-the-box.

In addition, a Variant-compatible lightweight Struct type could fix and improve many other things.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

Tentative sample of engine changes:

class Variant {
public:
    // Add new type
    enum Type {
        // ...
        TYPE_STRUCT,

        VARIANT_MAX
    }

    // ...

    // Split up Variant type information within same region.
    Type type : 2;
    uint16_t struct_type_id : 2;

    // ...
};

// Information configured for a given struct type.
struct StructTypeInfo {
    StringName name;
    // Members, split into 4-byte increments.
    // Optimize property name lookup? Not sure of best way.
    PropertyInfo members[4];
    // For example, NamedPoint...
    // _mem[0-3], TYPE_REAL, "x"
    // _mem[4-7], TYPE_REAL, "y"
    // _mem[8-11], TYPE_STRINGNAME, "name"
    // _mem[12-15], TYPE_NIL, ""

    // Need to store methods from Script and/or engine C++ bindings
    Vector<MethodInfo> methods;
    HashMap<StringName, Callable> method_callables;

    // Need to store constants associated with typename
    HashMap<StringName, Variant> constants;
};

// This is essentially a lightweight ClassDB just for structs.
class StructServer {
    // Variant::struct_type_id indexes into this.
    Vector<StructTypeInfo> type_data;
    // TypeName -> StructTypeInfo index reverse lookup table.
    HashMap<StringName, int> type_name_map;
};

Also, expose the StructServer singleton to provide a high-level interface for working with Struct Variants. Would be necessary to use in engine C++ and optional in C#, both with string-based lookups.

// C++
StructServer *ss = StructServer::get_singleton();
// varray is constructor params, filling in each byte segment in order.
Variant point = ss->struct_new("NamedPoint", varray()); // (0.0, 0.0, StringName())
Variant point2 = ss->struct_new("NamedPoint", varray(5.0, 10.0, "home"));
Error valid;
Variant x = ss->struct_get(point2, "x", &valid); // returns 5.0
ss->struct_set(point, "x", 5.0, &valid); // mutates "x" byte segment in point's `_mem`.
// C#
object point = Godot.StructServer.StructNew("NamedPoint"); // (0.0, 0.0, "")
object point2 = Godot.StructServer.StructNew("NamedPoint",
    new[] { (object)5.0, (object)10.0, (object)new StringName("home") });
// directly with StructServer
bool valid = Godot.StructServer.StructGet(point2, "x", out object x);
float f = valid ? (float)x : 0.0f; // should be "safe" based on configuration.
bool valid2 = Godot.StructServer.StructSet(point2, "y", 5.0);
// maybe via Variant logic or extension methods?)...
float x2 = point2.Get<float>("x"); // casts, throws exception if not exist.
bool valid3 = point2.Set("y", 5.0);

Usage of a Struct type in GDScript. The language automatically handles conversions:

var p = NamedPoint(1.0, 2.0, "home")
# 1. Check against built-in types. Check failed.
# 2. Check against configured Struct types. Hit. Grab type index.
# 3. StructServer creates TYPE_STRUCT Variant with assigned `struct_type_id`.
#    - Constructor is varargs, assigns each Variant to members in-order, 
#      truncated by byte limit...
#    - If StructTypeInfo has fewer non-TYPE_NIL PropertyInfos than provided
#      constructor arguments, report an error. If more than, then prefill up
#      to correct amount with default values.

var x2 = p.x
# 1. p is a Struct (TYPE_STRUCT).
# 2. Call StructServer::get_singleton()->struct_get(p, "x") to get Variant.
#     1. Internally, use `struct_type_id` in p to find StructTypeInfo.
#     2. Find index of "x" PropertyInfo.
#     3. Determine byte distribution based on number of non-TYPE_NIL properties.
#         - This should be cacheable per type somehow.
#     4. Combine index with byte distribution (16/8/4) to extract byte subarray.
#     5. Convert bytes to Variant and return it.
# 3. If `struct_type_id` is invalid or property "x" is not found, report an error.
# 4. Return extracted Variant from expression and assign to x2.

Languages that support struct definitions could look like this:

(if, say, _op_add were a magic method for variant op overloading)

Example in GDScript:

@struct
@class_name NamedPoint

const TYPE_NAME := @"NamedPoint"

# 3 properties defined, so three 4-byte segments.
@export var x: float
@export var y: float
@export var name: StringName

static func _to_string(p_value):
    return "%s is at (%f, %f)." % [p_value.name, p_value.x, p_value.y]

static func latlng(p_value):
    return LatLng(p_value.x, p_value.y) # return some other Struct

static func _op_add(a, b):
    return NamedPoint(a.x + b.x, a.y + b.y, a.name)

# usage
print(NamedPoint.TYPE_NAME) # can access constants directly from type
var p = NamedPoint()
print(p) # calls _to_string() method with argument `p`
print(p.latlng()) # prints LatLng with same x/y coordinates.
var p2 = p + p
print(p2.x == p.x + p.x) # true

Example in C#:

// NamedPoint.cs
using System;
using Godot;

// Mark as a "script class", but it's actually a struct.
[Global]
public struct NamedPoint
{
    public readonly StringName TYPE_NAME = new StringName(nameof(NamedPoint))

    [Export] public float X { get; set; }
    [Export] public float Y { get; set; }
    [Export] public StringName Name { get; set; }

    public static _ToString(NamedPoint value)
    {
        return $"{value.Name} is at ({value.X},{value.Y}).";
    }

    public static LatLng(NamedPoint value)
    {
        return new LatLng { X = value.X, Y = value.Y };
    }

    public static _OpAdd(NamedPoint a, NamedPoint b)
    {
        return new NamedPoint
        {
            X = a.X + b.X,
            Y = a.Y + b.Y,
            Name = a.Name
        };
    }
}

// Usage
GD.Print(NamedPoint.TYPE_NAME);
var p = new NamedPoint();

// Not sure if possible, or if you'd have to somehow hack
// the use of actual C# extension methods.
GD.Print(p); // calls _ToString() method with argument `p`?
GD.Print(p.LatLng()); // prints LatLng with same x/y coordinates.

// Not sure if C# would respect a Variant-specific operator overload
// or if you'd separately need some `operator +(NamedPoint other)` method.
// Only the Variant-specific ones would work in other languages' contexts, unless
// the Mono bindings handled mapping C# struct op overloads to Variant ones for that type of struct.
var p2 = p + p;
GD.Print(p2.X == p.X + p.X); // true

If this enhancement will not be used often, can it be worked around with a few lines of script?

Web and mobile are memory-sensitive platforms by definition, so improving options available for memory conservation opens more options for what can be done with vanilla, script-only Godot projects. Based on mentioned examples, scripted workarounds are either too verbose and inefficient or easy-to-use, but really inefficient.

Is there a reason why this should be core and not an add-on in the asset library?

The memory enhancements require revisions to what Godot Engine internally executes when performing Variant serialization/initialization. Therefore, the customization features must be made at the Variant level which is a core feature. If things were done with an asset, it would defeat the purpose; the enhancements would only exist for a specific language and would be unavailable to other languages like GDScript and VisualScript, nor would they be available to the Editor.

willnationsdev commented 3 years ago

I did realize a few potential issues with this proposal, but they don't necessarily seem insurmountable.

  1. Because structs are "copy-by-value" complex types, you can't store a reference to a Struct as its own "property". This, in turn, means that Structs can't really store other Structs within themselves. Or rather, if you tried to force it by having two 8-byte Structs within one Struct, each of the sub-Structs would spend 4 of their bytes just storing their type information and would only have room for one actual property. The rest of their data would be truncated by the wrapping Struct's logic. Therefore, allowing Structs within Structs is simply wasteful or worse confusing/misleading to end-users.

  2. Because this implementation of Structs is so picky about how many actual bytes are allocated for specific properties, we will need to improve documentation (and perhaps give some sort of feedback indicator in the UI?) to more clearly indicate to users how many bytes of data are actually consumed by a given value or reference. For example, I would assume that String, StringName, Object*, Array, Dictionary, and the Packed*Arrays are each a single pointer to data that exists somewhere else. And then those pointers would be 4 or 8 bytes for 32 and 64-bit devices respectively, depending on how that Godot build was compiled. When is it safe to use 2 or 4 of such properties in a Struct?

willnationsdev commented 3 years ago

I thought of a potential workaround for these issues, but it comes with its own significant drawbacks, so feedback on improvements (or whether it is even worth it) would be appreciated:

Make a Struct Variant a proxy container with an ID that maps to backend Vector<T> storage in the core for each kind of built-in type.

  1. Just like before, TYPE_STRUCT with struct_type_id for NamedPoint. But now, the "value" of the Variant is just a uint64_t ID number for the Struct instance.
  2. NamedPoint configuration relies on the same concept of specifying PropertyInfos for members, but now the byte segmentation isn't significant as the full data for the Struct doesn't need to be kept in the Variant itself. As a result, you could potentially have an arbitrary number of properties within a Struct (though we may still need to impose a specific limit for performance reasons).
  3. When asking for property x on a NamedPoint Struct, the StructServer would...
    1. identify x as the Nth index of NamedPoint's member array (in the example case, N = 0).
    2. identify that the PropertyInfo in question has TYPE_REAL.
    3. plug the Struct instance ID into a data structure (Vector? HashMap?) that maps it to a subarray of indexes, one for each possible PropertyInfo a StructTypeInfo supports. So, if structs can only have up to 4 properties, and so a StructTypeInfo has a PropertyInfo[4] like in the initial example, then the "value" of the Struct ID lookup operation would be a uint64_t[4]. This subarray would be secondary indexes to corresponding backend arrays.
    4. Since x had TYPE_REAL and was in the 0th slot of the StructTypeInfo's member list, you'd take the 0th secondary index and plug it into the PackedRealArray backend storage to fetch the value.

Then, when creating any new instance of a Struct and based on the data types of the Struct type's members, Godot would need to identify and assign free slots in the backend arrays to the new Struct ID. Furthermore, we might need to provide an interface through the StructServer to allow users to declare plans to allocate a particular number of Struct types, thereby allowing the StructServer to resize corresponding backend arrays accordingly to prevent constant array growth during mass allocations.

You could then still preserve all other aspects of the Struct Variant. Copy-by-value assignment operations can be done purely with backend array overwrites without users needing to be aware of it. Scripts could still do struct type configuration. Though I'm no expert, I think File IO and network communication would probably need to deserialize the Variant data to larger-than-Variant-sized byte streams with a copy of the backend array data along with the instance ID and type info.

Pros:

  1. Structs are no longer limited by the number of bytes available to an individual Variant. Therefore...

    1. Storing a "Struct within a Struct" is just a matter of storing one integer in one of the Struct properties (solves issue 1).
    2. Knowing how much memory is associated with a particular data type is no longer relevant since each Struct is a proxy (solves issue 2 above) for data that exists outside the Variant.
  2. Successfully maintains most of the general requirements...

    1. Configurable struct data types.
    2. Configurable named and typed members associated with struct data types.
    3. Drastic reduction in memory consumption compared to the equivalent Object use-case.
    4. Automatic editor integration.
    5. improved capacity to expose miscellaneous core types and integrate third-party language struct types.

Cons:

  1. Much larger data consumption per Struct instance. Now, since the data is stored in backend arrays, you need a mapping table of secondary indexes to find the data associated with each member. This results in an extra sizeof(uint64_t) byte allocation for each total number of member properties P in a StructTypeInfo. You also have to actually maintain allocated memory for the actual data separate from the Variant. If you kept a max of 4 properties as the initial proposal gave, you'd end up with something like...

    • Variant: 20 bytes (Type, struct type ID, struct instance ID)
    • Secondary Index Lookup Table: 32 bytes (8 bytes for array index, one for each of 4 properties in StructTypeInfo)
    • Actual byte allocation of values: Type-dependent. E.g. NamedPoint would be 8 bytes (2 floats) + 8 bytes (StringName pointer).

    So, this comment's changes would shift a NamedPoint from 20 bytes to 68 bytes (a very significant loss in efficiency). However, this is still more performant than the Array (80 bytes), Dictionary (160 bytes) and Object (~800 bytes) options we had available before, and the increase in usage flexibility might be worth the cost. Further discussion/investigation is required.

  2. It decreases the performance of all symbol lookup operations for a Struct. The O(n) behavior of these operations also multiplies the O(n) of Struct creation linearly since new slots must be identified and allocated for each possible property.

    • Before: Struct -> Struct type -> Property -> Data
    • After: Struct -> Struct type -> Property -> Secondary Index Map -> Backend Array -> Data
  3. Significantly increases the complexity and maintenance burden associated with the Struct Variant concept.

    1. You have to build and maintain many different arrays of data that are shared between different data types (could possibly be optimized better somehow to account for each type's frequency of use?).
    2. You have to track the creation and destruction of Struct Variant instances so that you can know when slots in a backend array become free/available. Might result in a need for reference-counting the proxy IDs, similar to References?
    3. You have to deal with differences between the actual Variant and the backend array data during serialization/IO processes.
SpectralDragon commented 3 years ago

I wanna trying to implement this feature. I thinking about structs in Godot all the time

willnationsdev commented 3 years ago

@SpectralDragon Heh, well, if you're going to spend time on it, I would suggest holding off on data representation as much as possible till discussion can come to some agreement on the best way to implement it. Or, if you don't wanna wait, start off with the simpler original proposal.

lyuma commented 3 years ago

Ooh, just saw this proposal. I've had a problem where I need some primitive type which must be serializable but cannot be a Resource...

What I've been doing is using an untyped Array for lightweight "struct-type" objects. To avoid mixing these up with normal arrays, I use first element of null, followed by data.

For example, one of my struct-like types which represents a tuple of a string and two numbers might be [null, "object_name", 123, 4] . The type check would be something like len(arr) == 4 and typeof(arr[0]) == TYPE_NIL and typeof(arr[1]) == TYPE_STRING.

The workaround I picked seems to get me by for now, though I'd really be happy to see another way to do this.

hilfazer commented 2 years ago

I made a program for measuring memory consumption of various Godot types. It seems relevant for this issue. I hope it'd be useful. Link: https://downgit.github.io/#/home?url=https://github.com/hilfazer/Projects/tree/master/Godot/MemoryConsumption

It uses values from Performance singleton. Keep in mind the memory allocated by the OS will be higher.

willnationsdev commented 2 years ago

I was reviewing Variant briefly earlier today when I came across the Pools struct which contains BucketSmall, BucketMedium, and BucketLarge, each of which store unions of increasing size and is managed by static PagedAllocator<T> instances. If I'm reading it right, these Pools are the means by which Variant is implemented to support managing memory for data structures that are larger than the bytes available solely within the Variant itself.

This means there are 4 memory cascades associated with Variant storage:

These buckets are then used in the Variant::reference(...) and Variant::_clear_internal() methods plus the Variant(...) custom constructors to handle the allocating and freeing of memory related to the "bucketed" types.

There are then a myriad of templated types and functions responsible for defining Variant-to-Variant logic in variant_construct.cpp, variant_destruct.cpp, variant_op.cpp, and likely many more places. Of particular note are places like variant_setget.cpp and variant_call.cpp where macros are leveraged to mass-produce actual symbols that define specific structs or functions responsible for defining/accessing properties or defining/executing methods of the Variant's stored subtype.

As far as I can tell, in order to make the proposal work as intended, you would have to take these sections that revolve entirely around creating large collections of statically defined relationships, and introduce a Script-powered, dynamic dependency by which to have scripts intrude on this low-level defining layer to define their own types, set/get properties, and methods (via the Script's static extension methods concept). Naturally, it's impossible to evaluate a Script without first having already loaded all of this data regarding Variant data/logic.

To make it possible, you'd need a subset of the available definitions/logic to be something that can be dynamically populated, cleared, and refreshed as needed. I'm not yet 100% sure whether that's even conceivable without restarting the engine runtime (my gut says "no"), in which case, I feel like this would grow into a sort of GDExtension-like feature at the "core" API layer where any time you make changes to a struct definition (its name, properties, constants, and/or "extension methods"), you would have to restart the engine or editor in order to see those changes reflected in Variant logic.

You would need to have all of the struct definition info saved to a file (for which the definition could be populated from anything, including a GDScript resource as proposed). Then, when initializing the Variant runtime, it would deserialize that info and use it for a data-driven subset of the Variant setup process. Then, because everything else is built on top of that, the only way to make updates would be to re-serialize the struct spec and reload the engine/editor to trigger the same Variant runtime initialization logic that pulls from that data.

My thinking is that the proposed StructServer would analyze the definition of the struct from a given script and then identify which bucket level (if any) would be necessary for storing the memory for that object. And then users would just need to be aware of what the memory thresholds were, what the maximum amount of storage on a Variant is (96 or 128 bytes), etc. This gives you the best of both worlds from the two different versions I was thinking of since the engine already internally implements "version 2" on its own and even includes varying bucket sizes, including no bucket at all. We just need to have a way of dynamically populating the data from GDExtension and/or Script definitions.

If anyone spots any problems in my line of thinking, please let me know.

azur-wolve commented 2 years ago

As far as I can tell, in order to make the proposal work as intended, you would have to take these sections that revolve entirely around creating large collections of statically defined relationships, and introduce a Script-powered, dynamic dependency by which to have scripts intrude on this low-level defining layer to define their own types, set/get properties, and methods (via the Script's static extension methods concept). Naturally, it's impossible to evaluate a Script without first having already loaded all of this data regarding Variant data/logic.

To make it possible, you'd need a subset of the available definitions/logic to be something that can be dynamically populated, cleared, and refreshed as needed. I'm not yet 100% sure whether that's even conceivable without restarting the engine runtime (my gut says "no"), in which case, I feel like this would grow into a sort of GDExtension-like feature at the "core" API layer where any time you make changes to a struct definition (its name, properties, constants, and/or "extension methods"), you would have to restart the engine or editor in order to see those changes reflected in Variant logic.

Then if I've understood correctly, the only way to have structs properly well implemented would be to re-design the whole Variant logic, which is a big and core aspect of the engine, which in any case wouldn't happen before Godot 5.x?

What about letting the user specify the size tier for the struct? (instead of do it automatically; it is less user-friendly but implementation could be more straightforward [?])

Would need an accessible way to know how many bytes is taking a struct by hovering the mouse over the struct or by using the profiler (for both in-editor and run-time), to know if must upgrade a struct to the next size tier; could use compiler error/warning messages for boundary overflows.

struct <STRUCT_BODY> (without bucket)

struct small OR struct2

struct medium OR struct3

struct large OR struct4

or by using @annotations

May have misunderstood though.

willnationsdev commented 2 years ago

@KiwwiPity

Then if I've understood correctly, the only way to have structs properly well implemented would be to re-design the whole Variant logic, which is a big and core aspect of the engine, which in any case wouldn't happen before Godot 5.x?

I can't say for certain as I still need to analyze it all in more depth, but I don't think the changes would involve some sort of drastic re-design of the entirety of the Variant logic itself. In the grand scheme of things, you'd just be adding a single new type to the set of types that a variant could be, and thus you'd be creating whatever other type-to-type operators and constructors are necessary to teach the Variant how to work with the new type (no different from adding any other type to Variant). It's just that the internal handling of the logic for those operators, constructors, and the like would need to be data-driven rather than statically hardcoded in C++. We'd need to branch their logic based on data we load ahead of time from the filesystem just like GDExtensions in the core API layer.

And as far as I can tell, doing so wouldn't necessarily involve breaking any compatibility with earlier versions of Godot, so I do not foresee it requiring a Godot 5.x major version upgrade to be possible to implement (could be wrong though).

What about letting the user specify the size tier for the struct? (instead of do it automatically; it is less user-friendly but implementation could be more straightforward [?])

Would it be more straightforward? Probably. Would it make any difference in the constraints/limitations from an engine perspective? No. Therefore, I would still opt to have the engine just automate that process by default. Might expose a means for power users to optionally give byte size suggestions to the engine though.

Would need an accessible way to know how many bytes is taking a struct by hovering the mouse over the struct or by using the profiler (for both in-editor and run-time), to know if must upgrade a struct to the next size tier; could use compiler error/warning messages for boundary overflows.

Agreed. The auto-calculation/estimation of byte size would need to be something that happens at design-time, thereby allowing script editors to report compile/parse errors to the user when the structs they are defining would exceed the engine's available byte limitations.

willnationsdev commented 1 year ago

I started prototyping this proposal with a StructDB & Struct pair of types that mimic the definitions of ClassDB and Object a little, but with various things stripped out and struct-specific things added in. In the process, a few things have come to my attention.

First, you can't merely reuse the Variant type's bits to save space in defining type information for structs; doing so would make a standalone struct (outside of a Variant wrapper) be unable to report what type of struct its data contains. So scrap that idea. We'll just have to sacrifice a portion of the 16 bytes in a minimal struct to store metadata about the struct instance. That can then include not only the type ID, but also the "bucket" size of the struct instance (so things don't have to query the centralized StructDB just to know the size of its data).

Second, in order to properly recreate Object-like behavior for structs (defining structs both in engine code, in extensions, and in scripts with method bindings), information about a struct's type, property names, and property types HAVE to be stored alongside the property values themselves otherwise we run the risk of data corruption upon deserialization if the struct's type definition has changed (e.g. user swaps variable names / reorders properties). Given that, I'm considering that we could have one serialization strategy that efficiently compacts the struct data for transit with just the struct type and property values (for use at runtime) and another with much more detail that is used specifically when storing struct data in long-term storage like a file or database. Not sure yet if there are any problems with that approach.

Other than that, I don't have anything to show off. Progress is slow-going due to time constraints on my end. Just wanted to give an update here for folks interested in this proposal.

azur-wolve commented 1 year ago

@willnationsdev this is supposed to be a PR for Godot 4.x?

willnationsdev commented 1 year ago

@KiwwiPity This proposal is a description of how a potential pull request for Godot 4.x might be written, yes. The prototype I mentioned above is my attempt to see if I can implement this feature in a barebones format. An MVP would likely entail being able to define a PropertyInfo struct accessed with get_property_list() and then implement exposure of engine-defined structs in GDScript. If that works, then I may try to get that merged as a pull request. After that would be adding support for script-defined structs and then adding GDScript support for defining structs.

azur-wolve commented 1 year ago

@KiwwiPity This proposal is a description of how a potential pull request for Godot 4.x might be written, yes. The prototype I mentioned above is my attempt to see if I can implement this feature in a barebones format. An MVP would likely entail being able to define a PropertyInfo struct accessed with get_property_list() and then implement exposure of engine-defined structs in GDScript. If that works, then I may try to get that merged as a pull request. After that would be adding support for script-defined structs and then adding GDScript support for defining structs.

About the stucts, would be great to have them with GDScript. Maybe the devs would opt the way to just use inner classes though [class on GDScript.]

RedworkDE commented 1 year ago
  1. ClassDB already has a concept of structs (cf GDREGISTER_NATIVE_STRUCT), tho it is really limited: They only have a name, size and c code that defines the contents.
  2. The ptrcall convention for calling ClassDB methods already requires the engine and all GDExtension languages to be able to read and write raw variant contents from/to a void* given some metadata in the form of a PropertyInfo and a GodotTypeInfo::Metadata

Based on those points I would extend ClassDB::register_native_struct to add a list of name, offset, PropertyInfo and Metadata for each member of the struct, and thus allow access to the contents without having to involve a a c compiler to determine the actual layout of the struct. Then based on those enhanced native structs I would then build struct variants, by adding a variant type STRUCT and then use the padding between type and _data to store the struct type as an opaque id. The actual contents of the struct are stored either inline in the _data field or on the heap depending on the struct size. For performance reasons it may be desirable to encode if the struct is inline and maybe even the size in the id directly or by using a second variant type. Personally (at least for the first version) I would not allow defining methods on structs, and just keep them as dumb, but efficient data containers.

The biggest issue with this is probably going to be dealing with user (script) defined structs and modifications to them. There I would probably just end up always defining a new struct type instead of changing the layout of existing types and then doing some editor magic to convert instances of the old type, or just instances of these structs to exist in the editor at all and thus not have to deal with the changing layout at all. For serialization I would use a scheme similar to objects and serialize them as key value pairs to avoid these layout issues. Finally at runtime of a game I would disallow defining and changing struct types entirely, since that is probably not going to be useful very often and the overhead to make it work would just no be worth it. EDIT: As a first version defining custom structs in project settings and requiring an editor restart to apply changes would probably get rid of most of the complications around user defined structs (except for layout, but that is really the smallest issues here)

willnationsdev commented 1 year ago

@RedworkDE Thank you for this information. I'll have to take an in-depth look at that as it may render a lot of the stuff I was starting to do unnecessary. I appreciate it.

nlupugla commented 1 year ago

Just came across this proposal and I wanted to voice my support for it :)

I don't follow all the implementation details, but I'll add that it would be great if the struct could be edited easily from the editor inspector, the same way that a Resource can be. In my projects, I like have a layer of separation between my data and my behavior, but Resources allow for behavior, which can be potentially dangerous to deserialize. Having a data structure that is limited to just data would be great!

nlupugla commented 1 year ago

Hi @willnationsdev ! I might be interested in helping implement this feature if you want any help coding, testing, or theory crafting. Feel free to reach out on here or RocketChat if you want to get in touch :)

willnationsdev commented 1 year ago

@nlupugla That would be a welcome effort if you are up for it. I have a structs branch on my godot fork where all the work has been done, but it's very much a WIP/incremental patchwork job thus far. I haven't had time to work on it for several months now due to a large task with a hard deadline at my day job, but you are welcome to review and/or build off of what I've done thus far.

In short, I made a base struct class with 5 derived versions that each account for multiple "tiers" of allocating memory for a single struct instance: StructMinimal (uses no memory beyond the Variant), StructSmall/StructMedium/StructLarge (uses the PagedAllocator buckets just like other Variant sub-types), and iirc a StructCustom (for users that want to write their own implementation). The plan was to then gradually duplicate the existing Object-related logic in Variant and the related classes across the core so that they treat each struct instance in more or less the same way with the sole exception being the way memory is initially allocated and cleaned up.

However, it's been ages since I've worked on it and I'm pretty sure a lot of the changes thus far are either obsolete or work-in-progress things that need to be cleaned up.

nlupugla commented 1 year ago

Great! I'll take a look through your branch @willnationsdev when I get the chance.

I read through the discussion above a bit more carefully, and I have to say, I like where @RedworkDE is coming from.

I don't like the idea of having to make an entire StructDB that lives along side the ClassDB. I get the desire for having the structs be as memory-efficient as possible, but I think code complexity is super important as well. The more we can build on what's already there, the better.

I very much agree that structs should not allow user-defined methods. I think they should be as close to Plain Old Data as possible. I could see an argument for allowing operator overloading and/or static methods, but I think it's very important to be able to deserialize structs without the risk of arbitrary code execution, which is one of the pitfalls of Object.

The one thing I disagree on here is having to go through project settings to define structs. It would be sweet if structs could be anonymous and be scripted inline. For example,

class_name PointGenerator extends Node

var points_created : int = 0

func _ready() -> void:
    for i in range(4):
        var point : (name : String, x : float, y : float) = generate_random_named_point()
        print(point.name, point.x, point.y)

func generate_random_named_point() -> (name : String, x : float, y : float):
    points_created += 1
    name = "point" + str(points_created)
    x = randf()
    y = randf()
    return (name : name, x : x, y : y)

That said, I see this as more of a "nice to have" than a "need to have".

Calinou commented 1 year ago

Closing in favor of https://github.com/godotengine/godot-proposals/issues/7329. Thanks for the original discussion nonetheless :slightly_smiling_face: