BattlehubCode / RTE_Docs

This is a repository for Runtime Editor documentation, discussion and tracking features and issues.
https://assetstore.unity.com/packages/tools/modeling/runtime-editor-64806

[AssetDatabase] .proto schemes and protobuf.js #111

Open BattlehubCode opened 3 months ago

BattlehubCode commented 3 months ago

Hello, thanks for the awesome package. Is there a way to get the .proto schemas for the editor's serialized files? We would like to be able to read the files in JavaScript.

BattlehubCode commented 3 months ago

This is a good question. I created sample code that gets meta-types from a typemodel and uses typeModel.GetSchema to get the schema.

For some reason, GetSchema throws exceptions when a class has enum fields. To work around this, I added the enums to the type model manually. I'm not sure whether that makes the schema incompatible, though.

using Battlehub.Storage;
using ProtoBuf.Meta;
using System;
using UnityEngine;

public class GetSurrogatesSchema 
{
    // Logs the .proto schema of every type registered in the runtime type model
    public void PrintSchema()
    {
#if UNITY_EDITOR
        var typeModel = SerializerBase<Guid, string>.RuntimeTypeModel;

        typeModel.Add(typeof(UnityEngine.UI.Navigation.Mode), false);
        typeModel.Add(typeof(UnityEngine.Events.UnityEventCallState), false);
        typeModel.Add(typeof(UnityEngine.Events.PersistentListenerMode), false);
        typeModel.Add(typeof(UnityEngine.MeshTopology), false);

        var types = typeModel.GetTypes();
        foreach (MetaType metaType in types)
        {
            var fields = metaType.GetFields();
            foreach (var field in fields)
            {
                if (field.MemberType.IsEnum)
                {
                    typeModel.Add(field.MemberType, false);
                }
            }

            try
            {
                Debug.Log($"{metaType.Type} -> {typeModel.GetSchema(metaType.Type)}");
            }
            catch (Exception e)
            {
                Debug.LogError($"{metaType.Type}");
                Debug.LogException(e);
            }
        }
#endif

    }
}

The serialized surrogates are written to the output stream one by one, preceded by two integers: the first is the length of the data in bytes, the second is an integer that represents the type.

  byte[] header1 = BitConverter.GetBytes(length);
  byte[] header2 = BitConverter.GetBytes(typeIndex);

  stream.Write(header1, 0, header1.Length);
  stream.Write(header2, 0, header2.Length);
  stream.Write(data, 0, data.Length);

On the one hand, this results in a custom format, but on the other hand, it should allow you to deserialize data selectively. For example, deserialize only TransformSurrogates.
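
As a rough illustration of selective reading, the sketch below walks such a stream in JavaScript and keeps only the records of one type. It assumes both header integers are 32-bit and little-endian (which is what BitConverter.GetBytes produces on little-endian platforms); the names readRecords and frame, and the type index 101 used in the example, are placeholders for illustration only.

```javascript
// Yield { typeIndex, payload } for every record in the byte stream.
// Assumed layout per record: int32 length (LE), int32 typeIndex (LE), payload.
function* readRecords(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let offset = 0;
  while (offset + 8 <= bytes.byteLength) {
    const length = view.getInt32(offset, true);
    const typeIndex = view.getInt32(offset + 4, true);
    offset += 8;
    if (length < 0 || offset + length > bytes.byteLength) {
      throw new RangeError(`Record at offset ${offset - 8} overruns the data`);
    }
    // subarray creates a view, so payloads that are skipped are never copied.
    yield { typeIndex, payload: bytes.subarray(offset, offset + length) };
    offset += length;
  }
}

// Helper for the example only: frame a payload the way the C# snippet does.
function frame(typeIndex, payload) {
  const header = Buffer.alloc(8);
  header.writeInt32LE(payload.length, 0);
  header.writeInt32LE(typeIndex, 4);
  return Buffer.concat([header, payload]);
}

// Build two records in memory, then keep only those with type index 101.
const file = Buffer.concat([
  frame(101, Buffer.from([1, 2, 3])),
  frame(7, Buffer.from([9])),
]);
const selected = [...readRecords(file)].filter((r) => r.typeIndex === 101);
console.log(selected.length, Array.from(selected[0].payload));
```

Each selected payload could then be handed to a protobuf decoder for the matching message type, while all other records are skipped without being decoded.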

Let me know if you manage to do something with the above info. I will also try to work out the best way to do this, and whether the data can be deserialized using the schema returned by typeModel.GetSchema.

happy-turtle commented 3 months ago

Thanks for looking into it! 🙌 I tried logging the schemas and that worked fine so far. I still need to actually use the schemas for deserialization; I will post my findings here once I get to work on it.

happy-turtle commented 3 months ago

I did the first tests of trying to deserialize files based on the schemas, but it's still hard to fully understand the process. Here is what I got so far:

The schemas might not be fully cross-platform compatible. Some of them include default values that cannot be parsed, such as [default = 00000000-0000-0000-0000-000000000000]; I had to replace these manually with [default = 0].

I also had to manually add bcl.proto from protobuf-net, but I am not sure whether that is OK: comments in that file say it should not be used for cross-platform development. protobuf-net may have additional resources on this.

For starters, I tried to read a .meta file using the following generated schema, Battlehub.Storage.Meta`2[System.Guid,System.String].proto:

syntax = "proto3";

package Battlehub.Storage;
import "bcl.proto"; // schema for protobuf-net's handling of core .NET types

message KeyValuePair_Guid_Guid {
   optional bcl.Guid Key = 1;
   optional bcl.Guid Value = 2;
}
message KeyValuePair_Guid_LinkMap_Guid_Guid {
   optional bcl.Guid Key = 1;
   optional LinkMap_Guid_Guid Value = 2;
}
message LinkMap_Guid_Guid {
   repeated KeyValuePair_Guid_Guid AssetIDToInstanceID = 1;
}
message Meta_Guid_String {
   optional bcl.Guid ID = 1 [default = 0]; // replaced 00000000-0000-0000-0000-000000000000 with 0
   optional string Name = 2;
   repeated bcl.Guid OutboundDependencies = 4;
   repeated bcl.Guid InboundDependencies = 5;
   optional string ThumbnailFileID = 6;
   optional string DataFileID = 7;
   repeated KeyValuePair_Guid_Guid Links = 8;
   repeated KeyValuePair_Guid_LinkMap_Guid_Guid LinkMaps = 9;
   optional int32 TypeID = 10 [default = 0];
   optional string LoaderID = 11;
   repeated bcl.Guid MarkedAsDestroyed = 12;
}
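
The manual replacement of the Guid default could also be scripted on the JavaScript side before the schema text is parsed. This is only a sketch: the function name patchGuidDefaults is made up, and it deliberately touches nothing but the all-zero Guid literal shown above.

```javascript
// The exact default clause that protobuf-net emits for an empty Guid.
const EMPTY_GUID_DEFAULT = "[default = 00000000-0000-0000-0000-000000000000]";

// Return the schema text with every empty-Guid default rewritten to 0.
function patchGuidDefaults(schemaText) {
  return schemaText.split(EMPTY_GUID_DEFAULT).join("[default = 0]");
}

const line = "   optional bcl.Guid ID = 1 [default = 00000000-0000-0000-0000-000000000000];";
console.log(patchGuidDefaults(line));
```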

I loaded the schema with protobuf.js and then cut off the first 8 bytes based on the custom format, as you described. Decoding succeeds and yields a message, but the data does not seem to be interpreted correctly: I basically get a string representation of the binary data: { DataFileID: "\u0011�N9��&F�\u0012\nCube.sceneP���������\u0001", }

Here is the JavaScript code snippet I used:

import { validateAndSanitizePath } from './utils.js'
import fs from 'fs'
import util from 'util'
import protobuf from 'protobufjs'

const root = await protobuf.load("Battlehub.Storage.Meta`2[System.Guid,System.String].proto")
// Obtain a message type
const Message = root.lookupType("Battlehub.Storage.Meta_Guid_String");
const readFileAsync = util.promisify(fs.readFile);
const filePath = validateAndSanitizePath("./idbfs/be2f4634cfa8ce7174ad593b348fe778/Project/Cube.scene.meta");
let res = await readFileAsync(filePath);

// Runtime Editor specific: the first two int values (of 32 bit or 4 byte length) are
// the length of the data and the type ID of the data
// so if we know the type we can ignore the first 8 bytes and cut them off
res = res.subarray(8, res.length)

// Decode an Uint8Array (browser) or Buffer (node) to a message
const message = Message.decode(res).toJSON();
console.log(message);
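
When a decode succeeds but produces output like this, it can help to inspect the raw protobuf wire format before involving a schema at all. The following is a minimal sketch (readVarint and listTopLevelFields are hypothetical helper names) that lists the field number, wire type and payload size of each top-level field, so the result can be compared with the field numbers in the .proto file. It handles only wire types 0, 1, 2 and 5.

```javascript
// Read one base-128 varint starting at `pos`; returns the value and the next position.
function readVarint(buf, pos) {
  let value = 0n;
  let shift = 0n;
  for (;;) {
    if (pos >= buf.length) throw new RangeError("truncated varint");
    const byte = buf[pos++];
    value |= BigInt(byte & 0x7f) << shift;
    if ((byte & 0x80) === 0) return { value, pos };
    shift += 7n;
  }
}

// List { fieldNumber, wireType, size } for each top-level field of a message.
function listTopLevelFields(buf) {
  const fields = [];
  let pos = 0;
  while (pos < buf.length) {
    const tag = readVarint(buf, pos);
    const fieldNumber = Number(tag.value >> 3n);
    const wireType = Number(tag.value & 7n);
    pos = tag.pos;
    let size;
    if (wireType === 0) {            // varint
      size = readVarint(buf, pos).pos - pos;
    } else if (wireType === 1) {     // 64-bit (fixed64, double)
      size = 8;
    } else if (wireType === 2) {     // length-delimited (string, bytes, message)
      const len = readVarint(buf, pos);
      pos = len.pos;
      size = Number(len.value);
    } else if (wireType === 5) {     // 32-bit (fixed32, float)
      size = 4;
    } else {
      throw new Error(`unsupported wire type ${wireType}`);
    }
    if (pos + size > buf.length) throw new RangeError("field overruns buffer");
    fields.push({ fieldNumber, wireType, size });
    pos += size;
  }
  return fields;
}

// Example: field 2 as a 10-byte string, followed by field 10 as a varint.
const sample = Buffer.concat([
  Buffer.from([0x12, 0x0a]),
  Buffer.from("Cube.scene"),
  Buffer.from([0x50, 0x96, 0x01]),
]);
console.log(listTopLevelFields(sample));
```

If the reported field numbers do not match the schema, or a single length-delimited field swallows most of the file, one possible explanation is that the bytes passed to decode do not start at a message boundary.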

If you have any tips on how to proceed from here, let me know. It's a tough nut to crack.

BattlehubCode commented 2 months ago

Hi, I was able to parse the meta and data files, see example below. I'm only reading transforms from the data file, but I think the same principle can be applied to other types.

SampleProject.zip

const fs = require('node:fs');
const util = require('util');
const protobuf = require('protobufjs');

// https://stackoverflow.com/questions/57493463/protobuf-net-serialize-deserialize-datetime-guid-types
function getInt64Bytes(x) {
    const bytes = Buffer.alloc(8);
    bytes.writeBigUInt64LE(BigInt(x));
    return bytes;
}

function btos(b) {
    // Pad to two hex digits so bytes below 0x10 keep their leading zero
    return b.toString(16).padStart(2, '0');
}

function guidToString(bclGuid){
    const lo = getInt64Bytes(bclGuid.lo);
    const hi = getInt64Bytes(bclGuid.hi);
    const guid = 
        btos(lo[3]) + btos(lo[2]) + btos(lo[1]) + btos(lo[0])  + "-" +
        btos(lo[5]) + btos(lo[4]) + "-" +
        btos(lo[7]) + btos(lo[6]) + "-" +
        btos(hi[0]) + btos(hi[1]) + "-" +
        btos(hi[2]) + btos(hi[3]) + btos(hi[4]) + btos(hi[5]) + btos(hi[6]) + btos(hi[7]);
    return guid;
}

async function parseMetaFile(protoPath, filePath) {
    try {
        const root = await protobuf.load(protoPath);
        const Message = root.lookupType("Battlehub.Storage.Meta_Guid_String");
        fs.readFileAsync = util.promisify(fs.readFile);

        const data = await fs.readFileAsync(filePath);

        const message = Message.decode(data);
        return message.toJSON();
    } catch (err) {
        console.error("Error parsing proto file:", err);
        throw err; 
    }
}

async function parseDataFile(protoPath, filePath) {
    try {

        const root = await protobuf.load(protoPath);

        const Message = root.lookupType("Battlehub.Storage.TransformSurrogate_Guid");

        fs.readFileAsync = util.promisify(fs.readFile);

        const data = await fs.readFileAsync(filePath);

        // Respect the Buffer's byteOffset in case it is a view into a larger ArrayBuffer
        const dataView = new DataView(data.buffer, data.byteOffset, data.byteLength);
        let offset = 0;

        // Array to store decoded messages
        const messagesArray = [];

        while (offset < data.length) {
            let length = dataView.getUint32(offset, true);  // get the length of the next piece of data
            offset += 4;

            let typeID = dataView.getUint32(offset, true); // get type id
            offset += 4;

            // TransformSurrogate typeID == 101
            // See Battlehub/StorageData/Surrogates/UnityEngine.TransformSurrogate.cs/_TYPE_INDEX constant
            if (typeID === 101) {
                const transformData = data.subarray(offset, offset + length);
                const message = Message.decode(transformData);
                messagesArray.push(message.toJSON());
            }

            offset += length;
        }

        return messagesArray;
    } catch (err) {
        console.error("Error parsing proto file:", err);
        throw err; 
    }
}

async function run() {
    const metaProtoPath = "Meta.proto";
    const metaFilePath = "SampleProject/Scene.scene.meta";

    var metaJson = await parseMetaFile(metaProtoPath, metaFilePath);
    console.log(metaJson);
    console.log(guidToString(metaJson.ID))

    const dataProtoPath = "Transform.proto";
    const dataFilePath = "SampleProject/Scene.scene";

    var transformJson = await parseDataFile(dataProtoPath, dataFilePath);

    var lastTransform = transformJson[transformJson.length - 1]
    console.log(lastTransform);   
    console.log(guidToString(lastTransform.GameObjectID));   
}

run();

Transform.proto:

syntax = "proto3";

package Battlehub.Storage;
import "bcl.proto"; // schema for protobuf-net's handling of core .NET types

message Quaternion {
   optional float x = 2;
   optional float y = 3;
   optional float z = 4;
   optional float w = 5;
   optional Vector3 eulerAngles = 6;
}

message Vector3 {
   optional float x = 2;
   optional float y = 3;
   optional float z = 4;
}

message TransformSurrogate_Guid {
   optional bcl.Guid ID = 2 [default = 0];
   repeated bcl.Guid ChildrenIDs = 3;
   optional bcl.Guid GameObjectID = 4 [default = 0];
   optional bool ActiveSelf = 5 [default = false];
   optional string Name = 6;
   optional Vector3 LocalPosition = 7;
   optional Quaternion LocalRotation = 8;
   optional Vector3 LocalScale = 9;
   optional bcl.Guid ParentID = 10 [default = 0];
   optional bcl.Guid ParentGameObjectID = 11 [default = 0];
   optional string Tag = 12;
}

bcl.proto is only needed for the Guid type; I don't use other "problem" types like DateTime or TimeSpan. The "problem" comes from how protobuf-net encodes a Guid: the 16-byte value is split into two 8-byte fields, lo and hi, and lo uses what is called "crazy endian encoding".

Here's a simple but ugly function to convert a guid to a cross-platform string representation:

function getInt64Bytes(x) {
    const bytes = Buffer.alloc(8);
    bytes.writeBigUInt64LE(BigInt(x));
    return bytes;
}

function btos(b) {
    // Pad to two hex digits so bytes below 0x10 keep their leading zero
    return b.toString(16).padStart(2, '0');
}

function guidToString(bclGuid){
    const lo = getInt64Bytes(bclGuid.lo);
    const hi = getInt64Bytes(bclGuid.hi);
    const guid = 
        btos(lo[3]) + btos(lo[2]) + btos(lo[1]) + btos(lo[0])  + "-" +
        btos(lo[5]) + btos(lo[4]) + "-" +
        btos(lo[7]) + btos(lo[6]) + "-" +
        btos(hi[0]) + btos(hi[1]) + "-" +
        btos(hi[2]) + btos(hi[3]) + btos(hi[4]) + btos(hi[5]) + btos(hi[6]) + btos(hi[7]);
    return guid;
}

happy-turtle commented 2 months ago

Thanks again for this. I now have a setup that reads the data files with all of their components in JavaScript. On the C# side, I had to add an export of a JSON file that maps each type ID to the correct .proto schema file. I was also able to simplify the schema export to a pure Editor script, so there is no need to play a scene to export. This is what I ended up with:

using System;
using System.Collections.Generic;
using UnityEngine;
using UnityEditor;
using UnityEditor.SceneManagement;
using System.IO;
using Battlehub.Storage;
using Newtonsoft.Json;
using ProtoBuf.Meta;
using System.Linq;

public class ExportSchemas : MonoBehaviour
{
    [MenuItem("Tools/Runtime Editor/Export Proto Schemas")]
    private static void RunExport()
    {
        // Export once; nothing below depends on the objects in the active scene.
        var serializer = new ExportSchemasSerializer<Guid, string>(new TypeMap());
        if (!Directory.Exists("ProtoBuf"))
        {
            Directory.CreateDirectory("ProtoBuf");
        }

        var indexToSchemaMap = serializer.IndexToTypeMap.ToDictionary(x => x.Key, x => GetSchemaTypeName(x.Value));
        using (var streamWriter = new StreamWriter("ProtoBuf/IndexToSchema.json"))
        {
            streamWriter.Write(JsonConvert.SerializeObject(indexToSchemaMap, Formatting.Indented));
        }

        if (!Directory.Exists("ProtoBuf/Schemas"))
        {
            Directory.CreateDirectory("ProtoBuf/Schemas");
        }

        var schemaDictionary = CollectSchemas();
        foreach (var entry in schemaDictionary)
        {
            using var streamWriter = new StreamWriter($"ProtoBuf/Schemas/{entry.Key}.proto");
            // add protocol buffer header
            streamWriter.Write("syntax = \"proto3\";\n\n");

            // Workaround for GUID:
            // protobuf-net generates [default = 00000000-0000-0000-0000-000000000000]
            // for cross-platform compatibility we need to replace it with [default = 0]
            var schema = entry.Value.Replace("[default = 00000000-0000-0000-0000-000000000000]", "[default = 0]");

            streamWriter.Write(schema);
        }
    }

    private static Dictionary<string, string> CollectSchemas()
    {
        var schemaDictionary = new Dictionary<string, string>();
        new ExportSchemasSerializer<Guid, string>(new TypeMap());
        var typeModel = SerializerBase<Guid, string>.RuntimeTypeModel;

        typeModel.Add(typeof(UnityEngine.UI.Navigation.Mode), false);
        typeModel.Add(typeof(UnityEngine.Events.UnityEventCallState), false);
        typeModel.Add(typeof(UnityEngine.Events.PersistentListenerMode), false);
        typeModel.Add(typeof(UnityEngine.MeshTopology), false);

        var types = typeModel.GetTypes();
        foreach (MetaType metaType in types)
        {
            var fields = metaType.GetFields();
            foreach (var field in fields)
            {
                if (field.MemberType.IsEnum)
                {
                    typeModel.Add(field.MemberType, false);
                }
            }

            try
            {
                var schema = typeModel.GetSchema(metaType.Type);
                Debug.Log($"{metaType.Type} -> {schema}");
                schemaDictionary.Add(GetSchemaTypeName(metaType.Type), schema);
            }
            catch (Exception e)
            {
                Debug.LogError($"{metaType.Type}");
                Debug.LogException(e);
            }
        }
        return schemaDictionary;
    }

    // Based on https://stackoverflow.com/a/6584323
    public static string GetSchemaTypeName(Type type)
    {
        if (type.IsGenericParameter)
        {
            return type.Name;
        }

        if (!type.IsGenericType)
        {
            return type.Name;
        }

        var builder = new System.Text.StringBuilder();
        var name = type.Name;
        var index = name.IndexOf("`");
        builder.AppendFormat("{0}.{1}", type.Namespace, name[..index]);
        builder.Append('_');
        var first = true;
        foreach (var arg in type.GetGenericArguments())
        {
            if (!first)
            {
                builder.Append('_');
            }
            builder.Append(GetSchemaTypeName(arg));
            first = false;
        }
        return builder.ToString();
    }
}

public class ExportSchemasSerializer<TID, TFID> : Serializer<TID, TFID> where TID : IEquatable<TID> where TFID : IEquatable<TFID>
{
    public ExportSchemasSerializer(ITypeMap typeMap) : base(typeMap) { }

    public IReadOnlyDictionary<int, Type> IndexToTypeMap
    {
        get { return IndexToType; }
    }
}

And then in JavaScript with the help of the JSON type map I can dynamically find out which component schemas are needed for the deserialization of a file:

async function parseDataFile(protoDir: string, filePath: string) {
        [...]

        const typeMapBuffer = await readFileAsync('../IndexToSchema.json', 'utf-8')
        const typeMap: Record<string, string> = JSON.parse(typeMapBuffer)

        [...]

        while (offset < data.length) {

            [...]

            const fullTypeName = typeMap[typeID];
            const root = await protobuf.load(protoDir + fullTypeName + '.proto');
            const typeName = fullTypeName.split('.').pop()
            // The typeMap determines which surrogate is to be read. E.g. TransformSurrogate typeID == 101
            // See Battlehub/StorageData/Surrogates/UnityEngine.TransformSurrogate.cs/_TYPE_INDEX constant
            const Message = root.lookupType(typeName);
            const transformData = new Uint8Array(dataView.buffer, offset, length);
            const message = Message.decode(transformData);
            messagesArray.push(message.toJSON());

            offset += length;
        }

        [...]
}
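
Since the loop above loads a .proto file for every record, a file with many records of the same type parses the same schema repeatedly. One possible refinement, sketched here under the assumption that schema loading is the expensive step, is to memoize the loader by type name. createSchemaCache is a hypothetical helper; loadSchema stands for whatever function performs the actual load, for example a wrapper around protobuf.load.

```javascript
// Wrap a loader so that each type name is loaded at most once.
// `loadSchema` may return a value or a promise; the same result is reused.
function createSchemaCache(loadSchema) {
  const cache = new Map();
  return function getSchema(fullTypeName) {
    if (!cache.has(fullTypeName)) {
      cache.set(fullTypeName, loadSchema(fullTypeName));
    }
    return cache.get(fullTypeName);
  };
}

// Example with a stand-in loader that only counts how often it is called.
let loadCount = 0;
const getSchema = createSchemaCache((name) => {
  loadCount += 1;
  return Promise.resolve(`schema for ${name}`);
});
getSchema("TransformSurrogate_Guid");
getSchema("TransformSurrogate_Guid");
getSchema("Meta_Guid_String");
console.log(loadCount);
```

Inside the record loop, the direct load call would then become `const root = await getSchema(fullTypeName)`. Caching the promise rather than the resolved value means concurrent requests for the same type also share a single load.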

I hope this gives enough insight. Thank you once more!