dfinity / ICRC

Repository to ICRC proposals
Apache License 2.0

ICRC-16: EnhancedValue - Standardizing Unstructured Data Interoperability #16

Open skilesare opened 1 year ago

skilesare commented 1 year ago

ICRC-16 EnhancedValue

Context

The proposed ICRC-16 EnhancedValue standard defines a Candid interface for unstructured data that canisters can use to exchange document-style data in a standardized way. This standard aims to facilitate the exchange of unstructured data between canisters and improve interoperability between different systems.

Data details

Summary

The ICRC-16 standard proposes a Candid interface for unstructured data to facilitate data exchange between canisters in a standardized way.

Introduction

The proposed standard describes the Candid interface for unstructured data that canisters can use to exchange data in a flexible and interoperable way. This interface is built upon the Candid serialization format and defines a set of types that can be used to handle various kinds of unstructured data.

Goals

The main goals of this standard are to:

Candid Interface Definition

The ICRC-16 EnhancedValue standard defines a Candid interface for unstructured data that includes the following type:

type ICRC16 =
  variant {
    Array: vec ICRC16;
    Blob: blob;
    Bool: bool;
    Bytes: vec nat8;
    Class: vec ICRC16Property;
    Float: float64;
    Floats: vec float64;
    Int: int;
    Int16: int16;
    Int32: int32;
    Int64: int64;
    Int8: int8;
    Map: vec record {
      text;
      ICRC16;
    };
    ValueMap: vec record {
      ICRC16;
      ICRC16;
    };
    Nat: nat;
    Nat16: nat16;
    Nat32: nat32;
    Nat64: nat64;
    Nat8: nat8;
    Nats: vec nat;
    Option: opt ICRC16;
    Principal: principal;
    Set: vec ICRC16;
    Text: text;
};

This type defines a set of variants that can be used to represent different types of unstructured data, including arrays, blobs, booleans, bytes, classes, floats, integers, maps, naturals, options, principals, sets, and text.
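To make the shape of the variant concrete, here is a minimal TypeScript sketch of a subset of it. It is an illustration rather than generated bindings. The single-key-object representation is an assumption modeled on how JavaScript Candid bindings usually represent variants. Nat is simplified to a plain number. Class is left out because ICRC16Property is not defined in the text above. The names `monster` and `tagOf` are hypothetical.

```typescript
// Subset of the ICRC16 variant, modeled as single-key objects.
// Assumptions: Nat is a plain number for brevity, and Class is omitted
// because ICRC16Property is not defined in the proposal text.
type ICRC16 =
  | { Text: string }
  | { Nat: number }
  | { Bool: boolean }
  | { Blob: Uint8Array }
  | { Array: ICRC16[] }
  | { Map: Array<[string, ICRC16]> }
  | { ValueMap: Array<[ICRC16, ICRC16]> };

// A document-style value: a text-keyed map containing a nested array.
const monster: ICRC16 = {
  Map: [
    ["name", { Text: "Goblin" }],
    ["level", { Nat: 3 }],
    ["tags", { Array: [{ Text: "green" }, { Text: "small" }] }],
  ],
};

// Returns the name of the variant case that is set on a value.
function tagOf(v: ICRC16): string {
  return Object.keys(v)[0];
}
```

A consumer can branch on `tagOf(value)` (or on `"Map" in value`) to handle each case without knowing the document's schema ahead of time.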

Complementary standards

This standard can be used by other ICRC standards that require metadata or unstructured data exchange, such as:

The ICRC-16 standard can be implemented in any language that supports Candid serialization, such as Rust, Motoko, Azel, or Kybra. Implementers can use the standard type and service method to handle unstructured data in a consistent and efficient way. The ICRC-16 standard also encourages the development of standard libraries that can convert unstructured data into optimized objects, such as the Candy_Library example provided in the use case section.

Rationale

The need for a standard Candid interface for unstructured data arises from the fact that unstructured data is ubiquitous in many systems, including the Internet Computer. Unstructured data can come in many forms, such as JSON, XML, YAML, or even binary data, and can be used for various purposes, such as exchanging documents, files, or metadata. However, the lack of a standardized approach to unstructured data exchange can create interoperability issues and make it difficult for developers to handle unstructured data in a consistent and efficient way.

By defining a Candid interface for unstructured data, the ICRC-16 standard aims to provide a common ground for canisters to exchange unstructured data in a flexible and interoperable way. This standard defines a set of types that can be used to represent and access different types of unstructured data, including arrays, blobs, maps, and text. The standard also complements other Candid-related standards, such as ICRC-12 for Candid extensions, and can be used by other ICRC standards that require metadata or unstructured data exchange.

Security Considerations

The ICRC-16 standard defines a Candid interface for unstructured data that can be used to exchange data between canisters. However, care should be taken to ensure that the exchanged data is secure and does not pose a security risk to the system. In particular, canisters should validate the data they receive from other canisters to ensure that it conforms to the expected format and does not contain malicious code or data.

Implementers of the ICRC-16 standard should also consider the security implications of their implementation and follow best practices for secure software development. This includes using secure coding practices, validating user input, sanitizing data, and following the principle of least privilege. Implementers should also consider the potential impact of denial-of-service attacks or other forms of attacks that can exploit vulnerabilities in the system.

In particular, the size of an EnhancedValue object could be used in an attack. Depending on your use case, you may want to check the size of the object before storing or processing it, to make sure it stays within the limits your application can reasonably expect.
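One way to apply such a check is to walk the value and reject it once a budget is exceeded. The sketch below is a hedged illustration in TypeScript, not part of the standard. The `Value` type is a simplified subset of the variant, and `Limits` and `checkSize` are hypothetical names. Leaf size is approximated by string length (UTF-16 code units) and blob length.

```typescript
// Simplified subset of the ICRC-16 variant (assumption: single-key objects).
type Value =
  | { Text: string }
  | { Nat: number }
  | { Blob: Uint8Array }
  | { Array: Value[] }
  | { Map: Array<[string, Value]> };

// Hypothetical budget; real limits depend on the application.
interface Limits {
  maxNodes: number;     // total number of values, including containers
  maxDepth: number;     // maximum nesting depth (root is depth 0)
  maxLeafBytes: number; // approximate size limit for a single Text or Blob
}

// Walks the value, throwing as soon as any limit is exceeded.
// Returns the number of nodes visited when the value is within budget.
function checkSize(v: Value, limits: Limits): number {
  let nodes = 0;
  function visit(x: Value, depth: number): void {
    nodes += 1;
    if (nodes > limits.maxNodes) throw new Error("too many nodes");
    if (depth > limits.maxDepth) throw new Error("nested too deeply");
    if ("Text" in x) {
      if (x.Text.length > limits.maxLeafBytes) throw new Error("text too large");
    } else if ("Blob" in x) {
      if (x.Blob.length > limits.maxLeafBytes) throw new Error("blob too large");
    } else if ("Array" in x) {
      for (const item of x.Array) visit(item, depth + 1);
    } else if ("Map" in x) {
      for (const entry of x.Map) visit(entry[1], depth + 1);
    }
  }
  visit(v, 0);
  return nodes;
}
```

Failing fast during the walk keeps the cost of rejecting an oversized value proportional to the budget rather than to the size of the input.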

Conclusion

The proposed ICRC-16 EnhancedValue standard defines a Candid interface for unstructured data that canisters can use to exchange data in a flexible and interoperable way. This standard aims to simplify the exchange of unstructured data and improve interoperability between different systems. We believe that this standard will be useful for developers who need to handle unstructured data in a consistent and efficient way and that it will facilitate the development of standard libraries and tools that can work with unstructured data.

We welcome feedback and contributions from the community to help refine and improve this standard.

Gekctek commented 1 year ago

Not sure where to have a conversation so I'll link here too https://forum.dfinity.org/t/icrc-16-candyshared-standardizing-unstructured-data-interoperability/18893/6?u=gekctek

frederikrothenberger commented 1 year ago

@skilesare: Could you elaborate on the use-cases of this a bit further? The candid encoded data already contains type information (i.e. is self-describing). The only thing not contained in a candid encoded value are the field names (which this does not give you either, if I see this correctly).

So, in particular: what benefits does this ICRC offer that go beyond sharing candid as is and inspecting its type?

There already is an experimental feature on the JS candid library to do exactly that.

Usage example:

  const encoded = '4449444c026e016c02a0d2aca8047c90eddae7040001000101010200';
  const value = IDL.decode([IDL.Unknown], fromHexString(encoded))[0] as any;
  expect(value).toEqual([
    { _1158359328_: BigInt(1), _1291237008_: [{ _1158359328_: BigInt(2), _1291237008_: [] }] },
  ]);

The type of the decoded value can then be inspected by calling value.type().

To build upon this and make it more useful, the candid implementations would need to be expanded with the following features:

What do you think?

skilesare commented 1 year ago

The main use case is for storing unstructured data in the type-strict environments of Motoko and Rust.

Particularly we are using it as our metadata structure at Origyn where the metadata will grow, expand, and be contributed to by third parties over time. In other words, we can't know the type structure ahead of time and we don't want to have to upgrade thousands of canisters every time a game adds a new monster class.

Other use cases are storing unstructured data (like JSON files) that can be easily reasoned about and computed over without deserialization.
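As an illustration of computing over such data in place, here is a minimal TypeScript sketch that follows a list of keys through nested text-keyed maps. It is not the CandyPath mechanism and not part of the proposal. The `Value` type is a simplified subset of the variant, and `lookup` and `getPath` are hypothetical helpers.

```typescript
// Simplified subset of the variant (assumption: single-key objects).
type Value =
  | { Text: string }
  | { Nat: number }
  | { Array: Value[] }
  | { Map: Array<[string, Value]> };

// Finds the first entry with the given key in a text-keyed map.
function lookup(entries: Array<[string, Value]>, key: string): Value | undefined {
  for (const entry of entries) {
    if (entry[0] === key) return entry[1];
  }
  return undefined;
}

// Follows `path` through nested Map values.
// Returns undefined if a key is missing or a non-Map value is reached early.
function getPath(root: Value, path: string[]): Value | undefined {
  let current: Value | undefined = root;
  for (const key of path) {
    if (current === undefined || !("Map" in current)) return undefined;
    current = lookup(current.Map, key);
  }
  return current;
}
```

Because the lookup only depends on the variant tags, a third party can add new keys to the metadata without the reader needing a new type definition.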

This isn't so much about encoding and decoding of candid, but about having a way to talk about, store, and transmit unstructured data.

It will also make other ICRCs easier to design and improve without code upgrades. Say the "Wallet" ICRC has a configOptions variable. We can define this out and then upgrade everyone's wallets every time we need to have a new wallet, or we can say that the configOptions are a CandyShared type that conforms to the WalletConfigOptions v2 CandySchema (a future ICRC) and that is easily searchable via CandyPath. (Our indexing canister is already doing some of this, but not in a scalable way.)

There is a balance between the strong typing that would be preferred and the actual logistics of dealing with data where you just don't know the structure, or that may evolve quickly.

skilesare commented 1 year ago

At Roman's suggestion, and to keep compatibility with existing structures in the ledgers as a subtype, I'm updating the #Map variant to (text, CandyShared) and adding a #ValueMap that is (CandyShared, CandyShared).
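The relationship between the two map variants can be sketched in TypeScript as follows. This is an illustration under assumptions, not the library's API. The `Value` type is a simplified subset, and `toValueMap` is a hypothetical helper showing that a text-keyed Map is the special case of a ValueMap whose keys are all Text.

```typescript
// Simplified subset of the variant (assumption: single-key objects).
type Value =
  | { Text: string }
  | { Nat: number }
  | { Map: Array<[string, Value]> }
  | { ValueMap: Array<[Value, Value]> };

// Lifts text-keyed entries into value-keyed entries by wrapping each key in Text.
function toValueMap(entries: Array<[string, Value]>): Array<[Value, Value]> {
  const out: Array<[Value, Value]> = [];
  for (const entry of entries) {
    out.push([{ Text: entry[0] }, entry[1]]);
  }
  return out;
}

// Map: keys are plain text, matching the existing ledger metadata shape.
const byName: Value = { Map: [["decimals", { Nat: 8 }]] };

// ValueMap: keys can be any value, for example a Nat.
const byId: Value = { ValueMap: [[{ Nat: 1 }, { Text: "first" }]] };
```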

zensh commented 1 month ago

I strongly recommend using CBOR (Concise Binary Object Representation, RFC8949), as it is very mature. https://datatracker.ietf.org/doc/html/rfc8949

And CDDL: https://datatracker.ietf.org/doc/html/rfc8610