Supports JSON format - GitHub Issues

As mentioned in https://github.com/apache/datafusion/issues/7845#issuecomment-2068061465, I was greatly inspired by JSONA and proposed a JSONA variant (maybe we can call it JSONC, JSON for columnar storage formats 😙) that may benefit from the high compression rates of columnar storage formats. BTW, if we decide to build our own JSON storage implementation, it's definitely an excellent opportunity to evaluate various JSON storage implementations in OLAP scenarios.

A naive proposal for a JSONA variant.

For example, the JSON value [false, 10, {"k":"v"}, null] can be stored as the following struct.

Struct {
    Nodes: [StartArray, FALSE, Number, StartObject, Key, String, EndObject, NULL, EndArray]
    Offsets: [NULL, NULL, 0, NULL, 0, 0, NULL,…]
    Keys: ["k"]
    Strings: ["v"]
    Numbers: [10]
}
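To make the layout concrete, here is a minimal Python sketch of how a JSON document might be flattened into these five columns. It is not the proposed implementation: the function name `flatten_json`, the lowercase column names, and the `TRUE` node type are assumptions for illustration. It also assumes each offset is the index of the node's value in its own column (Keys, Strings or Numbers) and is NULL (`None`) for nodes that carry no value, which matches the visible part of the Offsets example above.

```python
import json


def flatten_json(text):
    """Flatten a JSON document into node/offset/value columns (illustrative sketch)."""
    cols = {"nodes": [], "offsets": [], "keys": [], "strings": [], "numbers": []}

    def emit(node, column=None, item=None):
        # Record the node type; if it carries a value, store the value in its
        # column and record the index at which it was stored.
        cols["nodes"].append(node)
        if column is None:
            cols["offsets"].append(None)
        else:
            cols["offsets"].append(len(cols[column]))
            cols[column].append(item)

    def walk(value):
        # Booleans are checked before numbers because bool is a subclass of int.
        if value is None:
            emit("NULL")
        elif value is True:
            emit("TRUE")  # assumed name; the example above only shows FALSE
        elif value is False:
            emit("FALSE")
        elif isinstance(value, (int, float)):
            emit("Number", "numbers", value)
        elif isinstance(value, str):
            emit("String", "strings", value)
        elif isinstance(value, list):
            emit("StartArray")
            for element in value:
                walk(element)
            emit("EndArray")
        elif isinstance(value, dict):
            emit("StartObject")
            for key, element in value.items():
                emit("Key", "keys", key)
                walk(element)
            emit("EndObject")
        else:
            raise TypeError(f"unsupported JSON value: {value!r}")

    walk(json.loads(text))
    return cols


if __name__ == "__main__":
    for name, column in flatten_json('[false, 10, {"k":"v"}, null]').items():
        print(name, column)
```

Because the nodes are emitted in document order, the original JSON can be rebuilt by replaying the Nodes column and looking up values through the offsets.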

The Struct data can be efficiently encoded into compact files by the underlying file format. In our scenario, we use Parquet as the underlying file format. For instance, the Nodes field can be represented as UINT8 and efficiently encoded using the default dictionary encoding.
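As a rough illustration of the UINT8 representation, the sketch below maps each node type to a one-byte code and back. The code assignment, the `TRUE` node type, and the names `NODE_NAMES`, `encode_nodes` and `decode_nodes` are assumptions made for this example. Writing the column to Parquet is not shown here.

```python
# Hypothetical code assignment: a node type's position in this list is its UINT8 code.
NODE_NAMES = [
    "StartArray", "EndArray", "StartObject", "EndObject",
    "Key", "String", "Number", "TRUE", "FALSE", "NULL",
]
CODE_OF = {name: code for code, name in enumerate(NODE_NAMES)}


def encode_nodes(nodes):
    """Pack node type names into one byte per node."""
    return bytes(CODE_OF[name] for name in nodes)


def decode_nodes(data):
    """Recover node type names from the packed bytes."""
    return [NODE_NAMES[code] for code in data]


if __name__ == "__main__":
    nodes = ["StartArray", "FALSE", "Number", "StartObject", "Key",
             "String", "EndObject", "NULL", "EndArray"]
    packed = encode_nodes(nodes)
    print(list(packed))
    print(decode_nodes(packed) == nodes)
```

With only a handful of distinct node types, such a column has very low cardinality, which is the case dictionary encoding is designed for.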

GreptimeTeam / greptimedb

Supports JSON format #3686
