apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.46k stars 969 forks

[Feature] Snapshot supports the function of recording additional information of users #2892

Open MonsterChenzhuo opened 9 months ago

MonsterChenzhuo commented 9 months ago

Search before asking

Motivation

This proposal adds an attribute to the existing snapshot structure for recording additional user-supplied information. For flexibility and extensibility, a new attribute named extraInfo would be added at the top level of the snapshot. It holds key-value pairs, similar to a dictionary or Map, so that users can store extra information alongside each snapshot.

{
  "extraInfo": {}
}

Design of extraInfo Structure

Basic Key-Value Pairs: For storing simple user information such as preference settings.

Nested Structure: Supports more complex data structures, such as lists or dictionaries, to store more detailed user data.

{
  "extraInfo": {
    "preferences": {
      "theme": "dark",
      "language": "zh-CN"
    },
    "history": [
      {
        "action": "login",
        "time": "2024-02-23T10:00:00Z"
      }
    ]
  }
}

Integration into the Existing Structure

The extraInfo attribute is added at the top level of the existing snapshot structure. The resulting snapshot looks as follows:

{
  "version": 3,
  "id": 1,
  "schemaId": 0,
  "baseManifestList": "manifest-list-cd8aff2b-3fbb-430a-881f-1a5064fd1edd-0",
  "deltaManifestList": "manifest-list-cd8aff2b-3fbb-430a-881f-1a5064fd1edd-1",
  "changelogManifestList": null,
  "indexManifest": "index-manifest-5ef89060-6da5-4561-8e06-0394432e61d4-0",
  "commitUser": "bd9de2a7-64d0-4395-958e-f88c7c99383e",
  "commitIdentifier": 9223372036854775807,
  "commitKind": "APPEND",
  "timeMillis": 1708617516783,
  "logOffsets": {},
  "totalRecordCount": 1,
  "deltaRecordCount": 1,
  "changelogRecordCount": 0,
  "watermark": -9223372036854775808,
  "extraInfo": {
    "preferences": {
      "theme": "dark",
      "language": "zh-CN"
    },
    "history": [
      {
        "action": "login",
        "time": "2024-02-23T10:00:00Z"
      }
    ]
  }
}
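One point the example above implies is that snapshots written before this change would have no extraInfo field, so readers would need to treat a missing field as an empty map. The following is a minimal Python sketch of that idea, working directly on the snapshot JSON text. It is not Paimon's API: the helper names `with_extra_info` and `read_extra_info` are hypothetical, and extraInfo itself is only the field proposed in this issue.

```python
import json


def read_extra_info(snapshot_json: str) -> dict:
    """Return the proposed extraInfo map, or {} for snapshots that lack it."""
    snapshot = json.loads(snapshot_json)
    # Older snapshots have no extraInfo key; a null value is treated the same way.
    return snapshot.get("extraInfo") or {}


def with_extra_info(snapshot_json: str, extra: dict) -> str:
    """Return snapshot JSON with `extra` merged into the proposed extraInfo map."""
    snapshot = json.loads(snapshot_json)
    merged = dict(snapshot.get("extraInfo") or {})
    merged.update(extra)  # new keys overwrite existing keys of the same name
    snapshot["extraInfo"] = merged
    return json.dumps(snapshot, indent=2)


if __name__ == "__main__":
    old_snapshot = '{"version": 3, "id": 1, "commitKind": "APPEND"}'
    updated = with_extra_info(old_snapshot, {"preferences": {"theme": "dark"}})
    print(read_extra_info(old_snapshot))
    print(read_extra_info(updated))
```

Merging rather than replacing the map is a design assumption here; it lets several writers attach independent keys without clobbering each other.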

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

MonsterChenzhuo commented 9 months ago

@JingsongLi What do you think about this design?

vanliu-tx commented 3 months ago

Any update on this issue? We have a scenario in which a Java client writes to a Paimon primary key table, and we need a way to store some information in the snapshot. This information would help when the task restarts after a stop or crash: the task could read it from the snapshot and seek to the right position to recover. @MonsterChenzhuo @JingsongLi
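The recovery scenario described above could look roughly like the sketch below, written in Python for brevity. It assumes the snapshots have already been loaded as dictionaries and that the writer stored its source position under a hypothetical extraInfo key named `sourceOffset`; neither the key nor the helper `find_resume_offset` exists in Paimon.

```python
from typing import Iterable, Optional


def find_resume_offset(
    snapshots: Iterable[dict],
    commit_user: str,
    key: str = "sourceOffset",  # hypothetical key chosen by the writer
) -> Optional[int]:
    """Return the offset stored in the newest snapshot committed by `commit_user`.

    Returns None when no snapshot from that user carries the key, in which
    case the task would have to fall back to its default start position.
    """
    candidates = [
        s for s in snapshots
        if s.get("commitUser") == commit_user
        and key in (s.get("extraInfo") or {})
    ]
    if not candidates:
        return None
    # Snapshot ids increase with each commit, so the largest id is the newest.
    latest = max(candidates, key=lambda s: s["id"])
    return latest["extraInfo"][key]


if __name__ == "__main__":
    history = [
        {"id": 1, "commitUser": "writer-a", "extraInfo": {"sourceOffset": 100}},
        {"id": 2, "commitUser": "writer-b", "extraInfo": {"sourceOffset": 7}},
        {"id": 3, "commitUser": "writer-a", "extraInfo": {"sourceOffset": 250}},
        {"id": 4, "commitUser": "writer-a"},  # e.g. a commit without extraInfo
    ]
    print(find_resume_offset(history, "writer-a"))
```

Filtering by commitUser keeps one writer from resuming at a position recorded by another, and skipping snapshots without the key covers commits that carry no user information.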