StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0
8.88k stars 1.78k forks

Struct/Map support roadmap #19355

Open Seaven opened 1 year ago

Seaven commented 1 year ago

Background

At present, StarRocks already supports static Struct/Map types in external tables and can query external tables (hive/parquet/....) through related expressions. Some users will load the external data into StarRocks, but StarRocks does not yet support storing or importing Struct/Map data in OLAP tables.

Target

  1. Internal tables support DDL operations on static Struct/Map type columns
  2. Support importing Struct/Map types from external data sources into StarRocks
  3. Support querying Struct/Map and related expressions
  4. Optimize Struct/Map query performance as much as possible

Currently supported

  1. Struct/Map storage of internal tables (the code is ready, but still needs testing and review)
  2. Memory column structure of Struct/Map
  3. Some simple expressions

Phase 1.

  1. Internal tables support DDL operations on static Struct/Map type columns

    • [x] Table schema create
    • [x] Struct/Map type syntax
  2. Support loading Struct/Map types

    • [x] Data source format (Parquet, ORC)
    • [x] Import ways (Insert Into, Broker Load)
    • [x] Test & review Struct/Map storage of internal tables
  3. Optimize Struct/Map query performance

    • [x] Struct/Map column access performance optimization
    • [x] Struct internal column pruning design & implementation

Phase 2.

  1. Internal tables support DDL operations on static Struct/Map type columns

    • [ ] Table schema change
  2. Support loading Struct/Map types

    • [ ] Data source format (CSV)
    • [ ] Import ways (Stream Load, Routine Load)
  3. Support querying Struct/Map and related expressions

    • [x] Struct/Map data type conversion
  4. Optimize Struct/Map query performance

    • [ ] Expression performance optimization

other

mcgray commented 1 year ago

We would very much like struct support for views on top of Iceberg tables.

pedrong commented 1 year ago

Also looking forward to these, along with Views support.

alberttwong commented 1 year ago

https://github.com/StarRocks/starrocks/issues/31282

mabbasi90 commented 10 months ago

Hey guys, when can we expect support for struct/map types in Stream Load? Is there an estimate for that?

ever4Kenny commented 6 months ago

Hi guys, any update on the "Support load Struct/Map types: Data source format (CSV)"?

ever4Kenny commented 6 months ago

Any tips if I would like to work on "Support load Struct/Map types: Data source format (CSV)"? Could you guys shed some light on the entry class?

jaogoy commented 6 months ago

@ever4Kenny Currently, we want to support loading Struct data through the JSON format first. (It's similar for CSV.) The rough idea is:

  1. Users convert Struct data into a string (structed-formatted string, or json-formatted string), or a JSON object.
  2. The corresponding target column is defined as a Struct data type column.
  3. Stream Load or Routine Load can convert such a string or JSON object into the target Struct data column.
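
To illustrate the three steps above, here is a minimal Python sketch of the conversion idea. It is not StarRocks code: the `SCHEMA` mapping and the `to_struct` helper are hypothetical names invented for this example, and the handling of missing fields (filled with `None`) is an assumption, not confirmed behavior of Stream Load or Routine Load.

```python
import json

# Hypothetical description of a target Struct column, e.g. one that might be
# declared as STRUCT<id INT, name STRING, score DOUBLE>.
# Maps each field name to the Python type used to coerce the incoming value.
SCHEMA = {"id": int, "name": str, "score": float}


def to_struct(value, schema):
    """Convert a JSON-formatted string or a parsed JSON object into a dict
    whose fields follow `schema`.

    Assumption for this sketch: fields missing from the input become None,
    and fields not named in the schema are ignored.
    """
    # Step 1: the user supplies either a JSON string or a JSON object.
    obj = json.loads(value) if isinstance(value, str) else value
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object for a struct column")

    # Steps 2 and 3: coerce each field to the type the target column declares.
    struct = {}
    for field, caster in schema.items():
        raw = obj.get(field)
        struct[field] = None if raw is None else caster(raw)
    return struct


row_from_string = to_struct('{"id": "7", "name": "a", "score": 1}', SCHEMA)
row_from_object = to_struct({"id": 1}, SCHEMA)
```

Accepting both a string and a parsed object in one function mirrors step 1, where the user may send either form. A real loader would also need rules for nested structs, maps, and type-conversion errors, which this sketch leaves out.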

github-actions[bot] commented 1 week ago

We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!