StarRocks / starrocks-connector-for-apache-spark

[Feature] Support to load data to bitmap/hll columns #67

Closed banmoy closed 1 year ago

banmoy commented 1 year ago

What type of PR is this:

Which issues this PR fixes:

Fixes #

Problem Summary (Required):

Support loading data into bitmap/hll columns. The main changes are:

  1. Add the option starrocks.column.types so that users can tell the connector which Spark data type a bitmap/hll column maps to. This option can also be used more generally whenever the data type mapping between StarRocks and Spark should not follow the default mapping.
  2. Convert the data from Spark to bitmap/hll using the bitmap/hll functions:
    • For bitmap, convert TINYINT, SMALLINT, INTEGER, and BIGINT in Spark to StarRocks BITMAP with the to_bitmap function, and other types with bitmap_hash.
    • For hll, use hll_hash to convert Spark data to HLL.
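
The selection rule in item 2 can be sketched as a small lookup. This is only an illustration of the rule as described in this PR, not the connector's actual code: the helper name `conversion_function` and the `INTEGER_TYPES` set are hypothetical, and the sketch assumes type names are compared case-insensitively.

```python
# Hypothetical sketch of the rule described in this PR; not the connector's code.

# Spark integer types that this PR says are converted with to_bitmap.
INTEGER_TYPES = {"TINYINT", "SMALLINT", "INTEGER", "BIGINT"}


def conversion_function(starrocks_type: str, spark_type: str) -> str:
    """Return the StarRocks function name used to convert a Spark value.

    starrocks_type: type of the target column ("BITMAP" or "HLL").
    spark_type: Spark type declared for that column.
    """
    target = starrocks_type.strip().upper()
    source = spark_type.strip().upper()
    if target == "BITMAP":
        # Integer types go through to_bitmap; everything else is hashed.
        return "to_bitmap" if source in INTEGER_TYPES else "bitmap_hash"
    if target == "HLL":
        # HLL columns always use hll_hash, regardless of the Spark type.
        return "hll_hash"
    raise ValueError(f"no conversion needed for StarRocks type {starrocks_type!r}")
```

For instance, a BITMAP column declared as BIGINT on the Spark side would be loaded through to_bitmap, while the same column declared as STRING would go through bitmap_hash.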

A simple example

StarRocks DDL

CREATE TABLE `test`.`page_uv` (
  `page_id` INT NOT NULL COMMENT 'page ID',
  `visit_date` datetime NOT NULL COMMENT 'access time',
  `visit_users` BITMAP BITMAP_UNION NOT NULL COMMENT 'user ID'
) ENGINE=OLAP
AGGREGATE KEY(`page_id`, `visit_date`)
DISTRIBUTED BY HASH(`page_id`)
PROPERTIES (
  "replication_num" = "1"
);

Spark DDL

CREATE TABLE `page_uv`
USING starrocks
OPTIONS(
   "starrocks.fe.http.url"="127.0.0.1:8038",
   "starrocks.fe.jdbc.url"="jdbc:mysql://127.0.0.1:9038",
   "starrocks.table.identifier"="test.page_uv",
   "starrocks.user"="root",
   "starrocks.password"="",
   "starrocks.column.types"="visit_users BIGINT"
);
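
To make the shape of the starrocks.column.types value concrete, here is a minimal sketch of how a string such as "visit_users BIGINT" could be read into a column-to-type mapping. It assumes that multiple columns are separated by commas in a DDL-like form, which this PR does not state. The function `parse_column_types` is hypothetical and does not reflect how the connector parses the option.

```python
from typing import Dict


def parse_column_types(option_value: str) -> Dict[str, str]:
    """Parse a '<column> <type>[, <column> <type>...]' string into a dict.

    Simplified, hypothetical parser: splitting on commas means parameterized
    types such as DECIMAL(10,2) are NOT handled correctly.
    """
    mapping: Dict[str, str] = {}
    for entry in option_value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas or empty input
        parts = entry.split(None, 1)  # split on first run of whitespace
        if len(parts) != 2:
            raise ValueError(f"expected '<column> <type>', got {entry!r}")
        name, spark_type = parts
        # Drop optional backticks around the column name; normalize type case.
        mapping[name.strip("`")] = spark_type.strip().upper()
    return mapping
```

With the option value from the Spark DDL above, this yields a single entry mapping visit_users to BIGINT, which is the Spark type the connector is told to use for the BITMAP column.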

Checklist: