Support loading data into BITMAP/HLL columns. The main changes are:
Add the option starrocks.column.types so that users can tell the connector how to map a BITMAP/HLL column to a Spark data type. More generally, this option can be used whenever the data type mapping between StarRocks and Spark should not follow the default mapping.
Convert the data from Spark to BITMAP/HLL using the corresponding bitmap/hll functions:
For BITMAP, TINYINT, SMALLINT, INTEGER, and BIGINT values in Spark are converted to StarRocks BITMAP with the to_bitmap function, and all other types with bitmap_hash.
For HLL, hll_hash is used to convert Spark data to HLL.
A simple example
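The conversion rules above rely on three existing StarRocks functions. As a minimal sketch, the statements below can be run in a StarRocks SQL client to see what each function produces. The literal values are illustrative only, and the wrapper functions (bitmap_to_string, bitmap_count, hll_cardinality) are used here just to make the results readable; they are not part of this PR.

```sql
-- Integer input (TINYINT/SMALLINT/INTEGER/BIGINT on the Spark side): to_bitmap
SELECT bitmap_to_string(to_bitmap(10001));

-- Any other input type, for example a string: bitmap_hash
SELECT bitmap_count(bitmap_hash('user_abc'));

-- HLL columns, regardless of the Spark type: hll_hash
SELECT hll_cardinality(hll_hash('user_abc'));
```

Note that bitmap_hash stores a hash of the value rather than the value itself, so distinct inputs can in principle collide; to_bitmap keeps the integer as-is.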
StarRocks DDL
CREATE TABLE `test`.`page_uv` (
`page_id` INT NOT NULL COMMENT 'page ID',
`visit_date` datetime NOT NULL COMMENT 'access time',
`visit_users` BITMAP BITMAP_UNION NOT NULL COMMENT 'user ID'
) ENGINE=OLAP
AGGREGATE KEY(`page_id`, `visit_date`)
DISTRIBUTED BY HASH(`page_id`)
PROPERTIES (
"replication_num" = "1"
);
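On the Spark side, the table could then be declared roughly as follows. This is a sketch under stated assumptions, not the DDL from this PR: the FE addresses, credentials, and sample rows are placeholders, and the option names other than starrocks.column.types are assumed to be the connector's usual connection options. Only the starrocks.column.types entry, which declares visit_users as BIGINT in Spark, is the part introduced here.

```sql
-- Spark SQL: expose the StarRocks table and map the BITMAP column to BIGINT.
-- Addresses and credentials below are placeholders.
CREATE TABLE `page_uv`
USING starrocks
OPTIONS (
  "starrocks.fe.http.url" = "127.0.0.1:8030",
  "starrocks.fe.jdbc.url" = "jdbc:mysql://127.0.0.1:9030",
  "starrocks.table.identifier" = "test.page_uv",
  "starrocks.user" = "root",
  "starrocks.password" = "",
  "starrocks.column.types" = "visit_users BIGINT"
);

-- Illustrative rows: visit_users is BIGINT in Spark, so under the rule
-- above the connector would convert it with to_bitmap when loading.
INSERT INTO `page_uv` VALUES
  (1, CAST('2020-06-23 01:30:30' AS TIMESTAMP), 13),
  (1, CAST('2020-06-23 01:30:30' AS TIMESTAMP), 23),
  (2, CAST('2020-06-23 01:30:30' AS TIMESTAMP), 13);
```

Had visit_users been declared as STRING in starrocks.column.types instead, the same rule would select bitmap_hash for the conversion.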
What type of PR is this:
Which issues does this PR fix:
Fixes #
Problem Summary (Required):
Checklist: