Open morningman opened 1 year ago
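Full vectorized engine support, with greatly improved performance
In the standard SSB-100 flat (wide-table) benchmark, 1.2 is 2 times faster than 1.1; in the more complex TPC-H scenario, 1.2 is 3 times faster than 1.1.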
Merge-on-Write Unique Key
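Merge-on-Write is now supported as a data update mode on the existing Unique Key data model. In this mode, data that needs to be deleted or updated is marked at write time, which avoids the Merge Read overhead at query time and greatly improves read efficiency on the updatable data model.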
Multi Catalog
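The Multi Catalog feature gives Doris the ability to quickly connect to and access external data sources. Users can connect to an external data source with the CREATE CATALOG command, and Doris automatically maps the database and table information of that source. After that, users can query the data in these external sources just like ordinary tables, without manually creating an external table mapping for each table as before.
Currently this feature supports the following data sources:
Documentation: https://doris.apache.org/zh-CN/docs/dev/ecosystem/external-table/multi-catalog
Note: the corresponding permission levels also change automatically; see the "Upgrade Notice" section for details.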
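Light Schema Change
In the new version, adding or dropping columns on a table no longer requires synchronously changing the data files; only the metadata in FE needs to be updated, which makes Schema Change a millisecond-level operation. This also enables DDL synchronization for upstream CDC data. For example, users can use Flink CDC to synchronize both DML and DDL from an upstream database to Doris.
To enable it, set "light_schema_change"="true" in the properties when creating the table.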
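A minimal sketch of what enabling this property might look like. Only "light_schema_change" comes from these notes; the database, table, and column names are hypothetical, and "replication_num" is set to 1 only so the statement fits a single-node test setup.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE example_db.events (
    event_id BIGINT,
    event_time DATETIME,
    payload VARCHAR(256)
)
DUPLICATE KEY(event_id)
DISTRIBUTED BY HASH(event_id) BUCKETS 8
PROPERTIES (
    "replication_num" = "1",
    "light_schema_change" = "true"
);

-- With the property enabled, adding a column should only update FE metadata.
ALTER TABLE example_db.events ADD COLUMN source VARCHAR(64);
```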
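JDBC external tables
In the new version, users can connect to external data sources that support JDBC through JDBC. Currently supported:
Documentation: https://doris.apache.org/zh-CN/docs/dev/ecosystem/external-table/jdbc-of-doris/
Note: the ODBC external table feature will be removed in a later version; please switch to JDBC external tables where possible.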
JAVA UDF
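Support writing UDF/UDAF in Java, making it easy to use custom functions within the Java ecosystem. In addition, techniques such as off-heap memory and Zero Copy greatly improve the efficiency of cross-language data access.
Documentation: https://doris.apache.org/zh-CN/docs/dev/ecosystem/udf/java-user-defined-function
Example: https://github.com/apache/doris/tree/master/samples/doris-demo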
Remote UDF
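Support accessing remote user-defined function services through RPC, which removes the language restriction on writing UDFs entirely. Users can implement custom functions in any programming language to carry out complex data analysis.
Documentation: https://doris.apache.org/zh-CN/docs/ecosystem/udf/remote-user-defined-function
Example: https://github.com/apache/doris/tree/master/samples/doris-demo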
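More data types support
Array type
Array types are supported, including multi-level nested arrays. In scenarios such as user profiling and tagging, the Array type can fit the business model better. The new version also implements a large number of related functions to better support the data type in real-world use.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Types/ARRAY
Related functions: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/array-functions/array_max
Jsonb type
Support a binary Json data type: Jsonb. This type provides a more compact json encoding and allows data to be accessed directly in the encoded format. Compared with json data stored as strings, it delivers a several-fold performance improvement.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Types/JSONB
Related functions: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/json-functions/jsonb_parse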
DateV2
Documentation: https://doris.apache.org/docs/dev/sql-manual/sql-reference/Data-Types/DATEV2
Scope of impact:
A new memory management framework
Table Valued Function
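Doris implements a set of Table Valued Functions (TVF). A TVF can be treated as an ordinary table and can appear anywhere a "table" can appear in SQL.
For example, we can use the S3 TVF to import data from object storage:
insert into tbl select * from s3("s3://bucket/file.*", "ak" = "xx", "sk" = "xxx") where c1 > 2;
Or directly query data files on HDFS:
insert into tbl select * from hdfs("hdfs://bucket/file.*") where c1 > 2;
TVFs help users take full advantage of the expressiveness of SQL to process all kinds of data flexibly.
Documentation: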
https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/table-functions/s3
https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/table-functions/hdfs
A more convenient way to create partitions
Support creating multiple partitions within a time range with the FROM TO syntax.
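As an illustration of the FROM TO form, the sketch below asks for one partition per day over a one-week range. The table and column names are hypothetical, and the INTERVAL clause is written as I understand the CREATE TABLE documentation, so treat the exact spelling as an assumption.

```sql
-- Hypothetical table; one daily partition for each day in the range.
CREATE TABLE example_db.daily_events (
    dt DATE,
    event_id BIGINT
)
DUPLICATE KEY(dt, event_id)
PARTITION BY RANGE(dt) (
    FROM ("2022-01-01") TO ("2022-01-08") INTERVAL 1 DAY
)
DISTRIBUTED BY HASH(event_id) BUCKETS 8
PROPERTIES ("replication_num" = "1");
```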
Column renaming
For tables with Light Schema Change enabled, renaming columns is supported.
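Richer permission management
Support row-level permissions
Row-level permissions can be created with the CREATE ROW POLICY command.
Support specifying password strength, expiration time, and so on.
Support locking an account after multiple failed logins.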
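A row policy might be declared roughly as follows. The policy, table, user, and column names are hypothetical, and the clause order follows my reading of the CREATE POLICY documentation rather than anything stated in these notes.

```sql
-- Hypothetical policy: example_user can only see rows where region = 'east'.
CREATE ROW POLICY example_policy ON example_db.orders
AS RESTRICTIVE TO example_user USING (region = 'east');
```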
Import
CSV import supports CSV files with a header.
Search for csv_with_names in the documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD/
Stream Load adds hidden_columns, which can explicitly specify the delete flag column and the sequence column.
Search for `hidden_columns` in the documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD
Spark Load supports importing Parquet and ORC files.
Support cleaning up the labels of completed load jobs.
Support batch cancellation of load jobs by status.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Load/CANCEL-LOAD
Broker Load adds support for Alibaba Cloud OSS, Tencent Cloud COS/CHDFS, and Huawei Cloud OBS.
Documentation: https://doris.apache.org/zh-CN/docs/dev/advanced/broker
Support configuring HDFS access through the hive-site.xml file.
Documentation: https://doris.apache.org/zh-CN/docs/dev/admin-manual/config/config-dir
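Support viewing the contents of the recycle bin with SHOW CATALOG RECYCLE BIN.
Support the SELECT * EXCEPT syntax.
Documentation: https://doris.apache.org/zh-CN/docs/dev/data-table/basic-usage
OUTFILE supports exporting in ORC format and supports multi-byte delimiters.
Support changing the number of Query Profiles that can be retained through configuration.
Search the documentation for the FE configuration item: max_query_profile_num
The DELETE statement supports IN predicates and supports partition pruning.
The default value of a time column can be CURRENT_TIMESTAMP.
Search for "CURRENT_TIMESTAMP" in the documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE
Added two system tables: backends, rowsets
Documentation: https://doris.apache.org/zh-CN/docs/dev/admin-manual/system-table/backends https://doris.apache.org/zh-CN/docs/dev/admin-manual/system-table/rowsets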
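Backup and restore
The Restore job supports the reserve_replica parameter, so that the restored table has the same number of replicas as at backup time.
The Restore job supports the reserve_dynamic_partition_enable parameter, so that the restored table keeps dynamic partitioning enabled.
Support backup and restore through the built-in libhdfs, no longer relying on broker.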
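A sketch of how the two Restore parameters might be passed. The property names reserve_replica and reserve_dynamic_partition_enable come from these notes; the snapshot, repository, and table names and the backup_timestamp value are hypothetical placeholders, and the surrounding RESTORE syntax is assumed from the RESTORE documentation.

```sql
-- Hypothetical snapshot and repository names.
RESTORE SNAPSHOT example_db.snapshot_1
FROM example_repo
ON (orders)
PROPERTIES (
    "backup_timestamp" = "2022-12-01-10-00-00",
    "reserve_replica" = "true",
    "reserve_dynamic_partition_enable" = "true"
);
```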
Support data balancing between multiple disks on the same machine
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Database-Administration-Statements/ADMIN-REBALANCE-DISK https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Database-Administration-Statements/ADMIN-CANCEL-REBALANCE-DISK
Routine Load supports subscribing to Kerberos-authenticated Kafka services.
Search for kerberos in the documentation: https://doris.apache.org/zh-CN/docs/dev/data-operate/import/import-way/routine-load-manual
New built-in-function
Added the following built-in functions:
cbrt
sequence_match/sequence_count
mask/mask_first_n/mask_last_n
elt
any/any_value
group_bitmap_xor
ntile
nvl
uuid
initcap
regexp_replace_one/regexp_extract_all
multi_search_all_positions/multi_match_any
domain/domain_without_www/protocol
running_difference
bitmap_hash64
murmur_hash3_64
to_monday
not_null_or_empty
window_funnel
group_bit_and/group_bit_or/group_bit_xor
outer combine
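as well as all array functions
Compiling and running FE and BE with JDK 11 causes BE to crash occasionally. Please use JDK 8.
Permission level changes: because the Catalog level is introduced, the corresponding user permission levels also change automatically. The rules are as follows:
In GroupBy and Having clauses, column names are matched in preference to aliases. (#14408)
Creating columns whose names start with "mv" is no longer supported. "mv" is a reserved keyword in materialized views. (#14361)
Removed the default limit of 65535 rows that was added to order by statements, and added the session variable default_order_by_limit so that this limit can be configured. (#12478)
In tables generated by "Create Table As Select", all string columns uniformly use the string type and no longer distinguish varchar/char/string. (#14382)
In the audit log, removed the default_cluster prefix before db and user names. (#13499)(#11408)
Added a sql digest field to the audit log. (#8919)
The order by logic in union clauses has changed. In the new version, the order by clause is executed after the union completes, unless it is explicitly bound to a branch with parentheses. (#9745)
During a decommission operation, tablets in the recycle bin are ignored to ensure that the decommission can complete. (#14028)
Decimal results are displayed with the precision declared in the original column, or with the precision explicitly specified in a cast function. (#13437)
Changed the column name length limit from 64 to 256. (#14671)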
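Changes to FE configuration items
The enable_vectorized_load parameter is enabled by default. (#11833)
Increased the create_table_timeout value. The default timeout for table creation operations is now longer. (#13520)
Changed the default value of stream_load_default_timeout_second to 3 days.
Changed the default value of alter_table_timeout_second to one month.
Added the parameter max_replica_count_when_schema_change to limit the number of replicas involved in an alter job; the default is 100000. (#12850)
Added disable_iceberg_hudi_table. Iceberg and Hudi external tables are disabled by default; the multi catalog feature is recommended instead. (#13932)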
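Changes to BE configuration items
Removed the disable_stream_load_2pc parameter. 2PC stream load can now be used directly. (#13520)
Changed tablet_rowset_stale_sweep_time_sec from 1800 seconds to 300 seconds.
Redesigned the names of the compaction-related configuration items. (#13495)
Redesigned the memory optimization parameters. (#13781)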
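Session variable changes
Changed the variable enable_insert_strict to true by default. As a result, some insert operations that previously succeeded while inserting illegal values can no longer be executed. (#11866)
Changed the variable enable_local_exchange to true by default. (#13292)
Data is transmitted with lz4 compression by default, controlled by the variable fragment_transmission_compression_codec. (#11955)
Added the skip_storage_engine_merge variable for debugging data in unique or agg models. (#11952)
Documentation: https://doris.apache.org/zh-CN/docs/dev/advanced/variables
The BE startup script checks through /proc/sys/vm/max_map_count whether the value is greater than 2,000,000 (200W); otherwise, startup fails. (#11052)
Removed the mini load interface. (#10520)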
FE Metadata Version
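FE Meta Version changed from 107 to 114; the upgrade cannot be rolled back.
Upgrade preparation
Need to replace: lib, bin directories (both start and stop scripts have been modified)
BE also needs JAVA_HOME configured, in order to support JDBC Table and Java UDF.
The default JVM Xmx parameter in fe.conf is changed to 8GB.
Possible errors during the upgrade process
The repeat function cannot be used and reports the error: vectorized repeat function cannot be executed. You can turn off the vectorized execution engine before upgrading. (#13868)
Schema change fails with the error: desc_tbl is not set. Maybe the FE version is not equal to the BE (#13822)
Vectorized hash join cannot be used and reports the error: vectorized hash join cannot be executed. You can turn off the vectorized execution engine before upgrading. (#13753)
The above errors return to normal after the upgrade is fully completed.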
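JeMalloc is used by default as the memory allocator of the new version of BE, replacing TcMalloc. (#13367)
The batch size in the tablet sink is changed to at least 8K. (#13912)
The chunk allocator is disabled by default. (#13285)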
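BE's http api error messages changed from {"status": "Fail", "msg": "xxx"} to the more specific {"status": "Not found", "msg": "Tablet not found. tablet_id=1202"} (#9771)
In SHOW CREATE TABLE, the comment content is now wrapped in single quotes instead of double quotes. (#10327)
Ordinary users can obtain the query profile through an http command. (#14016) Documentation: https://doris.apache.org/zh-CN/docs/dev/admin-manual/http-actions/fe/manager/query-profile-action
Optimized how the sequence column is specified; the column name can now be given directly. (#13872) Documentation: https://doris.apache.org/zh-CN/docs/dev/data-operate/update-delete/sequence-column-manual
Added remote storage space usage to the results returned by show backends and show tablets. (#11450)
Removed Num-Based Compaction related code. (#13409)
Refactored BE's error code mechanism; some returned error messages will change. (#8855)
Other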
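Support official Docker images.
Support compiling Doris on MacOS (x86/M1) and ubuntu-22.04.
Documentation: https://doris.apache.org/zh-CN/docs/dev/install/source-install/compilation-mac/
Support verifying image files.
Documentation: https://doris.apache.org/zh-CN/docs/dev/admin-manual/maint-monitor/metadata-operation/
Script related
The stop scripts of FE and BE support exiting FE and BE gracefully via the --grace parameter (using the kill -15 signal instead of kill -9).
The FE start script supports checking the current FE version via --version. (#11563)
Support getting the data and the related table creation statement of a tablet through the ADMIN COPY TABLET command, for local problem debugging. (#12176)
Support obtaining the table creation statements related to a SQL statement through the http api, for local problem reproduction. (#11979)
Documentation: https://doris.apache.org/zh-CN/docs/dev/admin-manual/http-actions/fe/query-schema-action
Support disabling compaction for a table when creating it, for testing. (#11743)
Search for "disable_auto_compaction" in the documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE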
[Chinese Version. See below]
Feature
Highlight
Full vectorized engine support, greatly improved performance
In the standard ssb-100-flat benchmark, 1.2 is 2 times faster than 1.1; in the more complex TPCH 100 benchmark, 1.2 is 3 times faster than 1.1.
Merge-on-Write Unique Key
Support Merge-On-Write on Unique Key Model. This mode marks the data that needs to be deleted or updated when the data is written, thereby avoiding the overhead of Merge-On-Read when querying, and greatly improving the reading efficiency on the updateable data model.
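As a rough sketch of how such a table might be declared: the property name enable_unique_key_merge_on_write is not given in these notes and is assumed here from the CREATE TABLE documentation, and the database, table, and column names are hypothetical.

```sql
-- Hypothetical Unique Key table.
-- "enable_unique_key_merge_on_write" is an assumed property name.
CREATE TABLE example_db.user_profile (
    user_id BIGINT,
    city VARCHAR(64),
    updated_at DATETIME
)
UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 8
PROPERTIES (
    "replication_num" = "1",
    "enable_unique_key_merge_on_write" = "true"
);
```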
Multi Catalog
The multi-catalog feature gives Doris the ability to quickly connect to and access external data sources. Users can connect to external data sources through the CREATE CATALOG command. Doris will automatically map the database and table information of the external data sources. After that, users can access the data in these external data sources just like ordinary tables, without manually creating an external table mapping for each table.
Currently this feature supports the following data sources:
Documentation: https://doris.apache.org/zh-CN/docs/dev/ecosystem/external-table/multi-catalog
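For instance, connecting to a Hive Metastore could look roughly like the sketch below. The catalog name, metastore address, and the database and table being queried are hypothetical, and the "type" and "hive.metastore.uris" property keys are assumptions taken from the multi-catalog documentation rather than from these notes.

```sql
-- Hypothetical catalog pointing at a Hive Metastore.
CREATE CATALOG hive_catalog PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://127.0.0.1:9083"
);

-- Switch to the new catalog and browse what Doris mapped automatically.
SWITCH hive_catalog;
SHOW DATABASES;

-- Tables can also be addressed by their fully qualified name.
SELECT * FROM hive_catalog.example_db.example_tbl LIMIT 10;
```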
Light Schema Change
In the new version, adding or dropping columns on a table no longer requires synchronously changing the data files; only the metadata in FE needs to be updated, which makes Schema Change a millisecond-level operation. This also enables DDL synchronization for upstream CDC data. For example, users can use Flink CDC to synchronize both DML and DDL from an upstream database to Doris.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE
When creating a table, set "light_schema_change"="true" in properties.
JDBC external tables
Users can connect to external data sources through JDBC. Currently supported:
Documentation: https://doris.apache.org/zh-CN/docs/dev/ecosystem/external-table/jdbc-of-doris/
JAVA UDF
Supports writing UDF/UDAF in Java, which is convenient for users to use custom functions in the Java ecosystem. At the same time, through technologies such as off-heap memory and Zero Copy, the efficiency of cross-language data access has been greatly improved.
Document: https://doris.apache.org/zh-CN/docs/dev/ecosystem/udf/java-user-defined-function
Example: https://github.com/apache/doris/tree/master/samples/doris-demo
Remote UDF
Supports accessing remote user-defined function services through RPC, thus completely eliminating language restrictions for users to write UDFs. Users can use any programming language to implement custom functions to complete complex data analysis work.
Documentation: https://doris.apache.org/zh-CN/docs/ecosystem/udf/remote-user-defined-function
Example: https://github.com/apache/doris/tree/master/samples/doris-demo
More data types support
Array type
Array types are supported, including nested array types. In scenarios such as user profiling and tagging, the Array type can fit the business model better. The new version also implements a large number of related functions to better support the data type in real-world use.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Types/ARRAY
Related functions: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/array-functions/array_max
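A small sketch of an array column in a tagging scenario. The table and column names are hypothetical; array_max is the function linked above, and the bracket literal in the INSERT is written as I understand the ARRAY documentation.

```sql
-- Hypothetical table storing a list of tag ids per user.
CREATE TABLE example_db.user_tags (
    user_id BIGINT,
    tags ARRAY<INT>
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 8
PROPERTIES ("replication_num" = "1");

INSERT INTO example_db.user_tags VALUES (1, [3, 7, 5]);

-- Largest tag id per user.
SELECT user_id, array_max(tags) FROM example_db.user_tags;
```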
Jsonb type
Support a binary Json data type: Jsonb. This type provides a more compact json encoding and allows data to be accessed directly in the encoded format. Compared with json data stored as strings, it delivers a several-fold performance improvement.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Types/JSONB
Related functions: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/json-functions/jsonb_parse
Date V2
Scope of impact:
Documentation: https://doris.apache.org/docs/dev/sql-manual/sql-reference/Data-Types/DATEV2
More
A new memory management framework
Documentation: https://doris.apache.org/zh-CN/docs/dev/admin-manual/maint-monitor/memory-management/memory-tracker
Table Valued Function
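Doris implements a set of Table Valued Functions (TVF). A TVF can be treated as an ordinary table and can appear anywhere a "table" can appear in SQL.
For example, we can use the S3 TVF to import data from object storage:
insert into tbl select * from s3("s3://bucket/file.*", "ak" = "xx", "sk" = "xxx") where c1 > 2;
Or directly query data files on HDFS:
insert into tbl select * from hdfs("hdfs://bucket/file.*") where c1 > 2;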
TVF can help users make full use of the rich expressiveness of SQL and flexibly process various data.
Documentation:
https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/table-functions/s3
https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/table-functions/hdfs
A more convenient way to create partitions
Support creating multiple partitions within a time range with the FROM TO syntax.
Column renaming
For tables with Light Schema Change enabled, column renaming is supported.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-RENAME
Richer permission management
Support row-level permissions
Row-level permissions can be created with the CREATE ROW POLICY command.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-POLICY
Support specifying password strength, expiration time, etc.
Support for locking accounts after multiple failed logins.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Account-Management-Statements/ALTER-USER
Import
CSV import supports CSV files with a header.
Search for csv_with_names in the documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD/
Stream Load adds hidden_columns, which can explicitly specify the delete flag column and the sequence column.
Search for hidden_columns in the documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD
Spark Load supports importing Parquet and ORC files.
Support cleaning up the labels of completed load jobs.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Load/CLEAN-LABEL
Support batch cancellation of import jobs by status
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Load/CANCEL-LOAD
Added support for Alibaba Cloud oss, Tencent Cloud cos/chdfs and Huawei Cloud obs in broker load.
Documentation: https://doris.apache.org/zh-CN/docs/dev/advanced/broker
Support access to hdfs through hive-site.xml file configuration.
Documentation: https://doris.apache.org/zh-CN/docs/dev/admin-manual/config/config-dir
Support viewing the contents of the catalog recycle bin with SHOW CATALOG RECYCLE BIN.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Show-Statements/SHOW-CATALOG-RECYCLE-BIN
Support the SELECT * EXCEPT syntax.
Documentation: https://doris.apache.org/zh-CN/docs/dev/data-table/basic-usage
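A one-line illustration, assuming a hypothetical users table that has a password_hash column we want to leave out of the result:

```sql
-- Returns every column of the hypothetical table except password_hash.
SELECT * EXCEPT(password_hash) FROM example_db.users;
```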
OUTFILE supports ORC format export. And supports multi-byte delimiters.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE
Support changing the number of Query Profiles that can be retained through configuration.
Search the documentation for the FE configuration item: max_query_profile_num
The DELETE statement supports IN predicates and supports partition pruning.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/DELETE
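A DELETE with an IN predicate might look like the following. The table, partition, and column names are hypothetical.

```sql
-- Delete three specific rows from one partition of a hypothetical table.
DELETE FROM example_db.orders PARTITION p202212
WHERE order_id IN (1001, 1002, 1003);
```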
The default value of a time column can be CURRENT_TIMESTAMP.
Search for "CURRENT_TIMESTAMP" in the documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE
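A minimal sketch of a column that defaults to the load time. The table and column names are hypothetical.

```sql
-- Hypothetical table; created_at is filled in when the column is omitted.
CREATE TABLE example_db.audit_rows (
    id BIGINT,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 8
PROPERTIES ("replication_num" = "1");

INSERT INTO example_db.audit_rows (id) VALUES (1);
```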
Add two system tables: backends, rowsets
Documentation:
https://doris.apache.org/zh-CN/docs/dev/admin-manual/system-table/backends
https://doris.apache.org/zh-CN/docs/dev/admin-manual/system-table/rowsets
Backup and restore
The Restore job supports the reserve_replica parameter, so that the restored table has the same number of replicas as at backup time.
The Restore job supports the reserve_dynamic_partition_enable parameter, so that the restored table keeps dynamic partitioning enabled.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Backup-and-Restore/RESTORE
Support backup and restore through the built-in libhdfs, no longer relying on broker.
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Backup-and-Restore/CREATE-REPOSITORY
Support data balance between multiple disks on the same machine
Documentation:
https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Database-Administration-Statements/ADMIN-REBALANCE-DISK
https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Database-Administration-Statements/ADMIN-CANCEL-REBALANCE-DISK
Routine Load supports subscribing to Kerberos-authenticated Kafka services.
Search for kerberos in the documentation: https://doris.apache.org/zh-CN/docs/dev/data-operate/import/import-way/routine-load-manual
New built-in-function
Added the following built-in functions:
cbrt
sequence_match/sequence_count
mask/mask_first_n/mask_last_n
elt
any/any_value
group_bitmap_xor
ntile
nvl
uuid
initcap
regexp_replace_one/regexp_extract_all
multi_search_all_positions/multi_match_any
domain/domain_without_www/protocol
running_difference
bitmap_hash64
murmur_hash3_64
to_monday
not_null_or_empty
window_funnel
group_bit_and/group_bit_or/group_bit_xor
outer combine
Upgrade Notice
Known Issues
Compiling and running FE and BE with JDK 11 causes BE to crash occasionally. Please use JDK 8.
Behavior Changed
Permission level changes
Because the catalog level is introduced, the corresponding user permission level will also be changed automatically. The rules are as follows:
In GroupBy and Having clauses, column names are matched in preference to aliases. (#14408)
Creating columns whose names start with mv_ is no longer supported. mv_ is a reserved keyword in materialized views. (#14361)
Removed the default limit of 65535 rows that was added to order by statements, and added the session variable default_order_by_limit to configure this limit. (#12478)
In tables generated by "Create Table As Select", all string columns uniformly use the string type and no longer distinguish varchar/char/string. (#14382)
In the audit log, removed the default_cluster prefix before db and user names. (#13499) (#11408)
Added a sql digest field to the audit log. (#8919)
The order by logic in union clauses has changed. In the new version, the order by clause is executed after the union completes, unless it is explicitly bound to a branch with parentheses. (#9745)
During a decommission operation, tablets in the recycle bin are ignored to ensure that the decommission can complete. (#14028)
Decimal results are displayed with the precision declared in the original column, or with the precision explicitly specified in a cast function. (#13437)
Changed the column name length limit from 64 to 256. (#14671)
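The two order by changes in the list above can be illustrated as follows. The tables t1 and t2 and the column k1 are hypothetical, and the value 65535 is used only to mirror the old built-in cap.

```sql
-- ORDER BY now sorts the result of the whole UNION.
SELECT k1 FROM t1 UNION ALL SELECT k1 FROM t2 ORDER BY k1;

-- Parentheses bind ORDER BY (here with LIMIT) to a single branch.
(SELECT k1 FROM t1 ORDER BY k1 LIMIT 10) UNION ALL (SELECT k1 FROM t2);

-- Bring back a default cap on ORDER BY results for this session.
SET default_order_by_limit = 65535;
```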
Changes to FE configuration items
The enable_vectorized_load parameter is enabled by default. (#11833)
Increased the create_table_timeout value. The default timeout for table creation operations is now longer. (#13520)
Changed the default value of stream_load_default_timeout_second to 3 days.
Changed the default value of alter_table_timeout_second to one month.
Added the parameter max_replica_count_when_schema_change to limit the number of replicas involved in an alter job; the default is 100000. (#12850)
Added disable_iceberg_hudi_table. Iceberg and Hudi external tables are disabled by default; the multi catalog feature is recommended instead. (#13932)
Changes to BE configuration items
Removed the disable_stream_load_2pc parameter. 2PC stream load can now be used directly. (#13520)
Changed tablet_rowset_stale_sweep_time_sec from 1800 seconds to 300 seconds.
Redesigned the names of the compaction-related configuration items. (#13495)
Redesigned the memory optimization parameters. (#13781)
Session variable changes
Changed the variable enable_insert_strict to true by default. As a result, some insert operations that previously succeeded while inserting illegal values can no longer be executed. (#11866)
Changed the variable enable_local_exchange to true by default. (#13292)
Data is transmitted with lz4 compression by default, controlled by the variable fragment_transmission_compression_codec. (#11955)
Added the skip_storage_engine_merge variable for debugging data in unique or agg models. (#11952)
Documentation: https://doris.apache.org/zh-CN/docs/dev/advanced/variables
The BE startup script checks through /proc/sys/vm/max_map_count whether the value is greater than 2,000,000 (200W); otherwise, startup fails. (#11052)
Removed the mini load interface. (#10520)
FE Metadata Version
FE Meta Version changed from 107 to 114, and cannot be rolled back after upgrading.
During Upgrade
Upgrade preparation
Need to replace: lib, bin directory (start/stop scripts have been modified)
BE also needs JAVA_HOME configured, in order to support JDBC Table and Java UDF.
The default JVM Xmx parameter in fe.conf is changed to 8GB.
Possible errors during the upgrade process
The repeat function cannot be used and reports the error: vectorized repeat function cannot be executed. You can turn off the vectorized execution engine before upgrading. (#13868)
Schema change fails with the error: desc_tbl is not set. Maybe the FE version is not equal to the BE (#13822)
Vectorized hash join cannot be used and reports the error: vectorized hash join cannot be executed. You can turn off the vectorized execution engine before upgrading. (#13753)
The above errors return to normal after the upgrade is fully completed.
Performance Impact
By default, JeMalloc is used as the memory allocator of the new version BE, replacing TcMalloc (#13367)
The batch size in the tablet sink is modified to be at least 8K. (#13912)
Disable chunk allocator by default (#13285)
Api change
BE's http api error messages changed from {"status": "Fail", "msg": "xxx"} to the more specific {"status": "Not found", "msg": "Tablet not found. tablet_id=1202"} (#9771)
In SHOW CREATE TABLE, the comment content is now wrapped in single quotes instead of double quotes. (#10327)
Ordinary users can obtain the query profile through an http command. (#14016) Documentation: https://doris.apache.org/zh-CN/docs/dev/admin-manual/http-actions/fe/manager/query-profile-action
Optimized how the sequence column is specified; the column name can now be given directly. (#13872) Documentation: https://doris.apache.org/zh-CN/docs/dev/data-operate/update-delete/sequence-column-manual
Added remote storage space usage to the results returned by show backends and show tablets. (#11450)
Removed Num-Based Compaction related code. (#13409)
Refactored BE's error code mechanism; some returned error messages will change. (#8855)
Other
Support Docker official image.
Support compiling Doris on MacOS (x86/M1) and ubuntu-22.04.
Documentation: https://doris.apache.org/zh-CN/docs/dev/install/source-install/compilation-mac/
Support for image file verification.
Documentation: https://doris.apache.org/zh-CN/docs/dev/admin-manual/maint-monitor/metadata-operation/
Script related
The stop scripts of FE and BE support exiting FE and BE gracefully via the --grace parameter (using the kill -15 signal instead of kill -9).
The FE start script supports checking the current FE version via --version. (#11563)
Support getting the data and the related table creation statement of a tablet through the ADMIN COPY TABLET command, for local problem debugging. (#12176)
Documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Database-Administration-Statements/ADMIN-COPY-TABLET
Support obtaining the table creation statements related to a SQL statement through the http api, for local problem reproduction. (#11979)
Documentation: https://doris.apache.org/zh-CN/docs/dev/admin-manual/http-actions/fe/query-schema-action
Support disabling compaction for a table when creating it, for testing. (#11743)
Search for "disable_auto_compaction" in the documentation: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE
Big Thanks
Thanks to ALL who contributed to this release! (alphabetically)
@924060929 @a19920714liou @adonis0147 @Aiden-Dong @aiwenmo @AshinGau @b19mud @BePPPower @BiteTheDDDDt @bridgeDream @ByteYue @caiconghui @CalvinKirs @cambyzju @caoliang-web @carlvinhust2012 @catpineapple @ccoffline @chenlinzhong @chovy-3012 @coderjiang @cxzl25 @dataalive @dataroaring @dependabot[bot] @dinggege1024 @DongLiang-0 @Doris-Extras @eldenmoon @EmmyMiao87 @englefly @FreeOnePlus @Gabriel39 @gaodayue @geniusjoe @gj-zhang @gnehil @GoGoWen @HappenLee @hello-stephen @Henry2SS @hf200012 @huyuanfeng2018 @jacktengg @jackwener @jeffreys-cat @Jibing-Li @JNSimba @Kikyou1997 @Lchangliang @LemonLiTree @lexoning @liaoxin01 @lide-reed @link3280 @liutang123 @liuyaolin @LOVEGISER @lsy3993 @luozenglin @luzhijing @madongz @morningman @morningman-cmy @morrySnow @mrhhsg @Myasuka @myfjdthink @nextdreamblue @pan3793 @pangzhili @pengxiangyu @platoneko @qidaye @qzsee @SaintBacchus @SeekingYang @smallhibiscus @sohardforaname @song7788q @spaces-X @ssusieee @stalary @starocean999 @SWJTU-ZhangLei @TaoZex @timelxy @Wahno @wangbo @wangshuo128 @wangyf0555 @weizhengte @weizuo93 @wsjz @wunan1210 @xhmz @xiaokang @xiaokangguo @xinyiZzz @xy720 @yangzhg @Yankee24 @yeyudefeng @yiguolei @yinzhijian @yixiutt @yuanyuan8983 @Yulei-Yang @zbtzbtzbt @zenoyang @zhangboya1 @zhangstar333 @zhannngchen @ZHbamboo @zhengshiJ @zhenhb @zhqu1148980644 @zuochunwei @zy-kkk