datafuselabs / databend

๐——๐—ฎ๐˜๐—ฎ, ๐—”๐—ป๐—ฎ๐—น๐˜†๐˜๐—ถ๐—ฐ๐˜€ & ๐—”๐—œ. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.66k stars 726 forks source link

Databend vs. Snowflake Features and Capabilities #13059

Open BohuTANG opened 11 months ago

BohuTANG commented 11 months ago

Features and Capabilities

Databend is an open-source alternative to Snowflake, so we need a page to introduce the differences between Databend and Snowflake. Feel free to contribute if you notice any omissions.

Category Snowflake Databend
Numeric Data Types Data Types Yes Yes [1, 2]
String & Binary Data Types Data Types Yes Yes
Logical Data Types Data Types Yes Yes
Date & Time Data Types Data Types Yes Yes
Geospatial Data Types Data Types Yes Yes
Bitmap Data Types Data Types No Yes
Semi-structured Data Types Data Types Yes Yes
Database Databases Yes Yes
Sequence Sequences Yes Yes
Table Tables Yes Yes
External Table(Table location in your bucket) Tables No Yes
Vacuum Tables No Yes
Clustering Tables Yes Yes(Distributed)
Full-Text Index Tables No Yes
Hive Lake Yes Yes
Iceberg Lake Yes(Private Preview) Yes
View Views Yes Yes
Materialized Views Views Yes Yes
User, Role, & Privilege Privileges Yes Yes
Query Queries Yes Yes
Update General DML Yes Yes(Distributed)
Replace General DML No Yes(Distributed)
Delete General DML Yes Yes(Distributed)
Merge General DML Yes Yes
Stage(Internal/External/Remove/List) Data Loading Yes Yes
Query Stage File Data Loading Yes Yes
Copy Into Table/Location Data Loading Yes Yes(Distributed)
Transforming Data During a Load Data Loading Yes Yes
Pipe Data Loading Yes Yes(Cloud)
Task Data Loading Yes, minimum schedule is 1 minute Yes, minimum schedule is 500 millisecond
Stream Data Loading Yes Yes
Table Stream(Append-only/Standard) Data Loading Yes Yes
User-Defined Functions Functions Yes Yes
External Functions Functions Yes Yes
Stored Procedure Functions Yes Planning
AI Functions Functions Yes, 2024 Add Yes
Window Function Functions Yes Yes
Time Travel Data Recovery Yes Yes
Data Sharing Data Sharing Yes Yes
RBAC Security Yes Yes
Network Policies Security Yes Yes
Masking Policies Security Yes Yes
Kafka Connect Connectors Yes Yes
Flink CDC Connectors No Yes
dbt Connectors Yes Yes
Drivers Drivers Go
Java
Ptyhon
Node.js
Go
Java
Python
Node.js
Rust
Embedded Python Library No Yes
Storage Format Storage MicroPartition Parquet
Aggregate&Join&Sort Spill Performance Optimization Yes Yes
Local Disk Cache Performance Optimization Yes Yes
Profiling Performance Optimization Powerful Powerful
CLI Client Client SnowSQL BendSQL
TPC-H Benchmark Yes Yes
TPC-DS Benchmark Yes Yes
Supported Cloud Platforms Cloud AWS/GCP/AZURE Any Cloud

Databend and Snowflake Functions Comparison

Mathematical Functions

Databend Function Snowflake Function
ABS(x) ABS(x)
ACOS(x) ACOS(x)
ASIN(x) ASIN(x)
ATAN(x) ATAN(x)
ATAN2(y, x) ATAN2(y, x)
CEIL(x) CEIL(x)
CEILING(x) CEILING(x)
COS(x) COS(x)
DEGREES(x) DEGREES(x)
EXP(x) EXP(x)
FLOOR(x) FLOOR(x)
LN(x) LN(x)
LOG(x) LOG(x)
LOG10(x) LOG10(x)
LOG2(x) LOG2(x)
PI() PI()
POW(x, y) POW(x, y)
POWER(x, y) POWER(x, y)
RADIANS(x) RADIANS(x)
RAND() RAND()
ROUND(x, d) ROUND(x, d)
SIGN(x) SIGN(x)
SIN(x) SIN(x)
SQRT(x) SQRT(x)
TAN(x) TAN(x)

String Functions

Databend Function Snowflake Function
ASCII(expr) ASCII(expr)
CHAR(n, ...) CHAR(n, ...)
CONCAT(expr1, ...) CONCAT(expr1, ...)
CONCAT_WS(separator, expr1, ...) CONCAT_WS(separator, expr1, ...)
INSTR(str, substr) INSTR(str, substr)
LENGTH(str) LENGTH(str)
LCASE(str) LOWER(str)
LEFT(str, len) LEFT(str, len)
LOWER(str) LOWER(str)
LTRIM(str) LTRIM(str)
POSITION(substr IN str) POSITION(substr IN str)
REPEAT(str, count) REPEAT(str, count)
REPLACE(str, from_str, to_str) REPLACE(str, from_str, to_str)
REVERSE(str) REVERSE(str)
RIGHT(str, len) RIGHT(str, len)
RTRIM(str) RTRIM(str)
SPACE(n) SPACE(n)
SUBSTR(str, pos) SUBSTR(str, pos)
SUBSTR(str, pos, len) SUBSTR(str, pos, len)
SUBSTRING(str, pos) SUBSTRING(str, pos)
SUBSTRING(str, pos, len) SUBSTRING(str, pos, len)
TRIM([{BOTH | LEADING | TRAILING} [remstr] FROM] str) TRIM([{BOTH | LEADING | TRAILING} [remstr] FROM] str)
UCASE(str) UPPER(str)
UPPER(str) UPPER(str)

Date and Time Functions

Databend Function Snowflake Function
ADD_DAYS(date, interval) DATEADD(day, interval, date)
ADD_HOURS(date, interval) DATEADD(hour, interval, date)
ADD_MINUTES(date, interval) DATEADD(minute, interval, date)
ADD_MONTHS(date, interval) DATEADD(month, interval, date)
ADD_SECONDS(date, interval) DATEADD(second, interval, date)
ADD_YEARS(date, interval) DATEADD(year, interval, date)
DATE(expr) DATE(expr)
DAY(date) DAY(date)
DAYNAME(date) DAYNAME(date)
DAY_OF_MONTH(date) DAYOFMONTH(date)
DAY_OF_WEEK(date) DAYOFWEEK(date)
DAY_OF_YEAR(date) DAYOFYEAR(date)
HOUR(date) HOUR(date)
LAST_DAY(date) LAST_DAY(date)
MINUTE(time) MINUTE(time)
MONTH(date) MONTH(date)
NOW() CURRENT_TIMESTAMP()
SECOND(time) SECOND(time)
TO_DATE(expr) TO_DATE(expr)
TO_TIMESTAMP(expr) TO_TIMESTAMP(expr)
WEEK(date) WEEK(date)
YEAR(date) YEAR(date)
YESTERDAY() DATEADD(day, -1, CURRENT_DATE())
TOMORROW() DATEADD(day, 1, CURRENT_DATE())
TO_START_OF_DAY(expr) DATE_TRUNC('day', expr)
TO_START_OF_HOUR(expr) DATE_TRUNC('hour', expr)
TO_START_OF_MINUTE(expr) DATE_TRUNC('minute', expr)
TO_START_OF_MONTH(expr) DATE_TRUNC('month', expr)
TO_START_OF_QUARTER(expr) DATE_TRUNC('quarter', expr)
TO_START_OF_WEEK(expr) DATE_TRUNC('week', expr)
TO_START_OF_YEAR(expr) DATE_TRUNC('year', expr)
TO_UNIX_TIMESTAMP(expr) TO_TIMESTAMP(expr)
TO_YEARMONTH(expr) TO_CHAR(expr, 'YYYY-MM')
TO_YYYYMMDD(expr) TO_CHAR(expr, 'YYYYMMDD')
TO_YYYYMMDDHH(expr) TO_CHAR(expr, 'YYYYMMDDHH')
TO_YYYYMMDDHHMMSS(expr) TO_CHAR(expr, 'YYYYMMDDHHMMSS')

Aggregate Functions

Databend Function Snowflake Function
AVG(x) AVG(x)
COUNT(x) COUNT(x)
MAX(x) MAX(x)
MIN(x) MIN(x)
SUM(x) SUM(x)

Conversion Functions

Databend Function Snowflake Function
CAST(expr AS type) CAST(expr AS type)
TO_BOOLEAN(expr) CAST(expr AS BOOLEAN)
TO_STRING(expr) CAST(expr AS STRING)
TO_INT32(expr) CAST(expr AS INT)
TO_INT64(expr) CAST(expr AS BIGINT)
TO_UINT32(expr) CAST(expr AS NUMBER)
TO_UINT64(expr) CAST(expr AS NUMBER)
TO_FLOAT32(expr) CAST(expr AS FLOAT)
TO_FLOAT64(expr) CAST(expr AS DOUBLE)
TO_DATE(expr) CAST(expr AS DATE)
TO_TIMESTAMP(expr) CAST(expr AS TIMESTAMP)
TRY_CAST(expr AS type) TRY_CAST(expr AS type)

JSON Functions

Databend Function Snowflake Function
JSON_OBJECT(key1, value1[, key2, value2[, ...]]) OBJECT_CONSTRUCT(key1, value1[, key2, value2[, ...]])
PARSE_JSON(expr) PARSE_JSON(expr)
JSON_EXTRACT_PATH_TEXT(expr, path_name) GET(expr, path_name)
JSON_PRETTY(json_string) TO_JSON_STRING(json_string)
JSON_ARRAY(value1[, value2[, ...]]) ARRAY_CONSTRUCT(value1[, value2[, ...]])
JSON_PATH_QUERY(variant, path_name) PATH(expr, path_name)
JSON_PATH_QUERY_FIRST(variant, path_name) PATH(expr, path_name)
CHECK_JSON(expr) IS_JSON(expr)
GET_PATH(variant, path_name) GET_PATH(variant, path_name)
JSON_PATH_EXISTS(json_data, json_path_expression) JSON_PATH_EXISTS(json_data, json_path_expression)
JSON_OBJECT_KEEP_NULL(key1, value1[, key2, value2[, ...]]) OBJECT_CONSTRUCT_KEEP_NULL(key1, value1[, key2, value2[, ...]])
IS_INTEGER(expr) IS_INTEGER(expr)
IS_NULL_VALUE(expr) IS_NULL_VALUE(expr)
IS_STRING(expr) IS_STRING(expr)
IS_BOOLEAN(expr) IS_BOOLEAN(expr)
FLATTEN(INPUT => expr [, PATH => expr ] [ , OUTER => TRUE | FALSE ] [ , RECURSIVE => TRUE | FALSE ] [ , MODE => 'OBJECT' | 'ARRAY' | 'BOTH' ]) FLATTEN(INPUT => expr [, PATH => expr ] [ , OUTER => TRUE | FALSE ] [ , RECURSIVE => TRUE | FALSE ] [ , MODE => 'OBJECT' | 'ARRAY' | 'BOTH' ])
IS_ARRAY(expr) IS_ARRAY(expr)
GET(variant, index) GET(variant, index)
JSON_PATH_QUERY_ARRAY(variant, path_name) PATH_ARRAY(variant, path_name)
GET_IGNORE_CASE(variant, field_name) GET_IGNORE_CASE(variant, field_name)
JSON_ARRAY_ELEMENTS(json_string) ARRAY_ELEMENTS(json_string)
JSON_STRIP_NULLS(json_string) REMOVE_NULLS(json_string)
AS_BOOLEAN(variant) CAST(variant AS BOOLEAN)
AS_INTEGER(variant) CAST(variant AS INTEGER)
AS_FLOAT(variant) CAST(variant AS FLOAT)
AS_STRING(variant) CAST(variant AS STRING)
AS_ARRAY(variant) CAST(variant AS ARRAY)
AS_OBJECT(variant) CAST(variant AS OBJECT)

Bitmap Functions

Databend Function Snowflake Function
BUILD_BITMAP(expr) N/A
TO_BITMAP(expr) N/A
BITMAP_AND(expr1, expr2) N/A
BITMAP_OR(expr1, expr2) N/A
BITMAP_XOR(expr1, expr2) N/A
BITMAP_HAS_ALL(expr1, expr2) N/A
BITMAP_HAS_ANY(expr1, expr2) N/A
BITMAP_TO_STRING(expr) N/A

Conditional Functions

Databend Function Snowflake Function
IF(cond, expr1, expr2) IFF(cond, expr1, expr2)
NULLIF(expr1, expr2) NULLIF(expr1, expr2)
IFNULL(expr1, expr2) IFNULL(expr1, expr2)
COALESCE(expr1, expr2[, ...]) COALESCE(expr1, expr2[, ...])

Encryption and Hash Functions

Databend Function Snowflake Function
MD5(expr) MD5(expr)
SHA(expr) SHA(expr)
SHA2(expr, x) SHA2(expr, x)
CRC32(expr) CRC32(expr)
XXHASH32(expr) N/A
XXHASH64(expr) N/A
BLAKE3(expr) N/A
SIPHASH64(expr) N/A
CITY64WITHSEED(expr1, expr2) N/A

Window Functions

Databend Function Snowflake Function
ROW_NUMBER() ROW_NUMBER()
RANK() RANK()
DENSE_RANK() DENSE_RANK()
PERCENT_RANK() PERCENT_RANK()
CUME_DIST() CUME_DIST()
NTILE(n) NTILE(n)
LAG(expr, n, default) LAG(expr, n, default)
LEAD(expr, n, default) LEAD(expr, n, default)
FIRST_VALUE(expr) FIRST_VALUE(expr)
LAST_VALUE(expr) LAST_VALUE(expr)
NTH_VALUE(expr, n) NTH_VALUE(expr, n)

Array Functions

Databend Function Snowflake Function
ARRAY_REDUCE(array, lambda) N/A
ARRAY_FLATTEN(array) N/A
SLICE(array, start[, end]) SLICE(array, start[, end])
ARRAY_TO_STRING(array, '<separator>') ARRAY_TO_STRING(array, '<separator>')
CONTAINS(array, element) ARRAY_CONTAINS(array, element)
ARRAY_FILTER(array, lambda) N/A
ARRAY_SORT(array[, order[, nullposition]]) ARRAY_SORT(array[, order[, nullposition]])
UNNEST(array) UNNEST(array)
ARRAY_CONCAT(array1, array2) ARRAY_CAT(array1, array2)
GET(array, index) GET(array, index)
ARRAY_INDEX_OF(array, element) ARRAY_INDEX_OF(array, element)
ARRAY_UNIQUE(array) ARRAY_UNIQUE(array)
ARRAY_REMOVE_FIRST(array) N/A
ARRAY_TRANSFORM(array, lambda) N/A
ARRAY_DISTINCT(array) ARRAY_DISTINCT(array)
ARRAY_APPEND(array, element) ARRAY_APPEND(array, element)
ARRAY_PREPEND(element, array) ARRAY_PREPEND(element, array)
ARRAY_LENGTH(array) ARRAY_LENGTH(array)
ARRAY_APPLY(array, lambda) N/A

Geospatial Functions

Databend Function Snowflake Function
H3_EXACT_EDGE_LENGTH_KM(h3) H3_EXACT_EDGE_LENGTH_KM(h3)
H3_EDGE_LENGTH_M(1) H3_EDGE_LENGTH_M(1)
H3_LINE(h3, a_h3) H3_LINE(h3, a_h3)
H3_EDGE_LENGTH_KM(res) H3_EDGE_LENGTH_KM(res)
H3_EXACT_EDGE_LENGTH_M(h3) H3_EXACT_EDGE_LENGTH_M(h3)
GEOHASH_DECODE('<geohashed-string>') GEOHASH_DECODE('<geohashed-string>')
H3_HEX_AREA_M2(res) H3_HEX_AREA_M2(res)
H3_TO_GEO_BOUNDARY(h3) H3_TO_GEO_BOUNDARY(h3)
H3_GET_UNIDIRECTIONAL_EDGES_FROM_HEXAGON(h3) H3_GET_UNIDIRECTIONAL_EDGES_FROM_HEXAGON(h3)
H3_HEX_AREA_KM2(res) H3_HEX_AREA_KM2(res)
H3_IS_VALID(h3) H3_IS_VALID(h3)
H3_TO_GEO(h3) H3_TO_GEO(h3)
H3_UNIDIRECTIONAL_EDGE_IS_VALID(h3) H3_UNIDIRECTIONAL_EDGE_IS_VALID(h3)
GEO_TO_H3(lon, lat, res) GEO_TO_H3(lon, lat, res)
GEOHASH_ENCODE(lon, lat) GEOHASH_ENCODE(lon, lat)
POINT_IN_POLYGON((x,y), [(a,b), (c,d), (e,f) ... ]) POINT_IN_POLYGON((x,y), [(a,b), (c,d), (e,f) ... ])
H3_DISTANCE(h3, a_h3) H3_DISTANCE(h3, a_h3)
H3_CELL_AREA_RADS2(h3) H3_CELL_AREA_RADS2(h3)
H3_TO_CENTER_CHILD(h3, res) H3_TO_CENTER_CHILD(h3, res)
H3_CELL_AREA_M2(h3) H3_CELL_AREA_M2(h3)
H3_GET_FACES(h3) H3_GET_FACES(h3)
H3_IS_PENTAGON(h3) H3_IS_PENTAGON(h3)
STRING_TO_H3(h3) STRING_TO_H3(h3)
H3_GET_UNIDIRECTIONAL_EDGE(h3, a_h3) H3_GET_UNIDIRECTIONAL_EDGE(h3, a_h3)
H3_GET_BASE_CELL(h3) H3_GET_BASE_CELL(h3)
H3_GET_UNIDIRECTIONAL_EDGE_BOUNDARY(h3) H3_GET_UNIDIRECTIONAL_EDGE_BOUNDARY(h3)
H3_GET_DESTINATION_INDEX_FROM_UNIDIRECTIONAL_EDGE(h3) H3_GET_DESTINATION_INDEX_FROM_UNIDIRECTIONAL_EDGE(h3)
H3_TO_PARENT(h3, parent_res) H3_TO_PARENT(h3, parent_res)
H3_NUM_HEXAGONS(res) H3_NUM_HEXAGONS(res)
H3_TO_CHILDREN(h3, child_res) H3_TO_CHILDREN(h3, child_res)
H3_GET_INDEXES_FROM_UNIDIRECTIONAL_EDGE(h3) H3_GET_INDEXES_FROM_UNIDIRECTIONAL_EDGE(h3)
H3_TO_STRING(h3) H3_TO_STRING(h3)
H3_K_RING(h3, k) H3_K_RING(h3, k)
H3_IS_RES_CLASS_III(h3) H3_IS_RES_CLASS_III(h3)
H3_INDEXES_ARE_NEIGHBORS(h3, a_h3) H3_INDEXES_ARE_NEIGHBORS(h3, a_h3)
H3_GET_ORIGIN_INDEX_FROM_UNIDIRECTIONAL_EDGE(h3) H3_GET_ORIGIN_INDEX_FROM_UNIDIRECTIONAL_EDGE(h3)
H3_GET_RESOLUTION(h3) H3_GET_RESOLUTION(h3)

Other Functions

Databend Function Snowflake Function
NEXTVAL(seq_name) NEXTVAL(seq_name)
QUOTE(str) QUOTE(str)
ZhiHanZ commented 11 months ago
Feature Snowflake Databend
Kafka Connect Yes Yes
Flink CDC No Yes
dbt Yes Yes
go driver Yes Yes
java driver Yes Yes
rust driver No Yes
python driver Yes Yes
nodejs driver Yes Planning
embedded python lib No Yes
Iceberg Private Preview Yes
PsiACE commented 11 months ago

we hava a nodejs driver, see:

In addition, although we do not yet support geospatial types, we have already implemented a wide range of geospatial functions. I believe this is worth mentioning.

I think, in Storage Format, perhaps we can also add our native format and indicate that it can bring better performance.

xudong963 commented 11 months ago

We also support aggregate and hash join spilling.

Chasen-Zhang commented 9 months ago

Summary

Databend is an open-source alternative to Snowflake, so we need a page to introduce the differences between Databend and Snowflake. Feel free to contribute if you notice any omissions.

Category Snowflake Databend Numeric Data Types Data Types Yes Yes [1, 2] String & Binary Data Types Data Types Yes Yes Logical Data Types Data Types Yes Yes Date & Time Data Types Data Types Yes Yes Geospatial Data Types Data Types Yes No(Geo functions) Bitmap Data Types Data Types No Yes Semi-structured Data Types Data Types Yes Yes Database Databases Yes Yes Table Tables Yes Yes External Table Tables Yes Yes Vacuum Tables No Yes Clustering Tables Yes Yes(Distributed) Hive Lake Yes Yes Iceberg Lake Yes(Private Preview) Yes View Views Yes Yes Materialized Views Views Yes Yes(Need Doc) User, Role, & Privilege Privileges Yes Yes Query Queries Yes Yes Update General DML Yes Yes(Distributed) Replace General DML No Yes(Distributed) Delete General DML Yes Yes(Distributed) Merge General DML Yes Yes Stage(Internal/External/Remove/List) Data Loading Yes Yes Query Stage File Data Loading Yes Yes Copy Into Table/Location Data Loading Yes Yes(Distributed) Transforming Data During a Load Data Loading Yes Yes Table Pipe Data Loading Yes Yes(Cloud) Table Task Data Loading Yes Yes Table Stream Data Loading Yes Yes User-Defined Functions Functions Yes Yes External Functions Functions Yes Yes Stored Procedure Functions Yes No AI Functions Functions No Yes Window Function Functions Yes Yes Time Travel Data Recovery Yes Yes Data Sharing Data Sharing Yes Yes RBAC Security Yes Yes Network Policies Security Yes Yes Masking Policies Security Yes Yes Kafka Connect Connectors Yes Yes Flink CDC Connectors No Yes dbt Connectors Yes Yes Drivers Drivers Go Java Ptyhon Node.js Go Java Python Node.js Rust Embedded Python Library No Yes Storage Format Storage MicroPartition Parquet Aggregate&Join Spill Performance Optimization Yes Yes Local Disk Cache Performance Optimization Yes Yes Profiling Performance Optimization Powerful Medium CLI Client Client SnowSQL BendSQL TPC-H Benchmark Yes Yes TPC-DS Benchmark Yes Yes Supported Cloud Platforms Cloud AWS/GCP/AZURE Any Cloud

@BohuTANG Could you help me change the address of the connection๏ผŒ Here are the rules๏ผš /doc/sql-reference/ => /sql/sql-reference/ /doc/sql-commands/ => /sql/sql-commands/ /doc/sql-functions/ =>/sql/sql-functions/ (๐Ÿ‘† Although I redirected all of these, it's better to change them all together. And you can directly change the domain name to https://docs.databend.com )

https://docs.databend.com/load-data/pipeline => https://docs.databend.com/doc/load-data/pipeline https://databend.rs/blog/2023-08-01-iceberg-integration=> https://www.databend.com/blog/2023-08-01-iceberg-integration/

Thank you in advance๏ผ

cdmikechen commented 8 months ago

@BohuTANG Wish to support Geospatial Data Types, we used to have a lot of spatial data needs, and if databend can support geo, it will be a great help to us ~