cloudnativecube / octopus


Snowflake vs. Databricks comparison #117

Open awplxz opened 3 years ago

awplxz commented 3 years ago

Snowflake user permissions research: https://docs.snowflake.com/en/user-guide/security-access-control-overview.html

mdianjun commented 3 years ago

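How does ClickHouse support semi-structured data?

With Snowflake, you can upload and store both structured and semi-structured files without using an ETL tool to organise the data before loading it into the EDW. Once uploaded, Snowflake automatically converts the data into its internal structured format. Unlike a data lake, however, Snowflake requires you to add structure to unstructured data before you can load and use it.

Snowflake can load semi-structured formats such as JSON, Avro, ORC, Parquet, or XML into a single field. For performance and efficiency, Snowflake internally stores these types in an efficient, compressed, columnar binary representation of the document. Simple query extensions are provided so that these semi-structured formats can be queried in SQL while keeping their native form. Our team at Datalytyx found this feature very impressive. The query API for unstructured data is very intuitive and lets us parse information out of this kind of data at a speed and scale we could not reach before.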
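
To make "load a document into a single field, then query inside it" concrete, here is a minimal Python sketch. It is only an illustration of the idea, not Snowflake's or ClickHouse's implementation; the helper `get_path`, the `payload` column name, and the sample rows are all hypothetical.

```python
import json
from typing import Any, Dict, List


def get_path(document: Any, path: str) -> Any:
    """Follow a dotted path such as 'user.city' through nested dicts.

    Returns None if any step of the path is missing (hypothetical helper).
    """
    current = document
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return None
    return current


# Each row keeps the whole JSON document in one field ("payload" is an assumed name).
rows: List[Dict[str, Any]] = [
    {"id": 1, "payload": json.loads('{"user": {"name": "ann", "city": "Oslo"}}')},
    {"id": 2, "payload": json.loads('{"user": {"name": "bob"}}')},
]

# Query a nested attribute without defining a schema for the document up front.
cities = [get_path(row["payload"], "user.city") for row in rows]
```

Rows whose document lacks the requested attribute yield `None` instead of failing, which is the usual reason to keep such data schema-free at load time.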

mdianjun commented 3 years ago

snowflake_databrick资料收集整理-210805.pdf

mdianjun commented 3 years ago

OnlineZTS_LabGuide.pdf lab_scripts_OnlineZTS.sql.txt

Key points:

Multi-cluster is ideal for concurrency scenarios, such as many business analysts simultaneously running different queries using the same warehouse. In this scenario, the various queries can be allocated across the multiple clusters to ensure they run fast.

Many of the warehouse/compute capabilities we just covered, like being able to create, scale up and out, and auto-suspend/resume warehouses are things that are simple in Snowflake and can be done in seconds.

Snowflake has a result cache that holds the results of every query executed in the past 24 hours. These are available across warehouses, so query results returned to one user are available to any other user on the system who executes the same query, provided the underlying data has not changed. Not only do these repeated queries return extremely fast, but they also use no compute credits.
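
The behaviour described above (same query, unchanged data, within 24 hours, no compute used) can be sketched as a toy model in Python. This is an assumption-laden illustration, not how Snowflake implements its cache: the class `ResultCache`, the `data_version` counter, and matching queries by exact text are all simplifications introduced here.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ResultCache:
    """Toy result cache shared by every warehouse that uses this object."""

    def __init__(self, ttl_seconds: float = 24 * 3600) -> None:
        self.ttl_seconds = ttl_seconds
        # query text -> (data version when cached, time cached, result)
        self._entries: Dict[str, Tuple[int, float, Any]] = {}
        self.data_version = 0  # bumped whenever the underlying data changes
        self.compute_runs = 0  # how often a query actually had to be executed

    def mark_data_changed(self) -> None:
        self.data_version += 1

    def run(self, query: str, execute: Callable[[], Any],
            now: Optional[float] = None) -> Any:
        now = time.time() if now is None else now
        entry = self._entries.get(query)
        if entry is not None:
            version, cached_at, result = entry
            fresh = now - cached_at < self.ttl_seconds
            if version == self.data_version and fresh:
                return result  # cache hit: no compute is used
        result = execute()  # cache miss: run the query on a warehouse
        self.compute_runs += 1
        self._entries[query] = (self.data_version, now, result)
        return result
```

Calling `run` twice with the same query text executes it once; after `mark_data_changed()` or once the entry is older than `ttl_seconds`, the next call executes it again.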

Clone a Table

Snowflake allows you to create clones, also known as “zero-copy clones”, of tables, schemas, and databases in seconds. A snapshot of the data present in the source object is taken when the clone is created, and is made available to the cloned object. The cloned object is writable and independent of the clone source. That is, changes made to either the source object or the clone are not reflected in the other.

A popular use case for zero-copy cloning is to clone a production environment for Development & Testing to test and experiment on, (1) without adversely impacting the production environment and (2) without having to set up and manage two separate environments for production and Development & Testing.

Zero-Copy Cloning FTW! A massive benefit is that the underlying data is not copied; only the metadata/pointers to the underlying data change. Hence “zero-copy”: storage requirements are not doubled when data is cloned. Most data warehouses cannot do this; for Snowflake it is easy!
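
The "only metadata/pointers change" idea can be sketched with a toy model in Python. It assumes a table is a list of references to immutable data blocks; the `Table` class and `Partition` type are hypothetical and do not describe Snowflake's actual storage format.

```python
from typing import Iterable, List, Optional, Tuple

Partition = Tuple[int, ...]  # an immutable block of rows (assumed representation)


class Table:
    """Toy table: a list of pointers to immutable partitions."""

    def __init__(self, partitions: Optional[List[Partition]] = None) -> None:
        # Copy the list of references, never the partitions themselves.
        self.partitions: List[Partition] = list(partitions or [])

    def insert(self, rows: Iterable[int]) -> None:
        # New data always lands in a new partition; old ones are never modified.
        self.partitions.append(tuple(rows))

    def clone(self) -> "Table":
        # "Zero-copy": the clone shares every existing partition with its source.
        return Table(self.partitions)

    def rows(self) -> List[int]:
        return [row for partition in self.partitions for row in partition]


prod = Table()
prod.insert([1, 2, 3])
dev = prod.clone()  # shares the partition holding 1, 2, 3
dev.insert([4])     # visible only in dev
prod.insert([5])    # visible only in prod
```

After these writes the two tables have diverged, yet the original partition is still stored once and referenced by both.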

Time Travel

Snowflake’s Time Travel capability enables accessing historical data at any point within a pre-configurable period of time. The default period of time is 24 hours and with Snowflake Enterprise Edition it can be up to 90 days. Most data warehouses cannot offer this functionality; with Snowflake it is easy! Some useful applications of this include:
● Restoring data-related objects (tables, schemas, and databases) that may have been accidentally or intentionally deleted
● Duplicating and backing up data from key points in the past
● Analysing data usage/manipulation over specified periods of time
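
Reading data "as of" a past moment within a retention period can be sketched as a toy model in Python. It assumes every write stores a full snapshot with a timestamp, which is a simplification made here; the `VersionedTable` class and its parameters are hypothetical and not Snowflake's mechanism.

```python
import bisect
from typing import Iterable, List, Tuple


class VersionedTable:
    """Toy table that keeps one timestamped snapshot per write."""

    def __init__(self, retention_seconds: float = 24 * 3600) -> None:
        self.retention_seconds = retention_seconds
        self._times: List[float] = []                # write times, ascending
        self._snapshots: List[Tuple[int, ...]] = []  # table contents after each write

    def write(self, rows: Iterable[int], at: float) -> None:
        # Assumes writes arrive in non-decreasing time order.
        self._times.append(at)
        self._snapshots.append(tuple(rows))

    def read(self, as_of: float, now: float) -> Tuple[int, ...]:
        if now - as_of > self.retention_seconds:
            raise ValueError("requested time is outside the retention period")
        # Find the latest snapshot written at or before `as_of`.
        index = bisect.bisect_right(self._times, as_of) - 1
        if index < 0:
            raise ValueError("table did not exist at the requested time")
        return self._snapshots[index]


table = VersionedTable(retention_seconds=100)
table.write([1, 2], at=0)
table.write([], at=50)                       # accidental delete of all rows
restored = table.read(as_of=40, now=60)      # contents just before the delete
```

Restoring after an accidental delete then amounts to reading the snapshot from just before the delete and writing it back, as long as that moment is still inside the retention window.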

Questions: