cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev

Cubestore with external bucket support parquet file format: support more ingestion formats other than CSV #3051

Open rongfengliang opened 3 years ago

rongfengliang commented 3 years ago

Is your feature request related to a problem? Please describe.

Cube Store currently supports external buckets with the CSV file format only. Could it support the Parquet format directly?

Describe the solution you'd like

We could make the bucket file format configurable. Maybe we could even write directly to Cube Store's Parquet-format storage?

paveltiunov commented 3 years ago

Hey @rongfengliang! Thanks for posting this! Yep, we're considering supporting formats other than CSV to speed up ingestion. Could you please elaborate on your use case? What database do you use, and why do you think you'll benefit from using Parquet instead of CSV?

rongfengliang commented 3 years ago

We use Parquet files as our data lake format, so we want Cube Store's external bucket support to read Parquet files directly.

paveltiunov commented 3 years ago

@rongfengliang What query engine do you use for your data lake?

rongfengliang commented 3 years ago

Dremio

paveltiunov commented 3 years ago

@rongfengliang Do you know if Dremio supports Parquet export? Or do you mean you want Cube Store to access raw Parquet files directly from the bucket?

rongfengliang commented 3 years ago

Dremio can use CREATE TABLE to write into external storage (such as S3, in Parquet format), so maybe Cube Store could load those files.

github-actions[bot] commented 2 years ago

If you are interested in working on this issue, please leave a comment below and we will be happy to assign the issue to you. If this is the first time you are contributing a Pull Request to Cube.js, please check our contribution guidelines. You can also post any questions while contributing in the #contributors channel in the Cube.js Slack.

rascasse83 commented 2 years ago

There is quite a lot of functional overlap between Dremio and Cube: both act as a semantic layer that decouples the presentation layer from the data storage engine, and both enforce a security model at run time. For Cube.js to be considered a possible alternative to Dremio, it would therefore need the ability to connect to data lakes. Failing that, one could potentially implement a two-tier semantic layer, with Layer 1 being Dremio, federating both RDBMS data and data lake files, and Layer 2 being Cube on top of it. I am not sure whether that is feasible security-wise or functionality-wise, or even practical, as the data would have to jump through a lot of hoops to reach the dashboard presentation layer.

rongfengliang commented 2 years ago

@rascasse83 Yes, but Cube.js is mainly for BI, while Dremio is for data lake queries and a semantic layer. We can use the Cube.js Dremio driver to connect to Dremio and build dashboards.