Open rongfengliang opened 3 years ago
Hey @rongfengliang! Thanks for posting this! Yes, we're considering supporting formats other than CSV to speed up ingestion. Could you please elaborate on your use case? What database do you use, and why do you think you'd benefit from using Parquet instead of CSV?
We use Parquet files as our data lake format, so we want Cube Store's external bucket to be able to read Parquet files directly.
@rongfengliang What query engine do you use for your data lake?
Dremio
@rongfengliang Do you know if Dremio supports parquet export? Or do you mean you want Cube Store to access raw parquet files directly from the bucket?
Dremio can use CREATE TABLE to write into external storage (such as S3, in Parquet format). Maybe Cube Store could load those files.
If you are interested in working on this issue, please leave a comment below and we will be happy to assign the issue to you. If this is the first time you are contributing a Pull Request to Cube.js, please check our contribution guidelines. You can also post any questions while contributing in the #contributors channel in the Cube.js Slack.
There is quite a lot of functional overlap between Dremio and Cube: both act as a semantic layer that decouples the presentation layer from the data storage engine, and both enforce a security model at run time. For Cube.js to be considered a possible alternative to Dremio, it would therefore need the ability to connect to data lakes. Failing that, one could potentially implement a two-tier semantic layer, with Layer 1 being Dremio, federating both RDBMS data and data lake files, and Layer 2 being Cube on top of that. I am not sure whether that's feasible security-wise or functionality-wise, or even practical, as the data would have to jump through a lot of hoops to reach the presentation (dashboard) layer.
@rascasse83 Yes, but Cube.js is mainly for BI, while Dremio is for data lake querying and the semantic layer. We can use the Cube.js Dremio driver to connect to Dremio and build dashboards.
Is your feature request related to a problem? Please describe.
Cube Store currently supports external buckets with the CSV file format. Could it support the Parquet format directly?
Describe the solution you'd like
We could configure the bucket file format. Maybe we could also write directly to Cube Store's Parquet-format storage?