StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0

Compressed CSV is not working for loading data #49891

Open roshanlabh opened 3 weeks ago

roshanlabh commented 3 weeks ago

Steps to reproduce the behavior (Required)

Query a gzip-compressed CSV file in Google Cloud Storage using the FILES() table function in StarRocks:

SELECT * FROM FILES (
  "path" = "gs://ctap_auto/1723627416-1723627416-Push-Impressions-20240814-0-6.csv.gz",
  "format" = "csv",
  "compression" = "gzip",
  "csv.column_separator"=",",
  "csv.row_delimiter"="\n",
  "csv.enclose"='"',
  "csv.skip_header"="1",
  "gcp.gcs.service_account_email" = "...",
  "gcp.gcs.service_account_private_key_id" = "...",
  "gcp.gcs.service_account_private_key" = "..."
)
LIMIT 100;

Expected behavior (Required)

The query should successfully read and return data from the compressed CSV file stored in Google Cloud Storage.

Real behavior (Required)

The query fails with the following error message:

Access storage error. Error message: not supported format: csv

StarRocks version (Required)
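
No version was given in the report. Since the formats that FILES() accepts may depend on the StarRocks release, a minimal sketch for retrieving the version from any SQL client connected to the cluster, assuming the built-in current_version() function is available in the deployment:

```sql
-- Returns the version string of the StarRocks cluster
-- that the current session is connected to.
SELECT current_version();
```

The returned string can then be pasted under this heading.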

chenminghua8 commented 3 weeks ago

Limits

File external tables must be created in databases within the default_catalog. You can run SHOW CATALOGS to query catalogs created in the cluster.

Only Parquet, ORC, Avro, RCFile, and SequenceFile data files are supported.

You can only use file external tables to query data in the target data file. Data write operations such as INSERT, DELETE, and DROP are not supported.