apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

parquet-cli rewrite option #2981

Open MyDELearnings opened 1 month ago

MyDELearnings commented 1 month ago

Describe the usage question you have. Please include as many useful details as possible.

Hi,

Is it possible to read directly from a GCS bucket to prune a column, for example: rewrite -i gs:/sourcebbucket/part-00549.parquet -o gs://targetbucket/newdata/dd --prune-columns col4

I am getting this error: java.lang.RuntimeException: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "gs"

Component(s)

No response

wgtmac commented 1 month ago

I don't think parquet-cli can directly rewrite files in a cloud object store. You can either download them and rewrite them locally, or use the ParquetWriter API and set the file system configuration programmatically.
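
A side note on the command in the question: the input path is written gs:/sourcebbucket/... with a single slash, while the output path uses gs://. The reported exception is about the "gs" scheme having no registered file system, so the slash is probably not its cause, but it may matter once a GCS file system is configured. The following is a minimal sketch using only java.net.URI, assuming the tool parses locations along standard URI lines; UriCheck and bucketOf are hypothetical names for illustration, not part of parquet-cli or Hadoop.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of parquet-cli or Hadoop.
class UriCheck {
    // Returns the authority component of the location (the bucket name in a
    // gs://bucket/path URI), or null when the location has no authority.
    static String bucketOf(String location) {
        return URI.create(location).getAuthority();
    }

    public static void main(String[] args) {
        // Both locations are copied as written in the question.
        String input = "gs:/sourcebbucket/part-00549.parquet";
        String output = "gs://targetbucket/newdata/dd";

        System.out.println("input bucket:  " + bucketOf(input));
        System.out.println("output bucket: " + bucketOf(output));
    }
}
```

With a single slash the URI has a scheme and a path but no authority, so the bucket name ends up as the first path segment; with two slashes the bucket is parsed as the authority.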