exasol / cloud-storage-extension

Exasol Cloud Storage Extension for accessing formatted data (Avro, Orc and Parquet) on public cloud storage systems
MIT License

Infer schema from parquet files #293

Closed. tm-henningnt closed this issue 2 months ago.

tm-henningnt commented 6 months ago

It would be very useful to automatically infer a valid table DDL from an input Parquet file (and files in other formats). This would help both as an automatic capability (i.e., automatically dropping and re-creating tables when importing a file) and as a separate capability (i.e., generating a proposed target table from an input file).
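As a rough illustration of the "proposed target table" capability, here is a minimal sketch. It assumes the file's schema has already been read into (column name, type name) pairs, for example with pyarrow's `pyarrow.parquet.read_schema`. The function `propose_ddl` and the `TYPE_MAP` table are hypothetical. The type mapping is an illustrative assumption, not the extension's documented mapping.

```python
# Hypothetical sketch: derive a proposed Exasol CREATE TABLE statement from a
# Parquet-like schema given as (column_name, type_name) pairs.
# ASSUMPTION: this type mapping is illustrative only; it is not the official
# mapping used by the Cloud Storage Extension.
TYPE_MAP = {
    "BOOLEAN": "BOOLEAN",
    "INT32": "DECIMAL(10,0)",       # fits any 32-bit signed integer
    "INT64": "DECIMAL(19,0)",       # fits any 64-bit signed integer
    "FLOAT": "DOUBLE PRECISION",
    "DOUBLE": "DOUBLE PRECISION",
    "BYTE_ARRAY": "VARCHAR(2000000)",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}


def propose_ddl(table_name, columns):
    """Return a CREATE TABLE statement for the given (name, type) pairs."""
    if not columns:
        raise ValueError("schema must contain at least one column")
    column_defs = []
    for name, type_name in columns:
        try:
            exasol_type = TYPE_MAP[type_name.upper()]
        except KeyError:
            raise ValueError(f"no mapping for type {type_name!r}") from None
        # Quote identifiers so mixed-case column names survive unchanged.
        column_defs.append(f'    "{name}" {exasol_type}')
    return f'CREATE TABLE "{table_name}" (\n' + ",\n".join(column_defs) + "\n);"
```

For example, `propose_ddl("SALES", [("id", "INT64"), ("amount", "DOUBLE")])` yields a two-column `CREATE TABLE "SALES"` statement. Unknown types raise an error instead of silently falling back to `VARCHAR`, so a proposed table is never quietly wrong.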

ThomasBestfleisch commented 2 months ago

@tm-henningnt You can achieve this with Exasol's file-based virtual schemas, which support automatic mapping inference for both CSV and Parquet files. See https://github.com/exasol/virtual-schema-common-document/blob/main/doc/user_guide/edml_user_guide.md#automatic-mapping-inference