apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

[DISCUSS] Remove `FORMAT <..>` backwards compatibility options from COPY #9905

Closed alamb closed 7 months ago

alamb commented 7 months ago

Is your feature request related to a problem or challenge?

@tinfoil-knight added backwards compatibility to the COPY command format options in version 37.0.0 via https://github.com/apache/arrow-datafusion/pull/9744

So now this format is supported (similar to DuckDB):

COPY (select * from (values (1))) to 'test_files/scratch/copy/'
OPTIONS (format json, compression gzip);

However, the newer, more consistent syntax looks like:

COPY (select * from (values (1))) to 'test_files/scratch/copy/'
STORED AS JSON
OPTIONS (compression gzip);

@metesynnada asked in https://github.com/apache/arrow-datafusion/pull/9753#pullrequestreview-1970055037 whether we should phase out the old `(format json)` syntax

Describe the solution you'd like

I am filing this ticket to discuss and decide whether to remove the old syntax.

Describe alternatives you've considered

No response

Additional context

No response

alamb commented 7 months ago

cc @devinjdangelo

metesynnada commented 7 months ago

cc @ozankabak.

I'm in favour of phasing out this syntax because it's getting hard to keep up with.

Users will mostly use `COPY test TO sa.parquet`, which we already support. If they need to state the format, `COPY test TO sa.tbl STORED AS CSV` should be enough. Choosing a format directly is rarer than letting the system figure it out.
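A minimal sketch of the two cases described above. The table `test` and the file names `sa.parquet` and `sa.tbl` come from the comment; the `CREATE TABLE` statement and the quoting of the target paths as string literals are assumptions added so the statements are complete.

```sql
-- Hypothetical table, created only so the COPY statements have a source.
CREATE TABLE test AS VALUES (1), (2);

-- Common case: no format is stated, so it is inferred from the
-- .parquet file extension.
COPY test TO 'sa.parquet';

-- The .tbl extension does not identify a format, so it is stated
-- explicitly with STORED AS rather than an OPTIONS (format ...) entry.
COPY test TO 'sa.tbl' STORED AS CSV;
```

With these two forms available, `OPTIONS` would carry only format-specific settings such as compression, not the format itself.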

devinjdangelo commented 7 months ago

I think it is reasonable to phase out the format option in favor of the STORED AS keyword.

alamb commented 7 months ago

Great, sounds good. I think this is then a good first issue -- basically revert https://github.com/apache/arrow-datafusion/pull/9744 and fix up any tests as needed

tinfoil-knight commented 7 months ago

take