datacontract / datacontract-cli

CLI to manage your datacontract.yaml files
https://cli.datacontract.com

Import: No support of AWS Athena (Trino) DDLs #332

roykoand closed this issue 3 weeks ago

roykoand commented 3 months ago

This is not an issue with this project itself but with its underlying dependency, simple_ddl_parser (https://github.com/xnuinside/simple-ddl-parser).

It does not support DDLs generated by AWS Athena (SHOW CREATE TABLE).

Using this DDL as an example:

CREATE EXTERNAL TABLE `database`.`table` (
    column1 string,
    column2 string
)
PARTITIONED BY
(
    column3 integer
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://somewhere-in-s3/prefix1'
TBLPROPERTIES (
  'parquet.compression'='GZIP'
)

$ datacontract import --format sql --source aws_athena_ddl.sql
...
DDLParserError: Unknown symbol "'"

If you delete everything except the column definitions, the output is still invalid (the partition column column3 is missing):

CREATE EXTERNAL TABLE `database`.`table` (
    column1 string,
    column2 string
)
PARTITIONED BY
(
    column3 integer
)

$ datacontract import --format sql --source aws_athena_ddl.sql
dataContractSpecification: 0.9.3
id: my-data-contract-id
info:
  title: My Data Contract
  version: 0.0.1
models:
  '`table`':
    type: table
    fields:
      column1:
        type: string
      column2:
        type: string
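
For parser versions without Athena support, one possible stopgap is to flatten the DDL before running the import. The sketch below is only an illustration and is not part of datacontract-cli or simple_ddl_parser: `flatten_athena_ddl` is a hypothetical helper that strips backticks, folds the PARTITIONED BY columns into the main column list, and drops the storage clauses. It assumes a single CREATE TABLE statement with no nested parentheses in the column types, and whether the result imports cleanly has not been verified.

```python
import re


def flatten_athena_ddl(ddl: str) -> str:
    """Rewrite an Athena SHOW CREATE TABLE statement as a plain CREATE TABLE.

    Hypothetical helper: handles only simple column lists without nested
    parentheses (so no decimal(10,2) or struct<...> types).
    """
    # Drop backtick quoting so the model name is not emitted as '`table`'.
    ddl = ddl.replace("`", "")
    # EXTERNAL is specific to Hive/Athena; rewrite to a plain CREATE TABLE.
    ddl = re.sub(r"CREATE\s+EXTERNAL\s+TABLE", "CREATE TABLE", ddl,
                 flags=re.IGNORECASE)

    # Main column list: the text inside the first pair of parentheses.
    head = re.match(r"\s*(CREATE\s+TABLE\s+[\w.]+)\s*\(([^()]*)\)", ddl,
                    flags=re.IGNORECASE)
    if head is None:
        raise ValueError("not a CREATE TABLE statement this sketch understands")
    columns = [c.strip() for c in head.group(2).split(",") if c.strip()]

    # Partition columns live in a separate PARTITIONED BY (...) list.
    part = re.search(r"PARTITIONED\s+BY\s*\(([^()]*)\)", ddl,
                     flags=re.IGNORECASE)
    if part is not None:
        columns += [c.strip() for c in part.group(1).split(",") if c.strip()]

    # Everything after the column lists (ROW FORMAT, LOCATION, TBLPROPERTIES)
    # is discarded because the statement is rebuilt from scratch.
    body = ",\n".join(f"    {c}" for c in columns)
    return f"{head.group(1)} (\n{body}\n);"


example = """CREATE EXTERNAL TABLE `database`.`table` (
    column1 string,
    column2 string
)
PARTITIONED BY
(
    column3 integer
)
STORED AS TEXTFILE
"""
print(flatten_athena_ddl(example))
```

The flattened statement could then be written to a file and passed to `datacontract import --format sql --source ...` as before. The storage details (location, format, table properties) are lost in the process, so this is only useful when the column list is all that is needed.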

jochenchrist commented 3 months ago

Thanks for reporting. I think the best way is to open an issue (and maybe even a PR) at simple_ddl_parser.

@roykoand could you do so?

roykoand commented 3 months ago

@jochenchrist Sure! Just created a feature request in their repo: https://github.com/xnuinside/simple-ddl-parser/issues/272

xnuinside commented 2 months ago

FYI: this was fixed in simple-ddl-parser version 1.6.0.

jochenchrist commented 2 months ago

Merged #372

@roykoand Could you test with the current main version to see if this solves your issue?