dbt-labs / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
https://dbt-athena.github.io
Apache License 2.0
228 stars 101 forks source link

Final location uses backslash in windows os #101

Closed xie-sheng closed 1 year ago

xie-sheng commented 1 year ago

Hi,

After the model generated by DBT Athena,the table location was looks like this

s3://my-test-bucket/dbt/resident_communication\ticket_last_comment\6975c131-96c7-4b48-baf3-3b18acb6a431

and i think it should be like this

s3://my-test-bucket/dbt/resident_communication/ticket_last_comment/6975c131-96c7-4b48-baf3-3b18acb6a431

I think this is cuased by Windows OS path behavior, below is my profile. s3_staging_dir: s3://my-test-bucket/dbt/ s3_data_dir: s3://my-test-bucket/dbt/ s3_data_naming: schema_table_unique

Is there a wrong setting with me? thank you very much.

xie-sheng commented 1 year ago

win10 OS

nicor88 commented 1 year ago

this behavior is expected, the s3_data_naming is set to schema_table_unique, that lead to schema_name/table_name/unique_id/ you can consider to use another s3_data_naming, for example table or schema_table, but bare in mind that only unique table locations are supported for table_type=Iceberg.

xie-sheng commented 1 year ago

Thanks for your reply.

the s3_data_naming is OK for me.

but i was wondered the forward slash and back slash within the path,when i use windows OS, the generated location path as below. s3://my-test-bucket/dbt/resident_communication\ticket_last_comment\6975c131-96c7-4b48-baf3-3b18acb6a431

and this path was strange for S3 object key and invalid for lakeformation

when i use Mac,the path is normal and everything is OK s3://my-test-bucket/dbt/resident_communication/ticket_last_comment/6975c131-96c7-4b48-baf3-3b18acb6a431

i think the deffrent OS should be the same behavior.thank you very much

nicor88 commented 1 year ago

ah I missed the backslash, the issue is here: https://github.com/dbt-athena/dbt-athena/blob/main/dbt/adapters/athena/impl.py#L81-L87 I think that windows don't return a good final path due to os.path.join

We could consider to add replace('\', '/) at the end to fix this. I want to keep the path.join to avoid dealing with final trailing slash in paths.

mattiamatrix commented 1 year ago

ah I missed the backslash, the issue is here: https://github.com/dbt-athena/dbt-athena/blob/main/dbt/adapters/athena/impl.py#L81-L87 I think that windows don't return a good final path due to os.path.join

We could consider to add replace('\', '/) at the end to fix this. I want to keep the path.join to avoid dealing with final trailing slash in paths.

Looking at the docs, importing posixpath could be a clean solution in this case.

nicor88 commented 1 year ago

@mattiamatrix thanks, I was not aware of it. Doing some testing I was to achieve what the current setup does: posixpath.join('s3://my_bucket','b') -> s3://my_bucket/b posixpath.join('s3://my_bucket/','b') -> s3://my_bucket/b

Should be indeed an easy and more elegant fix.