frictionlessdata / tabulator-py

Python library for reading and writing tabular data via streams.
https://frictionlessdata.io
MIT License
235 stars, 42 forks

Bug introduced to s3 paths with spaces? #342

Closed cschloer closed 3 years ago

cschloer commented 3 years ago

Overview

It seems like you made a commit (https://github.com/frictionlessdata/tabulator-py/commit/e97ec9f4cd688bab89f848e4d8ddcd43e81c9659) that was supposed to fix an issue with s3 paths containing spaces. Instead, it seems to have introduced a bug that prevents s3 paths with spaces from loading properly. Maybe it's interacting with some custom code of mine that preprocesses those load paths, but I haven't been able to find where. The error I'm getting is "Failed to find the file s3://path/to/file with spaces.csv in s3".

I can try to reproduce it with code when I get a chance, but for me the issue happens whenever an s3 path with a space in it is loaded.


Please preserve this line to notify @roll (lead of this repository)

cschloer commented 3 years ago

OK I'm just now noticing that you made this change in response to my own issue https://github.com/frictionlessdata/frictionless-py/issues/501 :)

cschloer commented 3 years ago

Sorry that it took me a while to get to this, but it seems like the fix you made for goodtables created issues in datapackage-pipelines/dataflows for s3 files with spaces.

roll commented 3 years ago

Thanks @cschloer,

I'll investigate. I think the fix was correct (Frictionless has a test for it - https://github.com/frictionlessdata/frictionless-py/blob/master/tests/plugins/test_aws.py#L86) but it might have been hacked in dataflows somehow.

BTW, is it for all paths with spaces, or only for e.g. Unicode ones?

cschloer commented 3 years ago

Hey, it turns out this is on my end. I was calling requote_uri somewhere in my code to change the spaces to %20.
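For anyone who lands here with the same symptom, here is a rough sketch of how pre-quoting a path can produce this kind of mismatch. It is not the code from my project or from tabulator: it uses the standard library's urllib.parse.quote as a stand-in for requote_uri, and the bucket name, path, and the split_s3_uri helper are made up for illustration. The assumption is that the loader treats everything after the bucket as the literal object key.

```python
from urllib.parse import quote, unquote, urlparse


def split_s3_uri(uri):
    """Hypothetical helper: split an s3:// URI into (bucket, key) without decoding."""
    parts = urlparse(uri)
    return parts.netloc, parts.path.lstrip("/")


# Made-up path for illustration only.
original = "s3://bucket/path/to/file with spaces.csv"

# Stand-in for requote_uri: percent-encode unsafe characters such as spaces,
# leaving the scheme separator and slashes alone.
requoted = quote(original, safe=":/")

bucket, key = split_s3_uri(original)
_, requoted_key = split_s3_uri(requoted)

print(requoted)             # the path after pre-quoting
print(key == requoted_key)  # the two keys no longer match

# Decoding the pre-quoted key recovers the original one.
print(unquote(requoted_key) == key)
```

If the object is stored under the key with literal spaces, a lookup with the %20 version names a different key, so a "failed to find the file" style error is what you would expect. Dropping the extra quoting step, or decoding before handing the path over, avoids the double handling.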