Closed zelima closed 7 years ago
We have a problem with a zipped JSON file: opening it fails with a complaint about encoding. The problem occurs when zipping from a remote URL, and I think it should be fixed directly in dpp. I debugged it up to a certain point. Here is the thing:
Two pipeline-spec.yaml files that run load_resource and dump.to_zip, with only one difference: the URL to datapackage.json (one is local and the other is remote).
remote:
  pipeline:
    -
      run: load_resource
      parameters:
        resource: 'vix-daily_json'
        # the url below is the only value that is changed in the other pipeline
        url: "https://pkgstore-testing.datahub.io/core/finance-vix:vix-daily_json/datapackage.json"
        stream: False
    -
      run: dump.to_zip
      parameters:
        out-file: remote.zip
        force-format: False
        handle-non-tabular: True
Debugging further, I took a look at how the files are actually zipped: https://github.com/frictionlessdata/datapackage-pipelines/blob/master/datapackage_pipelines/lib/dump/dumper_base.py#L192
tmp = tempfile.NamedTemporaryFile(delete=False)
stream = requests.get(url, stream=True).raw
shutil.copyfileobj(stream, tmp)
If you simply try to read tmp with open(tmp.name).read(), you get UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte. From what I have read, byte 0x8b in position 1 usually signals that the data stream is gzipped, but requests.get(url, stream=True).encoding returns ISO-8859-1 instead.
I strongly suspect the cause lies there, but I was not able to find an easy way to fix it myself.
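For anyone who wants to reproduce this without the network, here is a minimal sketch of one possible workaround. It is not the fix that went into dpp. It assumes the remote server sends the body gzip-compressed and that copying from the raw stream writes those compressed bytes to disk unchanged. Note that .encoding in requests reports the character set derived from the Content-Type header, not the Content-Encoding, so it says nothing about gzip. The helper names looks_gzipped and copy_to_tempfile are hypothetical and exist only for this illustration.

```python
import gzip
import io
import os
import shutil
import tempfile

# Every gzip stream starts with these two bytes; 0x8b at position 1
# is exactly the byte named in the UnicodeDecodeError above.
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def copy_to_tempfile(stream):
    """Copy a binary stream to a temp file, gunzipping it if needed.

    Hypothetical helper: `stream` stands in for `requests.get(...).raw`.
    Returns the path of a temp file holding the decoded bytes.
    """
    tmp = tempfile.NamedTemporaryFile(delete=False)
    with tmp:
        shutil.copyfileobj(stream, tmp)

    if not looks_gzipped(tmp.name):
        return tmp.name

    # The raw copy is still compressed: decompress into a second temp file.
    decoded = tempfile.NamedTemporaryFile(delete=False)
    with gzip.open(tmp.name, "rb") as src, decoded:
        shutil.copyfileobj(src, decoded)
    os.remove(tmp.name)
    return decoded.name


if __name__ == "__main__":
    # Simulate a gzip-encoded HTTP body with an in-memory stream.
    payload = b'{"name": "vix-daily"}'
    fake_raw = io.BytesIO(gzip.compress(payload))
    path = copy_to_tempfile(fake_raw)
    with open(path, "rb") as f:
        assert f.read() == payload
    os.remove(path)
```

With requests, the object passed as stream would be response.raw. An alternative that is often used, assuming the underlying urllib3 response object, is to set response.raw.decode_content = True before calling shutil.copyfileobj, so the body is decompressed while it is being read.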
Closing as FIXED. The problem with zipping remote non-tabular resources will be fixed with this issue in dpp https://github.com/frictionlessdata/datapackage-pipelines/issues/89
Feedback after reviewing export to zip.
Acceptance Criteria
Tasks
out-file to dataset-name.zip instead of datahub.zip [0.5]