frictionlessdata / frictionless-py

Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data
https://framework.frictionlessdata.io
MIT License

Support zipped data package stored in memory if possible #711

Open henhuy opened 3 years ago

henhuy commented 3 years ago

Overview

Hi there, I'm trying to implement an API where data packages can be uploaded. Internally, those data packages shall be validated and, if valid, uploaded to an external DB. Right now, I've tried to implement the upload as a zip file with an additional datapackage.json (with information on how to extract the data from the zip), but frictionless does not yet seem to support zipped data packages, right? A workaround I have in mind is to upload a zip file containing multiple CSVs and the datapackage.json, unzip it into a string buffer / file buffer, and validate/extract the data package via frictionless from that point. Is this possible? Or do I actually have to unzip everything into a real folder and start the frictionless process from there, as normal? Thanks for any hints in advance! (BTW, nice framework! Thanks!)
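For context, a minimal sketch of the "unzip everything into a real folder" fallback mentioned above, assuming the upload has already been read into bytes and that the archive contains a datapackage.json alongside its CSVs (both the helper name and the descriptor filename are assumptions taken from the description, not from the library):

```python
import io
import os
import tempfile
import zipfile

from frictionless import validate


def validate_uploaded_zip(upload_bytes: bytes) -> bool:
    # Extract the uploaded archive into a temporary directory and let
    # frictionless resolve the descriptor and CSVs from real files on disk.
    with tempfile.TemporaryDirectory() as tmpdir:
        with zipfile.ZipFile(io.BytesIO(upload_bytes)) as archive:
            archive.extractall(tmpdir)
        report = validate(os.path.join(tmpdir, "datapackage.json"))
    return report.valid
```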


Please preserve this line to notify @roll (lead of this repository)

roll commented 3 years ago

Hi @henhuy,

The implementation might have some edge-case problems, as it's not fully stable internally, but you should be able to read from and write to ZIP:

https://github.com/frictionlessdata/frictionless-py/blob/master/tests/test_package.py#L767-L952
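A minimal sketch of the zip round-trip those tests exercise, assuming a frictionless 4.x-era API where Package.to_zip() and Package(&lt;zip path&gt;) are available; the paths are placeholders, and the exact validate() call may differ between versions:

```python
from frictionless import Package, validate

# Write an existing package (descriptor plus referenced CSVs) into a zip archive
package = Package("datapackage.json")
package.to_zip("package.zip")

# Read the zipped package back from a file path and validate it
zipped = Package("package.zip")
report = validate(zipped)
print(report.valid)
```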

henhuy commented 3 years ago

Hey, thanks! It works if I save the file first! But I cannot use the uploaded file directly? (I am using FastAPI, which uses SpooledTemporaryFile internally.) Are there any plans to support StringIO or BytesIO, i.e. file buffers, as package sources? Thanks anyway, the workaround is already in place!
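For readers with the same setup, a minimal sketch of the "save the file first" workaround, assuming a FastAPI endpoint (the /packages route and response shape are illustrative): the SpooledTemporaryFile from the upload is copied to a real file on disk before frictionless reads it.

```python
import shutil
import tempfile

from fastapi import FastAPI, File, UploadFile
from frictionless import Package, validate

app = FastAPI()


@app.post("/packages")
async def upload_package(file: UploadFile = File(...)):
    # FastAPI hands us a SpooledTemporaryFile; frictionless needs a real path,
    # so copy the upload into a named temporary file first (the workaround above).
    with tempfile.NamedTemporaryFile(suffix=".zip", delete=False) as tmp:
        shutil.copyfileobj(file.file, tmp)
        tmp_path = tmp.name

    # Read and validate the zipped package from the on-disk path
    package = Package(tmp_path)
    report = validate(package)
    return {"valid": report.valid}
```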

roll commented 3 years ago

Hi @henhuy,

Thanks. I've changed it to a feature request.