blaze / datashape

Language defining a data description protocol
BSD 2-Clause "Simplified" License

Fix support for setting "bytes" as datatype #242

Open koenlek opened 3 years ago

koenlek commented 3 years ago

This fixes #230. With this fix, the 'bytes' type can be used in a datashape like this:

datashape.dshape('{foo: bytes}')

To verify this fix, you can run:

conda create -y -n datashape_test python=2.7 
conda activate datashape_test
conda install -y --file requirements.txt
python -c "import datashape; my_shape = datashape.dshape('{foo: bytes}'); print(my_shape)" # WORKS!
git checkout HEAD^
python -c "import datashape; my_shape = datashape.dshape('{foo: bytes}'); print(my_shape)" # fails...
git checkout fix_bytes_support
conda deactivate

# To try another Python version:
conda env remove -n datashape_test
conda create -y -n datashape_test python=3.7
# Then repeat the steps above, starting from the "conda activate ..." line
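
The same check can also be scripted, so the one-liner does not have to be retyped for each checkout. This is a minimal sketch: `check_bytes_support` is a hypothetical helper name, not part of datashape, and it only assumes that `datashape.dshape` raises some exception when 'bytes' is not supported.

```python
def check_bytes_support(spec="{foo: bytes}"):
    """Try to parse `spec` with whatever datashape is installed.

    Hypothetical helper for this check, not a datashape API.
    Returns a (status, detail) pair of strings.
    """
    try:
        import datashape
    except Exception as exc:
        # datashape is not installed, or fails to import on this interpreter
        return "unavailable", repr(exc)
    try:
        shape = datashape.dshape(spec)
    except Exception as exc:
        # expected on a checkout without this fix
        return "fails", repr(exc)
    return "works", str(shape)


if __name__ == "__main__":
    print(check_bytes_support())
```

Run it once on the fix_bytes_support branch and once after `git checkout HEAD^`; the status should change from "works" to "fails".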

In practice, I used a datashape with a "bytes" field to serialize and deserialize bytes data, and it worked perfectly.

Note that I suspect the CI failures are unrelated to this change. Locally, all tests pass on the Python versions I tested (2.7 and 3.7). For example, this PR has the same failures: https://travis-ci.org/github/blaze/datashape/builds/705165599

koenlek commented 3 years ago

@llllllllll, @kglowinski, @skrah, you seem to be among the last to have merged PRs into this repo. How about merging this PR, bumping the version to 0.5.5, tagging it, and releasing it to PyPI (the latest version on PyPI is 0.5.2, though the latest tag here is 0.5.4)?

If you're willing to support releasing it, then I might put in the time for another small fix: making DataShapeSyntaxError more meaningful by printing which field caused the error, as requested here: https://github.com/blaze/datashape/issues/181

I had a look at the code and it looks like a simple improvement.
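
A rough sketch of what an error that names the offending field could look like. It is a standalone toy and not datashape's parser or its actual DataShapeSyntaxError: `FieldSyntaxError`, `KNOWN_TYPES`, and `parse_record` are hypothetical names made up for illustration, and the toy only handles flat `{name: type, ...}` records.

```python
class FieldSyntaxError(SyntaxError):
    """Hypothetical error that names the record field that failed to parse."""

    def __init__(self, field, type_text, msg="invalid syntax"):
        self.field = field
        self.type_text = type_text
        SyntaxError.__init__(
            self, "%s in field %r: cannot parse type %r" % (msg, field, type_text)
        )


# Toy set of type names; the real grammar is much richer than this.
KNOWN_TYPES = {"int32", "float64", "string", "bytes"}


def parse_record(text):
    """Parse a flat '{name: type, ...}' record into a dict of name -> type."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise FieldSyntaxError("<record>", text, "expected '{...}'")
    fields = {}
    for item in body[1:-1].split(","):
        name, sep, type_text = item.partition(":")
        name, type_text = name.strip(), type_text.strip()
        if not sep or type_text not in KNOWN_TYPES:
            # Report the field name instead of only a generic message
            raise FieldSyntaxError(name, type_text)
        fields[name] = type_text
    return fields
```

With an error like this, a typo such as `parse_record("{foo: bytes, bar: byts}")` points straight at the `bar` field instead of leaving the reader to search a long record for the mistake.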