MonetDB / pymonetdb

The Python API for MonetDB
https://www.monetdb.org/
Mozilla Public License 2.0
28 stars 20 forks source link

Properly hex and de-hex binary data for both py2 and py3 #62

Closed gijzelaerr closed 4 years ago

coveralls commented 4 years ago

Coverage Status

Coverage increased (+0.1%) to 77.351% when pulling ece05dc694050d2c06d8283b601bea741e02fcad on njnes-master into 5d61b2704b1441fe6846cde1df2d6ba312b921bb on master.

coveralls commented 4 years ago

Coverage Status

Coverage increased (+0.1%) to 77.351% when pulling ece05dc694050d2c06d8283b601bea741e02fcad on njnes-master into 5d61b2704b1441fe6846cde1df2d6ba312b921bb on master.

njnes commented 4 years ago

This is what buildbot uses, ie creates a binary value inserts and recreates. Thats what I expect they tested on at least mysql, postgres and sqlite. Worked with my earlier change. But not with the new code, gives builtins.TypeError: string argument without an encoding and File "/usr/lib64/python3.7/site-packages/sqlalchemy/sql/sqltypes.py", line 944, in process value = bytes(value) builtins.TypeError: string argument without an encoding

import sqlalchemy as sa

from buildbot.util import sautils

def test_unicode(migrate_engine): """Test that the database can handle inserting and selecting Unicode"""

set up a subsidiary MetaData object to hold this temporary table

submeta = sa.MetaData()
submeta.bind = migrate_engine

test_unicode = sautils.Table(
    'test_unicode', submeta,
    sa.Column('u', sa.Unicode(length=100)),
    sa.Column('b', sa.LargeBinary),
)
test_unicode.create()

# insert a unicode value in there
u = "Frosty the \N{SNOWMAN}"
b = b'\xff\xff\x00'
ins = test_unicode.insert().values(u=u, b=b)
migrate_engine.execute(ins)

# see if the data is intact
row = migrate_engine.execute(sa.select([test_unicode])).fetchall()[0]
assert isinstance(row['u'], str)
assert row['u'] == u
assert isinstance(row['b'], bytes)
assert row['b'] == b

# drop the test table
test_unicode.drop()
gijzelaerr commented 4 years ago

I'm not really sure what to do with this copy paste. This (unicode and binary data) is also quite extensively tested in the current test suite. What is actually broken?

kutsurak commented 4 years ago

Hi @gijzelaerr. I'm taking a look at this and I will come back tomorrow with a more detailed reply, but see my first thoughts below.

I suspect that the problem is the following: In Python 2 (as in C) in order to handle a number of raw bytes you would use an array of chars. In Python 3 this has changed and we need to use the byte and bytearray types.

In principle, if I have a Python 3 byte buffer I should be able to insert it into MonetDB as a BLOB and get it back as a byte buffer again. I believe that this is where buildbot fails when we try to use it with MonetDB as a backend.

Again, I will take a more detailed look and come back with a more informed opinion (and hopefully some code) tomorrow.

Panos.

gijzelaerr commented 4 years ago

ok perfect @kutsurak that would be helpful. Making it properly work with both python2 and python3 is probably painful, so maybe it is just better to make a 1.3.0 release and implement this for a 2.0.0 for which I intend to drop python2 support. Additionally, the python DB API is not really specific about how to deal with bytes data, since it still dates from the py2 era (where strings and bytes were the same thing).

kutsurak commented 4 years ago

Hi @gijzelaerr. Sorry for me being silent on this. It has been a bit busy the past few days. I hope to spend some time on this issue this week.

One question: do you have a rough schedule for when the 1.3.0 and the 2.0.0 versions are going to be released?

gijzelaerr commented 4 years ago

Hi @kutsurak The release schedule is not time-based but issue-based. For now, it looks like this, but I'm open for requests and/or suggestions:

https://github.com/gijzelaerr/pymonetdb/milestones

I think it starts to get time for a new release, so we can also do a release now and push out a 1.4 for the remaining issues.

kutsurak commented 4 years ago

Hi @gijzelaerr

This is still incomplete but I've made some progress. Take a look at the code and the test results.

I have some ideas for how to fix the failure in Python 2.7 and pypy.

The failure in Python 3.4 would require a different implementation. If we need to keep it I will try to find a way to change the implementation, but this version has reached End Of Life status and I would propose that we drop it.

The failure in Python 3.7 I think is unrelated to my changes.

There are also two tests in the test_unicode.py file that cause hanging, so they are skipped in these results. I haven't had time to dive into this problem but I will next week.

Panos.

gijzelaerr commented 4 years ago

replaced with https://github.com/gijzelaerr/pymonetdb/pull/67