Open karanlyons opened 7 years ago
Thanks for the report! I don't think it's a personal issue, since it's Django that is creating the query. What Django version are you using?
Django 1.10, though I can likely test this in some other versions as well.
I’m fairly confident that just having _combine_cmd_params
encode the strings in all versions of Python will do the trick, but I haven’t had a chance to sit down and debug it yet.
Okay, the string causing issues is 'Club La Unión'
. Since this isn’t in Py3 the strings don’t get encoded. Removing the if six.PY3
checks is enough to fix the problem.
I think what we’re trying to do here is encode the string if it’s unicode, right? If so, just checking against six.text_type
is enough for both versions of Python, since in 2.7 that’s unicode, and str in 3.
So if that all sounds right, I think a two line pull request should fix our problems :smile:
@karanlyons a PR that fixes an issue for you is more then welcome! Even better if it comes with tests, but if not I'll add them. I'm not sure yet which six.PY3
check you mean, but I haven't debugged the issue yet.
Sure, give me a little bit to write the tests and I’ll submit a PR we can look at.
Alright, worryingly I’m having a hard time reproducing this in a test. The way this query is generated in my Django codebase is...questionable, but the error also happens with Django’s bulk_create
. Still, I can’t get it to throw an error in either CPython or PyPy once I start reducing it down.
I’m not yet convinced that this isn’t actually my fault.
Okay, so the issue turned out to be completely different than I thought:
Django has a DateTimeRangeField
in django.contrib.postgres.fields
but it doesn’t support setting custom bounds. So I had a subclass that allowed me to set bounds for a field and have it instantiate DateTimeTZRange
with those bounds. Basically, this:
from __future__ import unicode_literals
from psycopg2.extras import DateTimeTZRange
from django.contrib.postgres.fields import DateTimeRangeField
class DateTimeRangeFieldWithBounds(DateTimeRangeField):
range_type = DateTimeTZRange
def __init__(self, *args, **kwargs):
self.default_bounds = kwargs.pop('default_bounds', '[)')
super(DateTimeRangeFieldWithBounds, self).__init__(*args, **kwargs)
...
unicode_literals
means that that the bounds string is actually unicode, which meant (for reasons I haven’t sussed out yet) that the final tstzrange
string that made its way into _combine_cmd_params
was unicode
instead of bytes
/str
.
Making the bounds kwarg bytes
instead of unicode
fixes everything in PyPy. In CPython there’s no problem with either.
And...I’m not sure where this leaves us. There definitely is a difference between what happens on CPython with psycopg2 and PyPy with psycopg2cffi, but I’m not totally certain where in that large variable surface the problem actually lies.
Thanks for digging into it @karanlyons ! I think I finally understand the issue and what six.PY3
check you were talking about. Looks like in some cases https://github.com/chtd/psycopg2cffi/blob/master/psycopg2cffi/_impl/adapters.py#L288 can return unicode under python 2 as well. I need to check with psycopg2 - if conversion to unicode should always happen in the adapters, I'll fix it there, and if not - then we can just remove six.PY3
check as you suggested.
Yeah, I think just removing the six.PY3
checks in _combine_cmd_params
should be fine, but we may also want to make sure the return types for adapters have parity between psycopg2 and psycopg2cffi for good measure (especially in case someone’s using these “internal” functions as opposed to just working through the higher level API).
@karanlyons hm, I just checked how _combine_cmd_params
works with unicode both in CPython 2.7 and in PyPy, and I could not reproduce it yet. This is what I tried:
In [1]: from psycopg2cffi import connect
In [2]: c = connect('host=localhost')
In [3]: from psycopg2cffi._impl.cursor import _combine_cmd_params
In [4]: _combine_cmd_params('select %(foo)s from x', {'foo': u'ё'}, c)
Out[4]: "select '\xd1\x91' from x"
Can you check the arguments (and their types) that cause _combine_cmd_params(query, parameters, conn)
to fail?
import psycopg2cffi
from psycopg2cffi.extras import DateTimeTZRange
from psycopg2cffi._impl.cursor import _combine_cmd_params
from datetime import datetime
conn = psycopg2cffi.connect("dbname='psycopg2_test' user='postgres' host='localhost'")
ascii_string = 'Test'
unicode_string = u'T\xe9st'
ascii_range = DateTimeTZRange(datetime.now(), datetime.now(), '[]')
unicode_range = DateTimeTZRange(datetime.now(), datetime.now(), u'[]')
# Works
_combine_cmd_params('%s', [ascii_string], conn)
_combine_cmd_params('%s', [unicode_string], conn)
_combine_cmd_params('%s', [ascii_range], conn)
_combine_cmd_params('%s', [unicode_range], conn)
_combine_cmd_params('%s %s', [ascii_string, ascii_range], conn)
_combine_cmd_params('%s %s', [unicode_string, ascii_range], conn)
_combine_cmd_params('%s %s', [ascii_range, unicode_range], conn)
# Fails
_combine_cmd_params('%s %s', [unicode_string, unicode_range], conn)
Please tell me that last one fails for you too. This bug is either more complex than I thought or I’m insane, and I’m really hoping for the former.
Yes, that does fail for me, thanks a lot for building a reproducer! Technically this is a bug in how Range objects are adapted, but the actual issue is deeper - b
helper defined here https://github.com/chtd/psycopg2cffi/blob/master/psycopg2cffi/extensions.py#L69 is not what you would expect given it's name :) Ideally I should get rid of it - luckily it's used only in _range
and in tests, so it should not touch too much code.
This might be a personal issue, but it seems like there may still be some lingering encoding issues with this package. I see that there’s some work done in this function from #57, but it’s only under python 3.
I’m getting this error in 2.7 with
unicode_literals
.