djangonauts / django-hstore

PostgreSQL HStore support for Django.
http://django-hstore.readthedocs.io/
Other
517 stars 142 forks source link

Support for a serialized field (preserve data types) #80

Closed alukach closed 9 years ago

alukach commented 10 years ago

Add new hstore field type, SerializedDictionaryField. This field accepts optional serializer and deserializer arguments (defaulting to json.dumps and json.loads, respectively). The field returns a dict object (not HStoreDict), and runs all data through the serializer before insertions into the DB (along within runing all data through the deserializer when retrieving from the DB). This allows data placed into the field to maintain their type (provided that the data is compatible with the serializer and that the serializer only outputs str data). Additionally, by defining their own serializer functions, a user can easily put in custom logic to handle non-standard datatypes (for instance, if one was interested in serializing a custom class to a specific format).

The implementation is pretty straightforward:

# models.py
class SerializedExample(models.Model):
    name = models.CharField(max_length=32)
    data = hstore.SerializedDictionaryField()
    objects = hstore.HStoreManager()

>>> obj = SerializedExample.objects.create(
...   name="A Serializable Field!", 
...   data={
...     'str': 'A string',
...     'int': 1234,
...     'float': 3.141,
...     'bool': True,
...     'list': [0, 'one', [2.0, 2.1]],
...     'dict': {
...       'a': 1,
...       'b': 'two',
...       'c': ['three']
...     }
...   }
... )

>>> obj.data
{'int': 1234, 'float': 3.141, 'list': [0, 'one', [2.0, 2.1]], 'bool': True, 'str': 'A string', 'dict': {'a': 1, 'c': ['three'], 'b': 'two'}

This feature made other features not possible for the SerializedDictionaryField, such as subsetting against a list of values:

# Expression will be interpreted as subsetting by key/value mapping, not subsetting against a list of values.
Something.objects.filter(data__contains={'a': ['1', '2']})

Other non-supported features: maintaining type on Decimal or long.

I do believe that this optional functionality could be rolled into the standard HStoreField, but for the sake of clarity it was kept in it's own field. Also for the sake of clarity, unsupported features were left in the TestSerializedDictionaryField class as commented-out text (this should be removed before the pull request is approved).

This pull request is in response to this thread on the django-hstore group.

alukach commented 10 years ago

It should be said that this was not tested against a PostGIS enabled database locally. It looks like tests only pass for the exact version of Django (1.7) and Python (2.7) that I used. I relied on get_db_prep_lookup in efforts to access the prepared state (consequently making it only compatible with Django 1.7+), it may be possible to work around this.

coveralls commented 10 years ago

Coverage Status

Coverage decreased (-9.31%) when pulling 26b070046b0aa371f239f6073890e65ac875ebeb on alukach:serialized-field into 7ccfe86696f43b6b8d42b70126ba553302c9a1dc on djangonauts:master.

alukach commented 10 years ago

This single failing test appears to be due to the fact that Django 1.5 and 1.5 derive the SerializedDataBag.objects.filter(data__isnull={"v": False}) query SQL from class HStoreWhereNode whilst Django 1.7 derives the query from class HStoreIsNull.

Looking deeper...

EDIT: It appears that the issue was that the True or False values were being serialized to 'true' and 'false', both of which have a boolean value of True in Python. Preventing serialization on the isnull lookup solved the problem. I don't believe that case would apply to any other queries outside of isnull, but it's a possibility.

coveralls commented 10 years ago

Coverage Status

Coverage increased (+0.03%) when pulling 08a0367a52a0c769a1e5b8ad0be33dbc66a7e61f on alukach:serialized-field into 7ccfe86696f43b6b8d42b70126ba553302c9a1dc on djangonauts:master.

coveralls commented 10 years ago

Coverage Status

Coverage increased (+0.03%) when pulling 6634a1d7e4049684a6a7135514db4c1950a40ccf on alukach:serialized-field into 7ccfe86696f43b6b8d42b70126ba553302c9a1dc on djangonauts:master.

alukach commented 10 years ago

There are still some issues around filtering. It appears that the query arguments are deserialized before they are turned into SQL, requiring the user to enter serialized versions of the arguments (which is obviously not an ideal user experience). For example:

>>> obj = SerializedDataBag.objects.create(
...   name="A Serializable Field!", 
...   data={'int': 1234}
... )

# Incorrectly filter for a string of '1234'
>>> qs = SerializedDataBag.objects.filter(data__contains='1234')
>>> qs  # It works...
[<SerializedDataBag: 1>]
>>> print qs.query  # Actual query shows that it's looking for an int 1234
SELECT "django_hstore_tests_serializeddatabag"."id", "django_hstore_tests_serializeddatabag"."name", "django_hstore_tests_serializeddatabag"."data" FROM "django_hstore_tests_serializeddatabag" WHERE "django_hstore_tests_serializeddatabag"."data"::text LIKE %1234%

# When trying the correct method, searching for the actual int
>>> qs = SerializedDataBag.objects.filter(data__contains=1234)
# Bad things happen due to the fact that it tries to deserialize an int.
>>> qs
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/query.py", line 116, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/query.py", line 141, in __iter__
    self._fetch_all()
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/query.py", line 966, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/query.py", line 265, in iterator
    for row in compiler.results_iter():
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 700, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 775, in execute_sql
    sql, params = self.as_sql()
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 109, in as_sql
    where, w_params = self.compile(self.query.where)
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 80, in compile
    return node.as_sql(self, self.connection)
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/sql/where.py", line 106, in as_sql
    sql, params = qn.compile(child)
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 78, in compile
    return vendor_impl(self, self.connection)
  File "/Users/anthony/.virtualenvs/django-hstore/src/django-hstore/django_hstore/lookups.py", line 117, in as_postgresql
    raise ValueError('invalid value')
ValueError: invalid value
>>> print qs.query
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/sql/query.py", line 196, in __str__
    sql, params = self.sql_with_params()
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/sql/query.py", line 204, in sql_with_params
    return self.get_compiler(DEFAULT_DB_ALIAS).as_sql()
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 109, in as_sql
    where, w_params = self.compile(self.query.where)
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 80, in compile
    return node.as_sql(self, self.connection)
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/sql/where.py", line 106, in as_sql
    sql, params = qn.compile(child)
  File "/Users/anthony/.virtualenvs/hstore-d1.7/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 78, in compile
    return vendor_impl(self, self.connection)
  File "/Users/anthony/.virtualenvs/django-hstore/src/django-hstore/django_hstore/lookups.py", line 117, in as_postgresql
    raise ValueError('invalid value')
ValueError: invalid value
coveralls commented 10 years ago

Coverage Status

Coverage decreased (-0.21%) when pulling 4027eba21063099c72ed6de02ef3d5c79e819ca5 on alukach:serialized-field into 7ccfe86696f43b6b8d42b70126ba553302c9a1dc on djangonauts:master.

alukach commented 10 years ago

This last commit fixes contains lookups for list objects, such as:

# classic text lookup (look up for occurence of string in all the keys)
Something.objects.filter(data__contains='crazy')
Something.objects.filter(data__contains=[1])
Something.objects.filter(data__contains={'a': 2})

This breaks key querying, such as:

# subset by list of keys
Something.objects.filter(data__contains=['a', 'b'])

# subset by single key
Something.objects.filter(data__contains=['a'])

Additionally, bad things happen if you try to query for list objects with lengths greater than 1 and multiple ints, such as:

# Multiple ints
SerializedDataBag.objects.filter(data__contains=[3, 4])
...
ProgrammingError: operator does not exist: hstore ?& integer[]
LINE 1: ...remodel" WHERE "django_hstore_tests_serializeddatabag"."data"::hstore ?& ARRAY[3...
                                                             ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
# SELECT "django_hstore_tests_serializeddatabag"."id", "django_hstore_tests_serializeddatabag"."data" FROM "django_hstore_tests_serializeddatabag" WHERE "django_hstore_tests_serializeddatabag"."data"::hstore ?& ARRAY[3, 4] LIMIT 21

SerializedDataBag.objects.filter(data__contains=['1', 'b', 3])

While this could probably be fixed without much trouble, I question whether this is the right course to take. Maybe it's better to maintain key querying and omit support for containment lookups using list data types. Input would be appreciated.

EDIT: This comment is no longer relevant. Commit was reverted.

coveralls commented 10 years ago

Coverage Status

Coverage decreased (-0.32%) when pulling 3362b4861d6c48aa337ea71b2e05c59547033247 on alukach:serialized-field into 7ccfe86696f43b6b8d42b70126ba553302c9a1dc on djangonauts:master.

alukach commented 10 years ago

On second thought, I think it's best to NOT support containment lookups on list type data. Rather than doing a lookup by a list, a user could chain lookups such as:

# To get object with data={'a':[1,2,3]}
Something.objects.filter(data__contains=1).filter(data__contains=2).filter(data__contains=3)

This allows for a user to still use key querying. I believe that the rule of thumb is that the lookup cannot be an iterable, but lookups by str, int, and float are fine.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-0.32%) when pulling 7459cf12a205245a3b5df4d6be3b005bdcd69c48 on alukach:serialized-field into 7ccfe86696f43b6b8d42b70126ba553302c9a1dc on djangonauts:master.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-0.3%) to 92.058% when pulling 7459cf12a205245a3b5df4d6be3b005bdcd69c48 on alukach:serialized-field into 7ccfe86696f43b6b8d42b70126ba553302c9a1dc on djangonauts:master.

alukach commented 9 years ago

Alright, I'm going to stamp this Pull Request as "Finished" (:facepunch:) and ready-to-merge (although I suppose the commented-out tests should be removed). Let me know if you have any questions or comments.

nemesifier commented 9 years ago

I feel guilty not having been able to find the time to properly look into this but I will.

alukach commented 9 years ago

@nemesisdesign No worries, we all get busy sometimes. Let me know if you have any questions.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-0.33%) when pulling 906d94651a0549748fe6200306045a5c6c602dbb on alukach:serialized-field into 7ccfe86696f43b6b8d42b70126ba553302c9a1dc on djangonauts:master.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-0.33%) when pulling f76314b71a53dbe484ff5cb186d134cf604bf13f on alukach:serialized-field into 7ccfe86696f43b6b8d42b70126ba553302c9a1dc on djangonauts:master.

alukach commented 9 years ago

@nemesisdesign This has been in waiting for about 4 months now. Any thoughts to contribute? I'd obviously like to get this merged.

nemesifier commented 9 years ago

hi @alukach and sorry for this delay.

I would like to merge this feature because I think it's useful.

Let me ask you 2 questions:

I tested locally this branch and I found out something weird that looks like an issue.

Try adding these lines in tests.django_hstore_tests.tests at line 668:

databag.full_clean()
databag.save()
databag = SerializedDataBag.objects.get(name='number')
self.assertEqual(databag.data['num'], 1)

And you should get 1 failure.

The failure does not happen if you remove full_clean().

There are also 3 other small things that would need to be done, I'm sure you thought about it yourself but probably gave up because you got no feedback from us:

Federico

alukach commented 9 years ago

Alright, let me take a look at that bug you addressed and address the TODOs.

Regarding your questions:

are you using in production this feature anywhere?

No, I am not currently.

are you aware that once released there will be bug reports or questions regarding this feature?

Surely there will be bug reports or questions coming down the pipe. I'll do my best to address them.

nemesifier commented 9 years ago

@alukach ACK

alukach commented 9 years ago

@nemesisdesign Okay, I believe that this should take care of your thoughts.

The code has been rebased.

The admin widget was not properly implemented, causing the bug on databag.full_clean() that you mentioned. That has been resolved and added to tests.

Take a look at the docs, let me know what you think. I didn't actually build them, just edited the file.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-0.5%) to 91.84% when pulling 2117f4477f293de6145c81be7cc41dc0c3dafa64 on alukach:serialized-field into cb6830a9475ca124c19552d7625456b2ef76373a on djangonauts:master.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-0.5%) to 91.84% when pulling 2117f4477f293de6145c81be7cc41dc0c3dafa64 on alukach:serialized-field into cb6830a9475ca124c19552d7625456b2ef76373a on djangonauts:master.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-0.5%) to 91.84% when pulling 2117f4477f293de6145c81be7cc41dc0c3dafa64 on alukach:serialized-field into cb6830a9475ca124c19552d7625456b2ef76373a on djangonauts:master.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-0.5%) to 91.84% when pulling 2028f6c3dbc6c09bd839d206cc415bf9de5262e8 on alukach:serialized-field into cb6830a9475ca124c19552d7625456b2ef76373a on djangonauts:master.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-0.5%) to 91.84% when pulling 2028f6c3dbc6c09bd839d206cc415bf9de5262e8 on alukach:serialized-field into cb6830a9475ca124c19552d7625456b2ef76373a on djangonauts:master.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-0.5%) to 91.84% when pulling 2028f6c3dbc6c09bd839d206cc415bf9de5262e8 on alukach:serialized-field into cb6830a9475ca124c19552d7625456b2ef76373a on djangonauts:master.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-0.5%) to 91.84% when pulling 1ea7aa4fdb48e99e27c7959e6e47bc7784545b90 on alukach:serialized-field into cb6830a9475ca124c19552d7625456b2ef76373a on djangonauts:master.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-0.5%) to 91.84% when pulling 1ea7aa4fdb48e99e27c7959e6e47bc7784545b90 on alukach:serialized-field into cb6830a9475ca124c19552d7625456b2ef76373a on djangonauts:master.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-0.5%) to 91.84% when pulling 1ea7aa4fdb48e99e27c7959e6e47bc7784545b90 on alukach:serialized-field into cb6830a9475ca124c19552d7625456b2ef76373a on djangonauts:master.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-0.5%) to 91.84% when pulling 1ea7aa4fdb48e99e27c7959e6e47bc7784545b90 on alukach:serialized-field into cb6830a9475ca124c19552d7625456b2ef76373a on djangonauts:master.

nemesifier commented 9 years ago

thx @alukach will review and merge asap