etianen / django-watson

Full-text multi-table search application for Django. Easy to install and use, with good performance.
BSD 3-Clause "New" or "Revised" License
1.21k stars 129 forks source link

Cast value to text to make filter work with uuid primary key columns #154

Closed vesterbaek closed 8 years ago

vesterbaek commented 8 years ago

Check for uuid primary key and add typecast if required to support join between model and watson tables

vesterbaek commented 8 years ago

Yup, I actually did that in my first implementation. The title of this pull request is a hint about that :). I added explicit support for the uuid type for performance reasons. We could add a fallback to text cast after the uuid check

etianen commented 8 years ago

How does the uuid conversion improve performance over the string conversion? I'd have thought that both blow the use of indices out of the water.

On Wed, 2 Mar 2016 at 14:36 Jeppe Vesterbæk notifications@github.com wrote:

Yup, I actually did that in my first implementation. The title of this pull request is a hint about that :). I added explicit support for the uuid type for performance reasons. We could add a fallback to text cast after the uuid check

— Reply to this email directly or view it on GitHub https://github.com/etianen/django-watson/pull/154#issuecomment-191262093 .

vesterbaek commented 8 years ago

uuid is a native type with 128 bit length: http://www.postgresql.org/docs/9.1/static/datatype-uuid.html. Representing the uuid as text leads takes more storage (see eg. http://stackoverflow.com/questions/33836749/postgresql-using-uuid-vs-text-as-primary-key). In this case where one table has text and one has uuid, my knowledge of PostgreSQL is not good enough and I'm not sure if there is a performance gain by casting to uuid.

If we added a object_id_uuid column to watson_searchentry (like you have for int) we would get improvements in both required storage space and in performance.

etianen commented 8 years ago

I think it's the sort of thing that could only be shown using benchmarks. I suspect, however, that any performance difference would be negligible, since the database won't be able to use indices either way. What's certain, however, is that the code required for the "cast to string" approach is much smaller, which is a definite win!

On Wed, 2 Mar 2016 at 21:46 Jeppe Vesterbæk notifications@github.com wrote:

uuid is a native type with 128 bit length: http://www.postgresql.org/docs/9.1/static/datatype-uuid.html. Representing the uuid as text leads takes more storage (see eg. http://stackoverflow.com/questions/33836749/postgresql-using-uuid-vs-text-as-primary-key). In this case where one table has text and one has uuid, my knowledge of PostgreSQL is not good enough and I'm not sure if there is a performance gain by casting to uuid.

— Reply to this email directly or view it on GitHub https://github.com/etianen/django-watson/pull/154#issuecomment-191447141 .

vesterbaek commented 8 years ago

Closing this one on favor of https://github.com/etianen/django-watson/pull/169