etianen / django-watson

Full-text multi-table search application for Django. Easy to install and use, with good performance.
BSD 3-Clause "New" or "Revised" License

"Improving postgres query escaping" broke search for things like IP addresses #146

Closed: JirkaV closed this issue 8 years ago

JirkaV commented 8 years ago

The commit 30b59a00a98fd114244a541d27668b95c4c81226 (which sadly was probably an indirect result of issue #112 I reported) broke search for things like IP addresses. I've recently upgraded my system to watson 1.1.9 and my users reported that the search suddenly started returning no results for things like "1.2.3.4" or "fe80::1".

The good news is that these items are properly listed in the search_tsv column in the database. However, they can't be found, because "escape_query()" in backends.py converts "1.2.3.4" to "1234", which matches nothing.

Would it be possible to revert to the original behaviour? This escaping seems overly aggressive and (I hope) undesired.

Thank you!
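
The failure reported above can be reproduced with a minimal sketch. The function below is hypothetical and is not watson's actual "escape_query()"; it assumes a whitelist-style escape that drops every character that is neither a word character nor whitespace, which is enough to turn "1.2.3.4" into "1234".

```python
import re

# Hypothetical whitelist: keep only word characters and whitespace.
# This pattern is an assumption for illustration, not watson's real one.
RE_NOT_WHITELISTED = re.compile(r"[^\w\s]", re.UNICODE)


def whitelist_escape(text):
    """Drop every character that falls outside the whitelist."""
    return RE_NOT_WHITELISTED.sub("", text)


print(whitelist_escape("1.2.3.4"))  # 1234
print(whitelist_escape("fe80::1"))  # fe801
```

Because the indexed value still contains the punctuation, the stripped query term no longer lines up with it.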

etianen commented 8 years ago

Yes, it's undesired. The issue is that there's no postgres-supported way of escaping full text search queries, and no real documentation on what a significant piece of punctuation is, so there's a bit of guesswork involved.

I've re-enabled periods in query terms, since neither postgres nor MySQL treats them as a special character.

JirkaV commented 8 years ago

Sorry for the late reply. First of all, thanks for fixing the "dot" problem for me!

If you're interested, I had some time today and looked at the PostgreSQL sources, then did some testing afterwards. There are really only a few special chars (in normal ASCII) that cause trouble. These are:

! & : ( ) |

Would you be open to blacklisting those instead of whitelisting just a few "known safe" characters? I'd be happy to write test cases to catch edge cases, but would like your input first.

Thanks!

Jirka
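
The blacklist idea proposed above could look roughly like the sketch below. The function name and the choice to replace each special character with a space (rather than delete it) are assumptions made for illustration; the character list is taken from the comment above and has not been checked independently against PostgreSQL.

```python
import re

# The six ASCII characters reported above as troublesome: ! & : ( ) |
RE_SPECIAL = re.compile(r"[!&:()|]")
RE_SPACE = re.compile(r"\s+")


def blacklist_escape(text):
    """Replace only the known-special characters, leaving everything else alone."""
    # Substitute a space so that terms on either side stay separate words.
    text = RE_SPECIAL.sub(" ", text)
    # Collapse the runs of whitespace that the substitution may leave behind.
    return RE_SPACE.sub(" ", text).strip()


print(blacklist_escape("1.2.3.4"))             # 1.2.3.4
print(blacklist_escape("foo & (bar | !baz)"))  # foo bar baz
print(blacklist_escape("fe80::1"))             # fe80 1
```

Note that ":" is on the blacklist, so under this sketch an IPv6 address such as "fe80::1" is still split into separate terms, while dotted IPv4 addresses pass through unchanged.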

etianen commented 8 years ago

That's useful info. I'd be happy to take a pull request that changed the escaping behaviour, particularly so if it had some good tests! :D
