abartov / bybeconv

Project Ben-Yehuda's content management system.
https://benyehuda.org/

API fulltext search works differently from website search #188

Closed damisul closed 1 year ago

damisul commented 1 year ago

Causes this issue: https://github.com/orgs/projectbenyehuda/projects/1

When searching on the PBY site for וירג'יניה וולף, we get 7 results. This seems correct to me. https://benyehuda.org/search/results/?q=%D7%95%D7%99%D7%A8%D7%92%27%D7%99%D7%A0%D7%99%D7%94%20%D7%95%D7%95%D7%9C%D7%A3

However, querying SARA for the same string yields 100 results. Why is the API returning 100 results and not 7?

damisul commented 1 year ago

There are several differences in the fulltext search implementation:

query_string queries support a special syntax for defining complex query options such as operators. However, they are not recommended for user-provided query strings: if the provided string contains a syntax error, the query fails with an error.

The Elasticsearch docs recommend 'simple_query_string' queries for user-provided data. See https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html. If the user provides an invalid query, it silently ignores the unparseable parts of the query instead of raising an error.

To summarize, I propose using simple_query_string queries for fulltext search, applied to the 'fulltext', 'author_string' and 'title' fields of a work. We also need to use the 'AND' operator by default.
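
For illustration, the request body for such a query could look roughly like the sketch below. It is written in Python only to show the shape of the JSON; the function name build_fulltext_query is hypothetical, and how the project actually builds and sends its queries is not shown in this thread. The field names and the default operator are the ones proposed above; 'query', 'fields' and 'default_operator' are standard simple_query_string parameters.

```python
import json


def build_fulltext_query(user_query: str) -> dict:
    """Return an Elasticsearch request body for a simple_query_string search.

    Hypothetical helper: the name and signature are assumptions made for
    this example, not part of the project's code.
    """
    return {
        "query": {
            "simple_query_string": {
                # Raw user input; unparseable parts are ignored by
                # Elasticsearch rather than causing an error.
                "query": user_query,
                # Fields of a work proposed in this comment.
                "fields": ["fulltext", "author_string", "title"],
                # Require every term to match instead of any term.
                "default_operator": "AND",
            }
        }
    }


if __name__ == "__main__":
    body = build_fulltext_query("וירג'יניה וולף")
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

With "default_operator" set to "AND", a two-word query only matches documents that contain both words, which should bring the number of results closer to what the website returns.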

NOTE: with this change, API search will still not work in exactly the same way as website search (e.g. it will not search among persons), but it will behave much more similarly.