collective / collective.elasticsearch

Plone ElasticSearch Integration
https://pypi.python.org/pypi/collective.elasticsearch/
GNU General Public License v2.0

How to make the search behave more like the Plone default? #62

Open idgserpro opened 5 years ago

idgserpro commented 5 years ago

I've created a really simple instance:

[buildout]
parts = instance
extends = https://dist.plone.org/release/4-latest/versions.cfg
index = https://pypi.org/simple/
versions = versions

[instance]
http-address = 27926
recipe = plone.recipe.zope2instance
user = admin:admin
eggs =
    Plone
    Pillow
    collective.elasticsearch

And 3 published pages: im another, another and im test.

This is what's indexed in my elastic local installation: http://localhost:9200/plone-portal_catalog/portal_catalog/_search?stored_fields=path.path&size=50:

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":9,"max_score":1.0,"hits":[{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"32be5a425f6d4af881aba55961c7ba96","_score":1.0,"fields":{"path.path":["/Plone/news"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"89283a68cd9d47c59138952d8ae049fe","_score":1.0,"fields":{"path.path":["/Plone/events"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"24813bf7cf644f6d8a178d18ab3bc18f","_score":1.0,"fields":{"path.path":["/Plone/front-page"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"b434dbb51a67447f85e4a97484423afe","_score":1.0,"fields":{"path.path":["/Plone/another"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"2e3f54ac1f7d478abf1530f19740f6c4","_score":1.0,"fields":{"path.path":["/Plone/events/aggregator"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"6f39d3a705534d5981ec91410ac2a950","_score":1.0,"fields":{"path.path":["/Plone/Members"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"17520f9c96564f4e8d8e5052962947c4","_score":1.0,"fields":{"path.path":["/Plone/im-another"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"2156968aef0448ef97c570a17cd5364e","_score":1.0,"fields":{"path.path":["/Plone/news/aggregator"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"0e85a330f14b4b9f895b8fdcac5bd447","_score":1.0,"fields":{"path.path":["/Plone/im-test"]}}]}}

When searching for im another with Plone default search I get:

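[image: Seleção_183] https://user-images.githubusercontent.com/8203476/63055148-a9e48b80-bebb-11e9-9a42-2a24c373b2e6.png

After enabling Elasticsearch in the control panel, then converting and rebuilding the catalog:

[image: Seleção_184] https://user-images.githubusercontent.com/8203476/63055153-ac46e580-bebb-11e9-8f65-26b38df169f3.png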

It looks like Plone runs an im AND another query while Elasticsearch runs im OR another, and therefore returns all of the results above.

The question is: how can I make it follow the same logic as Plone? Although you can change the query yourself (https://collectiveelasticsearch.readthedocs.io/en/latest/config.html#changing-the-query-made-to-elasticsearch), I would really prefer not to touch it, since a lot of logic lives in https://github.com/collective/collective.elasticsearch/blob/d8df2b90ff70a9abcb68e7c8564f1d4a78f0086a/collective/elasticsearch/query.py#L41. I think a default configuration in the plugin itself that makes the query use the AND operator is a better approach.
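
For reference, Elasticsearch's match query takes an operator parameter that defaults to or; setting it to and requires every analyzed term to be present. Below is a minimal sketch of such a query body. It assumes the text lives in a field named SearchableText (the Plone index name; the field mapping actually used by collective.elasticsearch is not shown in this thread), and the helper name is hypothetical, not part of the package.

```python
def build_match_query(text, field="SearchableText", operator="and"):
    """Build an Elasticsearch match query body as a plain dict.

    `field` and this helper's name are assumptions for illustration;
    `operator` is the match-query parameter, which defaults to "or"
    on the Elasticsearch side.
    """
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "operator": operator,
                }
            }
        }
    }


# Query body that only matches documents containing both terms.
body = build_match_query("im another")
```

How such a body would be wired into the package's own query building is not covered here.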

djay commented 5 years ago

I'm not sure why the results are different, but since they are both text indexes there aren't many tweaks you can make on the Plone end to adjust the result set. I'd argue the Elasticsearch result set is better anyway. For most text searches you would expect OR, with the results sorted by relevance.

idgserpro commented 4 years ago

I think AND is better. Returning a lot of documents only confuses the user more, and the pagination gets very long. If OR is so much better, why hasn't Plone changed to it? If I search for "yellow house", I am not interested in "green house", and I probably will never reach the document that mentions a "green house". I would like to hear more people's opinions on this subject.

@mauritsvanrees @jensens @ale-rt @vangheem @zopyx ?

mauritsvanrees commented 3 years ago

I prefer AND as well. When I use a search engine and query for "something good" I get annoyed when the engine returns results that only have "something". That happens regularly, although it could certainly be simply because the result page originally had "good" on it, but not anymore, and the search engine has not visited it since.

I guess if you are fluent in elastic search, you could tweak things: when AND gives less than 10 results, show those at the top, but add some results that only match for OR. But I am not using elastic search.

jensens commented 3 years ago

I do not use collective.elasticsearch - anyway, AND and OR would be nice to have.

idgserpro commented 3 years ago

@mauritsvanrees your suggestion is similar to @djay's opinion: use OR as the default, but list first the results that contain all terms. Today, however, this ordering doesn't occur. Even so, I don't think this option is very good. When no document contains all the terms, the user will still expect the results to contain all of them, when in fact they don't. I prefer to simply do as Plone does, that is, return no results. The user then has the option to reduce the number of terms, if they wish.

mauritsvanrees commented 3 years ago

Indeed I prefer AND, and this should be the default.

But if it makes sense in a use case, someone could implement my suggestion. Or maybe when there are no results with AND, you could show alternative search results with OR, as long as you add a message to that effect. But that may be something to do in a frontend: do the AND search, get no results back, try the OR search and show those results with a special message. Sounds like custom work for one site, not something to do by default in this package.
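
The frontend fallback described above could be sketched roughly as follows. This is only an illustration: run_search is a hypothetical callable standing in for whatever actually queries the backend, and the in-memory fake_search exists just so the example runs on its own.

```python
def search_with_fallback(run_search, text):
    """Try an AND search first; fall back to OR only when AND finds nothing.

    `run_search(text, operator)` is a hypothetical callable returning a
    list of hits. Returns (hits, message); message is None unless the
    OR fallback produced the results.
    """
    hits = run_search(text, "and")
    if hits:
        return hits, None
    hits = run_search(text, "or")
    if hits:
        return hits, "No item contains all terms; showing partial matches."
    return [], None


# Tiny in-memory stand-in for a real backend, for demonstration only.
DOCS = ["im another", "another", "im test"]


def fake_search(text, operator):
    terms = text.lower().split()
    combine = all if operator == "and" else any
    return [d for d in DOCS if combine(t in d.split() for t in terms)]


hits, message = search_with_fallback(fake_search, "im another")
```

Returning the message alongside the hits keeps the decision about how to present partial matches in the frontend, as suggested.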

idgserpro commented 3 years ago

> I do not use collective.elasticsearch - anyway, AND and OR would be nice to have.

@jensens, this is an interesting point too. I had to customize collective.elasticsearch because a client of ours wanted to use the binary operator OR (in fact, lowercase or), the same way he used or on a Plone 4.3 site without Elasticsearch (Plone 5.2 has a problem with the or operator, see: https://community.plone.org/t/adding-and-or-to-search-terms-results-in-no-hits/6878/9). I then had to do two things:

To do this I had to change the elastic query to use query string.
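
The actual customization is not shown in this thread. As a rough illustration of the general idea only, the sketch below builds an Elasticsearch query_string query and upper-cases standalone and/or/not words first, since query_string recognizes the boolean operators in upper case. The SearchableText field name and the helper name are assumptions.

```python
import re

# Matches the standalone words and/or/not in any letter case.
_OPERATOR_RE = re.compile(r"\b(and|or|not)\b", re.IGNORECASE)


def build_query_string(text, field="SearchableText"):
    """Build a query_string query body, normalizing lowercase operators.

    Hypothetical helper: `field` is an assumed field name. Note that this
    naive normalization also upper-cases these words inside quoted phrases.
    """
    normalized = _OPERATOR_RE.sub(lambda m: m.group(1).upper(), text)
    return {
        "query": {
            "query_string": {
                "query": normalized,
                "default_field": field,
                # Terms without an explicit operator are combined with AND.
                "default_operator": "AND",
            }
        }
    }


body = build_query_string("yellow or green house")
```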

djay commented 1 year ago

Elasticsearch has a simple query string syntax which isn't currently used in this integration. It's much closer to what ZCTextIndex supports, with phrase matching and operators. I'd propose using this instead: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html#simple-query-string-syntax

We did this for a previous site and it worked well
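
A minimal sketch of what such a query body might look like, assuming a SearchableText field and a hypothetical helper name (neither is taken from the integration). In the simple query string syntax, + means AND, | means OR, - negates a term, and double quotes mark a phrase; default_operator controls how terms without an explicit operator are combined.

```python
def build_simple_query(text, fields=("SearchableText",), default_operator="AND"):
    """Build a simple_query_string query body as a plain dict.

    Hypothetical helper; `fields` is an assumption. With
    default_operator="AND", a search for `yellow house` requires both
    terms, while `yellow | green` still allows an explicit OR.
    """
    if default_operator.upper() not in ("AND", "OR"):
        raise ValueError("default_operator must be 'AND' or 'OR'")
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": list(fields),
                "default_operator": default_operator.upper(),
            }
        }
    }


body = build_simple_query('"yellow house" | green')
```

Unlike query_string, simple_query_string does not raise errors on invalid syntax, which makes it a safer fit for text typed directly into a site search box.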