Open idgserpro opened 5 years ago
I'm not sure why the results are different, but since they are both text indexes there aren't many tweaks you can make on the Plone end to adjust the result set. I'd argue the Elasticsearch result set is better anyway. For most text searches you would expect OR matching, with results sorted by relevance.
On Thu., 15 Aug. 2019, 04:04 IDG SERPRO, notifications@github.com wrote:
I've created a really simple instance:
[buildout]
parts = instance
extends = https://dist.plone.org/release/4-latest/versions.cfg
index = https://pypi.org/simple/
versions = versions
[instance]
http-address = 27926
recipe = plone.recipe.zope2instance
user = admin:admin
eggs =
    Plone
    Pillow
    collective.elasticsearch
And 3 published pages: "im another", "another" and "im test".
This is what's indexed in my elastic local installation: http://localhost:9200/plone-portal_catalog/portal_catalog/_search?stored_fields=path.path&size=50 :
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":9,"max_score":1.0,"hits":[{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"32be5a425f6d4af881aba55961c7ba96","_score":1.0,"fields":{"path.path":["/Plone/news"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"89283a68cd9d47c59138952d8ae049fe","_score":1.0,"fields":{"path.path":["/Plone/events"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"24813bf7cf644f6d8a178d18ab3bc18f","_score":1.0,"fields":{"path.path":["/Plone/front-page"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"b434dbb51a67447f85e4a97484423afe","_score":1.0,"fields":{"path.path":["/Plone/another"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"2e3f54ac1f7d478abf1530f19740f6c4","_score":1.0,"fields":{"path.path":["/Plone/events/aggregator"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"6f39d3a705534d5981ec91410ac2a950","_score":1.0,"fields":{"path.path":["/Plone/Members"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"17520f9c96564f4e8d8e5052962947c4","_score":1.0,"fields":{"path.path":["/Plone/im-another"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"2156968aef0448ef97c570a17cd5364e","_score":1.0,"fields":{"path.path":["/Plone/news/aggregator"]}},{"_index":"plone-portal_catalog_1","_type":"portal_catalog","_id":"0e85a330f14b4b9f895b8fdcac5bd447","_score":1.0,"fields":{"path.path":["/Plone/im-test"]}}]}}
When searching for im another with Plone default search I get:
[image: Seleção_183] https://user-images.githubusercontent.com/8203476/63055148-a9e48b80-bebb-11e9-9a42-2a24c373b2e6.png
After enabling Elasticsearch in the control panel, then converting and rebuilding the catalog:
[image: Seleção_184] https://user-images.githubusercontent.com/8203476/63055153-ac46e580-bebb-11e9-8f65-26b38df169f3.png
It looks like Plone runs an "im AND another" query, whereas Elasticsearch runs "im OR another", which returns all the results above.
The question is: how can I make it follow the same logic as Plone? Although you can change the query https://collectiveelasticsearch.readthedocs.io/en/latest/config.html#changing-the-query-made-to-elasticsearch yourself, I would really prefer not to mess with it, since a lot of logic lives in https://github.com/collective/collective.elasticsearch/blob/d8df2b90ff70a9abcb68e7c8564f1d4a78f0086a/collective/elasticsearch/query.py#L41. I think a default configuration in the plugin itself that makes the query use the AND operator is a better approach.
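As a rough illustration of the AND behaviour being asked for: Elasticsearch's match query accepts an operator parameter, so a full-text clause that requires every term could be sketched as below. The helper name build_text_clause is hypothetical, and using SearchableText as the field name is an assumption; this is not the code in query.py.

```python
import json


def build_text_clause(text, field="SearchableText", operator="and"):
    """Build an Elasticsearch `match` clause.

    `field` and this helper are illustrative assumptions. With
    operator="and" every analyzed term must be present in the document;
    with operator="or" any single term is enough.
    """
    return {"match": {field: {"query": text, "operator": operator}}}


if __name__ == "__main__":
    # Print the request body that would be sent for the search "im another".
    print(json.dumps({"query": build_text_clause("im another")}, indent=2))
```

With operator set to "or" the same helper reproduces the behaviour described above, where all three pages match.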
I think AND is better. Returning a lot of documents only confuses the user more, and the pagination gets very long. If OR is so much better, why hasn't Plone changed to OR? If I search for "yellow house", I am not interested in "green house", and I will probably never get as far as the document that has a "green house". I would like to know the opinion of more people on this subject.
@mauritsvanrees @jensens @ale-rt @vangheem @zopyx ?
I prefer AND as well. When I use a search engine and query for "something good" I get annoyed when the engine returns results that only have "something". That happens regularly, although it could certainly be simply because the result page originally had "good" on it, but not anymore, and the search engine has not visited it since.
I guess if you are fluent in Elasticsearch, you could tweak things: when AND gives fewer than 10 results, show those at the top, then add some results that match only with OR. But I am not using Elasticsearch.
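One possible way to get "all terms first, partial matches after" in a single request is a bool query: the must clause matches with OR, and a should clause with AND adds a score boost to documents that contain every term. This is only a sketch; the function name, the SearchableText field and the boost value are assumptions, not something this package does.

```python
def build_ranked_query(text, field="SearchableText", boost=5.0):
    """Build a query body that returns OR matches but ranks AND matches first.

    `field` and `boost` are illustrative assumptions.
    """
    return {
        "query": {
            "bool": {
                # Any document containing at least one term is a hit.
                "must": [
                    {"match": {field: {"query": text, "operator": "or"}}}
                ],
                # Documents containing all terms get an extra score boost.
                "should": [
                    {
                        "match": {
                            field: {
                                "query": text,
                                "operator": "and",
                                "boost": boost,
                            }
                        }
                    }
                ],
            }
        }
    }
```

The "fewer than 10 results" threshold from the suggestion is not covered here; that would need a second step in the calling code.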
I do not use collective.elasticsearch - anyway, AND and OR would be nice to have.
@mauritsvanrees your suggestion is similar to @djay's opinion: use OR as the default, but list the results that contain all terms first. Today, however, this ordering doesn't occur. Even so, I don't think this option is very good. When no document contains all the terms, the user will still expect the results to contain all of them, when in fact they don't. I prefer to simply do as Plone does, that is, return no results. Then the user has the option to reduce the number of terms, if they wish.
Indeed I prefer AND, and this should be the default.
But if it makes sense in a use case, someone could implement my suggestion. Or maybe when there are no results with AND, you could show alternative search results with OR, as long as you add a message to that effect. But that may be something to do in a frontend: do the AND search, get no results back, try the OR search and show those results with a special message. Sounds like custom work for one site, not something to do by default in this package.
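The frontend fallback described above could look roughly like this. It is a minimal sketch: search stands for any hypothetical callable that takes the text and an operator and returns a list of results; it is not an API of Plone or collective.elasticsearch.

```python
def search_with_fallback(search, text):
    """Try an AND search first; fall back to OR only if AND finds nothing.

    `search` is a hypothetical callable: search(text, operator) -> list.
    Returns (results, message); message is None unless the fallback
    produced the results.
    """
    results = search(text, "and")
    if results:
        return results, None

    # Nothing contains all terms, so retry with OR and tell the user.
    results = search(text, "or")
    message = None
    if results:
        message = "No documents contain all terms; showing partial matches."
    return results, message
```

Keeping the message separate from the results lets the template decide how to tell the user that the list is a fallback.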
I do not use collective.elasticsearch - anyway, AND and OR would be nice to have.
@jensens, this is an interesting question too. I had to customize collective.elasticsearch because a client of ours wanted to use the binary operator OR (in fact, lowercase or), in the same way he used or on a Plone 4.3 site without Elasticsearch (Plone 5.2 has a problem with the or operator, see: https://community.plone.org/t/adding-and-or-to-search-terms-results-in-no-hits/6878/9). I then had to do two things:
To do this I had to change the elastic query to use query string.
Elasticsearch has a simple query string syntax which isn't currently used in this integration. It's much closer to what ZCTextIndex supports, with phrase matching and operators. I'd propose using this instead - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html#simple-query-string-syntax
We did this for a previous site and it worked well
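For reference, a request body using simple_query_string with AND as the default operator might be built like this. The helper name and the SearchableText field are assumptions for illustration, not the implementation used on that site.

```python
def build_simple_query_string(text, fields=("SearchableText",),
                              default_operator="and"):
    """Build a `simple_query_string` request body.

    `fields` and this helper are illustrative assumptions. With
    default_operator="and", terms typed without an explicit operator
    must all match.
    """
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": list(fields),
                "default_operator": default_operator,
            }
        }
    }


if __name__ == "__main__":
    # Plain terms: both must match because the default operator is AND.
    print(build_simple_query_string("yellow house"))
    # `|` asks for OR explicitly, quotes ask for a phrase, `-` negates a term.
    print(build_simple_query_string('"yellow house" | cottage -green'))
```

Users who want OR can then type it explicitly with |, while a plain multi-word search keeps the AND behaviour of the default Plone search.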