Closed Dehax closed 1 year ago
Hi there, @Dehax
Thanks for the feedback, and the time you took to craft this pull request.
I am glad to inform you that we have this already solved in https://github.com/OnTheGoSystems/wpml-elasticpress/pull/13 (and an extra, small refinement is on its way).
We will be having a new release for this plugin relatively soon, including this fix among other changes from recent pull requests, and some other surprises.
If you do not mind, I am closing this as the defect fixed here was already fixed in the link above.
Thanks again!
Hi @decodekult ,
I'm using the newest version from master
branch (commit b559d0b) which already includes changes from #13 and this issue is not resolved in it yet. This version still always returns 0 search results when filtering by "zh-hans" language although Elasticsearch index contains many documents with "post_lang": "zh-hans"
. I hope the small refinement you are mentioning will fix this "dash issue". Will be waiting for the fix. Thanks!
Hi @Dehax
I have checked this deeply, and you are right! The change mentioned above just included a new analyzer for the language field, but it still uses a standard tokenizer, which splits content by dashes. In addition, another related commit about including support for posts set to display as translated when a translation is missing also introduced some unwanted outcome.
I will be submitting a pull request addressing both things later today or early tomorrow.
In any case, we are playing with alternatives for https://github.com/OnTheGoSystems/wpml-elasticpress/issues/6 that will probably result in not having, or not needing, a language field.
I will update here once I am done. Thanks a lot!
This is gettign fixed at https://github.com/OnTheGoSystems/wpml-elasticpress/pull/22
As described in the related issue on https://github.com/OnTheGoSystems/wpml-elasticpress/issues/21 we can not turn this field into a keyword. But we can improve its analyzer.
Keeping this one open until we merge and confirm the other solution.
FYI @Dehax
Hi @decodekult ,
Thanks for the update!
It seems that char_group
tokenizer was added in Elasticsearch 6.4.0 and thus it's not available in 5.2.2 (the version providen by ElasticPress).
I've confirmed this at "ElasticPress Sync" page. After pressing "Delete all Data and Start a Fresh Sync" button the sync process fails with the error:
Mapping failed: Unknown tokenizer type [char_group] for [post_lang_tokenizer]
Oh well, I just merged the change. I think it opens the issue again.
Sorry for the back-and-forth.
This one should fix the problem for good: https://github.com/OnTheGoSystems/wpml-elasticpress/pull/25
Thanks for your patience.
Yes, #25 works with Elasticsearch 5.2.2 and fixes #21. Thanks!
Great indeed!
Thanks a lot.
Added
keyword
forpost_lang
in order to fix filtering by "zh-hans".Language codes with dashes give multiple tokens when analyzed:
And this filter always gives 0 results:
So in order to fix the post filtering issue for such language codes we need to either use additional "keyword" field with "keyword" data type or use "keyword" analyzer for
post_lang
field or just mappost_lang
field to "keyword" data type.