Closed CuriousLearner closed 4 years ago
Alright, further research shows that the watson_searchentry table has not filled in the search_tsv column for Chinese characters, although it did for English.
Note that I already ran buildwatson with zh-cn to ensure Chinese characters are parsed.
The big problem here is that the watson postgres backend creates a database table and an index using a single language catalogue. Adding multiple search backends with different language settings means that they'll both conflict with each other, and fight over which is the "true" language for the index.
To make this work, each search backend would need its own database column added to the watson table, containing the tsvector parsed according to the desired language. This is a major refactoring effort.
@etianen Yeah, I guessed that would need a re-factor as well, once I started working on this.
I can try to see what I can do here. But do you have any idea why the search_tsv doesn't populate Chinese characters if I run buildwatson for zh-cn?
buildwatson will use the search settings for the search backend that was the default when ./manage.py migrate was run.
So if you ran ./manage.py migrate when the default search backend was configured for English, then all content will be indexed in English.
You need to iterate over all languages you have and create an index for them (i.e. with one column for each lang or even a whole table).
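A minimal sketch of the "one column per language" idea, not something django-watson does today. It only generates the SQL a refactored backend might run. The column naming scheme, the `LANGUAGE_CONFIGS` mapping, and the `public.chinese_zh` configuration name are all assumptions for illustration (PostgreSQL ships no Chinese text search configuration, so one would have to come from an extension), as is the assumption that the table has `title`, `description` and `content` text columns.

```python
import re

# Hypothetical mapping from language code to a PostgreSQL text search
# configuration. "public.chinese_zh" is an assumed name; a Chinese
# configuration must be provided by an extension and created beforehand.
LANGUAGE_CONFIGS = {
    "en": "pg_catalog.english",
    "zh-cn": "public.chinese_zh",
}


def column_name(language_code):
    """Derive a per-language tsvector column name (hypothetical scheme)."""
    suffix = re.sub(r"[^a-z0-9]+", "_", language_code.lower()).strip("_")
    return "search_tsv_" + suffix


def build_statements(language_configs, table="watson_searchentry"):
    """Return SQL that adds, indexes and fills one tsvector column per language.

    The values are interpolated directly, so they must be trusted constants,
    never user input.
    """
    statements = []
    for code, config in sorted(language_configs.items()):
        column = column_name(code)
        statements.append(
            f"ALTER TABLE {table} ADD COLUMN {column} tsvector"
        )
        statements.append(
            f"CREATE INDEX {table}_{column} ON {table} USING gin({column})"
        )
        # Assumes title/description/content text columns on the table.
        statements.append(
            f"UPDATE {table} SET {column} = "
            f"setweight(to_tsvector('{config}', coalesce(title, '')), 'A') || "
            f"setweight(to_tsvector('{config}', coalesce(description, '')), 'C') || "
            f"setweight(to_tsvector('{config}', coalesce(content, '')), 'D')"
        )
    return statements


if __name__ == "__main__":
    for statement in build_statements(LANGUAGE_CONFIGS):
        print(statement + ";")
```

A search would then have to match against the column for the active language, which is the part that needs the refactor discussed above.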
Hey @etianen
Wouldn't it be okay to keep the issue open, so that anyone who wants to do the refactor can pick it up, and those searching for a similar issue can find it in their search results?
It's still going to turn up in search results. But if nobody is working on it or paying attention to it, "closed" sounds about right to me.
An attempt to refactor the library to make multilingual searches work for #248
I'm not sure, but it seems like the buildwatson command isn't indexing the SearchEntry properly.
I'm using a different configuration for Chinese. I see that the buildwatson command activates the particular language before doing anything, but how does it know which parser to use before indexing the data?
@etianen Can you please help?