bluesky-social / indigo

Go source code for Bluesky's atproto services.
https://atproto.com
Apache License 2.0

Limited search capabilities in CJK languages #777

Open quiple opened 2 weeks ago

quiple commented 2 weeks ago

The current search feature only matches words separated by spaces or symbols. Japanese and Chinese don't use spaces between words, and Korean doesn't put a space before postpositions, so it is very difficult to get the results I want.
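A minimal Go sketch of the problem, assuming a tokenizer that splits only on whitespace and matches whole tokens. This illustrates the behavior described above and is not the project's actual indexing code; `whitespaceTokens` and `matches` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// whitespaceTokens is a hypothetical stand-in for a tokenizer that
// splits text only on Unicode whitespace.
func whitespaceTokens(text string) []string {
	return strings.Fields(text)
}

// matches reports whether query equals one of the whitespace-separated
// tokens of text (exact, whole-token matching).
func matches(query, text string) bool {
	for _, tok := range whitespaceTokens(text) {
		if tok == query {
			return true
		}
	}
	return false
}

func main() {
	// English: each word is its own token, so a word query matches.
	fmt.Println(matches("works", "search works in English")) // true

	// Japanese: no spaces, so the whole sentence is a single token.
	fmt.Println(len(whitespaceTokens("東京都に住んでいます"))) // 1

	// Korean: the postposition 에 stays attached to the noun 학교,
	// so searching for the bare noun finds nothing.
	fmt.Println(matches("학교", "학교에 갑니다")) // false
}
```

With whole-token matching, the Japanese sentence can only be found by searching for the entire sentence, and the Korean noun can only be found together with its postposition.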

bnewbold commented 1 week ago

Hi! we do some special processing and indexing for Japanese text specifically (we had a large early Japanese user community). have you tested in that language specifically? we could potentially do similar indexing for other languages in the future.

quiple commented 1 week ago

> Hi! we do some special processing and indexing for Japanese text specifically (we had a large early Japanese user community). have you tested in that language specifically? we could potentially do similar indexing for other languages in the future.

I just checked, and it does seem to separate words when searching in Japanese, as you said. Maybe it has a built-in dictionary?

Chinese search behaves oddly, though, probably because of the mix of Hanzi and Kanji, and Korean search is effectively unusable.
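For reference, one common dictionary-free way to index CJK text is to split it into overlapping character bigrams, so that any two-character query can match inside a longer run. The Go sketch below shows only that general idea. It is not what this project does, `bigrams` is a hypothetical helper, and it assumes the input is a single run of CJK characters with no spaces, in precomposed (NFC) form.

```go
package main

import "fmt"

// bigrams is a hypothetical helper that returns every overlapping
// two-rune window of text. Inputs shorter than two runes are returned
// as a single token (or nil when empty).
func bigrams(text string) []string {
	runes := []rune(text)
	if len(runes) == 0 {
		return nil
	}
	if len(runes) == 1 {
		return []string{text}
	}
	out := make([]string, 0, len(runes)-1)
	for i := 0; i+1 < len(runes); i++ {
		out = append(out, string(runes[i:i+2]))
	}
	return out
}

func main() {
	// The noun 학교 is now indexed on its own, even though the
	// postposition 에 follows it without a space.
	fmt.Println(bigrams("학교에")) // [학교 교에]
}
```

The trade-off is a larger index and some false positives across word boundaries, which is why dictionary-based analyzers are usually preferred where a good dictionary exists.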