bluesky-social / indigo

Go source code for Bluesky's atproto services.
https://atproto.com/docs
Apache License 2.0

palomar: When the search conditions are set to Japanese and English with |, only Japanese search results are displayed #677

Closed: usounds closed this issue 3 weeks ago

usounds commented 1 month ago

Describe the bug

When a search query combines a Japanese term and an English term with |, such as (テスト|Test), only the Japanese search results are returned. The behavior is the same whether I call app.bsky.feed.searchPosts or use the search on Bsky.app. I think this might be an issue with Palomar, but if I'm mistaken, please let me know.

Post (1): this post contains 'テスト' (Japanese). https://bsky.app/profile/sports.usounds.work/post/3ktzbozddv22k

Post (2): this post contains 'Test' (English). https://bsky.app/profile/sports.usounds.work/post/3ktzbp6nuw22s

To Reproduce

I called app.bsky.feed.searchPosts like this:

(a) 'q=from:sports.usounds.work テスト' -> Post (1) is a hit.
https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts?q=from%3Asports.usounds.work+%E3%83%86%E3%82%B9%E3%83%88

(b) 'q=from:sports.usounds.work Test' -> Post (2) is a hit.
https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts?q=from%3Asports.usounds.work+Test

(c) 'q=from:sports.usounds.work (テスト|Test)' -> Only post (1) is a hit.
https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts?q=from%3Asports.usounds.work+%28%E3%83%86%E3%82%B9%E3%83%88%7CTest%29
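For anyone who wants to reproduce the three requests programmatically, here is a minimal Go sketch that builds the same URLs from the raw query strings. The helper name `buildSearchURL` is hypothetical and not part of indigo; the endpoint is the public one used in the links above. It only constructs the URL and leaves sending the request to the caller.

```go
package main

import "net/url"

// searchPostsEndpoint is the public endpoint used in the reproduction links.
const searchPostsEndpoint = "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts"

// buildSearchURL is a hypothetical helper (not part of indigo) that
// percent-encodes the raw query string q and returns the full request URL.
func buildSearchURL(q string) string {
	params := url.Values{}
	params.Set("q", q)
	// Encode escapes ':' '(' ')' '|' and non-ASCII bytes, and turns spaces into '+'.
	return searchPostsEndpoint + "?" + params.Encode()
}
```

Calling `buildSearchURL("from:sports.usounds.work (テスト|Test)")` should yield the same URL as in case (c), which makes it easy to vary the query terms while checking which posts come back.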

Using the search on Bsky.app:

(a) from:sports.usounds.work テスト

image

(b) from:sports.usounds.work Test

image

(c) from:sports.usounds.work (Test|テスト)

image

Expected behavior

In case (c), both posts (1) and (2) should be hits.

bnewbold commented 3 weeks ago

This is a difficult query for us to support, for two reasons.

First, boolean search isn't a documented or really supported/tested feature of the query syntax; if there were an easy way to disable it, we might do so. Full boolean search support would be a great feature, but it would take a lot of work to fully test and support in production.

Second, to better support Japanese search specifically, we created two indices: one for Japanese text and one for all other languages. While it is probably possible to parse the query into individual tokens, query the two indices separately, and then stitch the results back together, this is likely an infrequent enough use case that we won't add that support in the near future. Sorry!
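To make the "parse, query separately, stitch" idea above concrete, here is a rough Go sketch. It is not Palomar's implementation, and every name in it (`Hit`, `containsJapanese`, `splitAlternatives`, `routeTokens`, `mergeHits`) is hypothetical. It assumes a token belongs to the Japanese index if it contains any Hiragana, Katakana, or Han character (which would also catch Chinese text), and it handles only a single flat `(a|b)` group rather than full boolean syntax.

```go
package main

import (
	"strings"
	"unicode"
)

// Hit is a hypothetical, simplified search result identified by its URI.
type Hit struct {
	URI string
}

// containsJapanese reports whether s contains any Hiragana, Katakana, or Han
// rune. Assumption: this is a good enough signal for choosing the Japanese index.
func containsJapanese(s string) bool {
	for _, r := range s {
		if unicode.In(r, unicode.Hiragana, unicode.Katakana, unicode.Han) {
			return true
		}
	}
	return false
}

// splitAlternatives splits a flat group such as "(テスト|Test)" into its
// alternatives. Nested groups and other operators are not handled.
func splitAlternatives(group string) []string {
	group = strings.TrimSpace(group)
	group = strings.TrimPrefix(group, "(")
	group = strings.TrimSuffix(group, ")")
	var out []string
	for _, t := range strings.Split(group, "|") {
		if t = strings.TrimSpace(t); t != "" {
			out = append(out, t)
		}
	}
	return out
}

// routeTokens assigns each token to the Japanese index or the index for all
// other languages, based on the script of its characters.
func routeTokens(tokens []string) (ja, other []string) {
	for _, t := range tokens {
		if containsJapanese(t) {
			ja = append(ja, t)
		} else {
			other = append(other, t)
		}
	}
	return ja, other
}

// mergeHits stitches per-index result lists together, keeping the first
// occurrence of each URI and preserving the order in which hits were seen.
func mergeHits(lists ...[]Hit) []Hit {
	seen := make(map[string]bool)
	var merged []Hit
	for _, list := range lists {
		for _, h := range list {
			if !seen[h.URI] {
				seen[h.URI] = true
				merged = append(merged, h)
			}
		}
	}
	return merged
}
```

With this sketch, `routeTokens(splitAlternatives("(テスト|Test)"))` sends テスト to the Japanese side and Test to the other side; each side would then be queried against its own index and the two result lists passed to `mergeHits`. A production version would also need to reconcile relevance scores and pagination across the two indices, which this sketch ignores.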

usounds commented 3 weeks ago

Thank you for your reply! I understand the current situation.