NyaaPantsu / nyaa

Nyaa.se replacement written in golang
MIT License
994 stars 148 forks source link

Disable searches against less than 3 characters #561

Closed kkartaltepe closed 7 years ago

kkartaltepe commented 7 years ago

You should probably disable searches for 1-2 characters as it cannot utilize the trigram index meaning its a vector for abuse and arguably useless.

Only consideration I can see is CJK users doing searchs for 1-2 characters but hopefully its not too restrictive for them.

yiiTT commented 7 years ago

3 Characters or more sounds good with me

sfan5 commented 7 years ago

What if you want to search for episode numbers? Also how is the limit determined? utf-8 bytes? unicode codepoints?

kkartaltepe commented 7 years ago

unicode codepoints is my understanding.

As for the episode numbers there is no need to limit every search term to 3 or more characters, just to prevent searches of only less than 3 characters.

You can play around with the explain plans yourself to see how things break down just to double check.

-- edit although if you want to support 'or' queries you may want to restrict each 'or' clause to containing at least one term of 3 characters.

sfan5 commented 7 years ago

Due to how search works searching for "some-anime 04" will also trigger the performance problem with trigrams (SELECT * FROM torrents WHERE torrent_name LIKE '%some-anime%' AND torrent_name LIKE '%04%')

kkartaltepe commented 7 years ago

You know I learn something every day. Unfortunately it appears postgres makes some wrong assumptions with respect to GIN indices and will try to merge all predicates into one type of scan (index or table) rather than considering separating the predicates into different nodes. (Should be noted that the sequential scan is in fact much faster than the index only scan on the combined predicate so at least is partially right!)

You can force an index scan followed by a filter which is what I expected it to chose with something like

with q as (select * from torrents where torrent_name like '%some-anime%')
select * from q where torrent_name like '%04%' and torrent_name ...

or apparently

select * from torrents 
where torrent_name like '%some-anime%' and (torrent_name||'') like '%04%' and (torrent_name||') ...

Pretty ugly, but it appears to be the only way to make up for the query planner. Maybe someone can be woo'ed to improve the query planner for pg11.

akuma06 commented 7 years ago

Is it still a thing? Since we use ES now

kkartaltepe commented 7 years ago

If you are no longer using postgres indices for accelerating these searches then this is probably better off closed.