elgatito / script.elementum.burst

Development of this addon has been stopped!
MIT License

Fix release types detection to avoid false blocks #439

Closed: antonsoroko closed this 1 day ago

antonsoroko commented 1 day ago

Currently, included_rx() searches for a keyword without an end boundary, so the keyword TS (from filter_telesync) matches a movie called Tsunami and blocks it. The same happens with other short keywords: cam, tc, scr. Example: https://www.themoviedb.org/search/movie?query=tsunami
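A minimal sketch of the problem, assuming the check pads the title with spaces and requires a non-word character only before the keyword. The function name included_prefix_only and the key lists are hypothetical stand-ins, not the addon's actual code:

```python
import re

def included_prefix_only(value, keys):
    """Hypothetical sketch: the key must start after a non-word character,
    but nothing checks where it ends. Keys are assumed to be lowercase."""
    value = ' ' + value.lower() + ' '
    for key in keys:
        # \W+ anchors only the start of the key, so "ts" also matches " tsunami".
        if re.search(r'\W+' + re.escape(key), value):
            return True
    return False

print(included_prefix_only('Tsunami 2009 1080p', ['ts']))  # True (false block)
print(included_prefix_only('Camera Buff 1979', ['cam']))   # True (false block)
```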

So I updated included_rx() to match the whole keyword and avoid false matches. Since we add spaces around the title (value = ' ' + value.lower() + ' '), we can simply use \W+ after the keyword: it matches the end of a word, and at the end of the title it matches the trailing space we added. (Initially I wanted to use (\W+|$) or (?:\W+|$).)
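A minimal sketch of the whole-keyword check, under the same assumptions. The helper name and key lists are hypothetical, and keys are passed through re.escape so they are treated as literal text, which may differ from how the addon actually stores them:

```python
import re

def included_whole_word(value, keys):
    """Hypothetical sketch: each key must appear as a whole word."""
    # Pad with spaces so a keyword at the very start or end of the title
    # still has a non-word character on both sides.
    value = ' ' + value.lower() + ' '
    for key in keys:
        # \W+ before AND after the key: "ts" matches " ts " or ".ts." but not "tsunami".
        # Without the padding, the end anchor would have to be (?:\W+|$).
        if re.search(r'\W+' + re.escape(key) + r'\W+', value):
            return True
    return False

print(included_whole_word('Tsunami 2009 1080p', ['ts']))    # False
print(included_whole_word('Some.Movie.2020.TS', ['ts']))    # True
print(included_whole_word('Camera Buff 1979', ['cam']))     # False
print(included_whole_word('Some.Movie.2020.CAM', ['cam']))  # True
```

The padding is what lets a single \W+ stand in for (?:\W+|$): a keyword at the very end of the title is always followed by the added space, as in the last call.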

In the past, keywords were also matched with an end boundary: e.g. the keyword was _ts_, where each _ was replaced by a space before searching the title. See https://github.com/elgatito/script.elementum.burst/commit/67e7e0d86d7b4cf5510ed661d92c59d1669d0adf#diff-78cc99018f5874a1a9641c20d0e5b1d595c3ffa33528e7c7c49fad8e111a0268L68
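For comparison, a hypothetical sketch of the older style described above, where each key carries its boundaries as underscores and a plain substring search is used. The function name is illustrative only:

```python
def included_legacy(value, keys):
    """Hypothetical sketch: keys such as '_ts_' encode their own
    word boundaries as underscores."""
    value = ' ' + value.lower() + ' '
    for key in keys:
        # '_ts_' becomes ' ts ', so only a space-delimited "ts" matches.
        if key.replace('_', ' ') in value:
            return True
    return False

print(included_legacy('Tsunami 2009', ['_ts_']))        # False
print(included_legacy('Some Movie 2009 TS', ['_ts_']))  # True
```

As written, this simplified sketch only recognizes space-delimited keywords, whereas the \W+ pattern also accepts separators such as dots, dashes and brackets; whether the old code normalized other separators first is not shown here.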

I tested it with several torrents: I see no regressions, and results that were previously blocked incorrectly are now shown.

Online regexp tests: https://regex101.com/r/nOI6P7/1
[Screenshots (2024-09-25): regex101 results before and after the change]

So now it no longer matches words like "camera" and "tsunami".


There is still a small issue with filters like trailer and line, since those are quite common words, but I guess we can't fix this easily.
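A quick check with a hypothetical whole-word helper (same idea as the \W+ fix, names and titles are illustrative) shows why these two remain problematic: a word boundary cannot tell a release tag from an ordinary word in the title.

```python
import re

def included_whole_word(value, keys):
    # Hypothetical whole-word check: \W+ on both sides of each key.
    value = ' ' + value.lower() + ' '
    return any(re.search(r'\W+' + re.escape(k) + r'\W+', value) for k in keys)

print(included_whole_word('Line of Duty S01E01 1080p', ['line']))         # True (false block)
print(included_whole_word('Trailer Park Boys S01E01 720p', ['trailer']))  # True (false block)
print(included_whole_word('Some Movie 2019 LINE', ['line']))              # True (intended)
```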