bitmagnet-io / bitmagnet

A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration.
https://bitmagnet.io/
MIT License

Filter by detected unicode script types or IANA character sets detected in name and files.path values #155

Closed: bmfrosty closed this issue 4 months ago.

bmfrosty commented 4 months ago

Is your feature request related to a problem? Please describe

I'd like to be able to filter by the Unicode script types or IANA character sets detected in file names and paths.

Describe the solution you'd like

Unicode is divided into multiple scripts, and characters can also be grouped by character set (I believe some characters can belong to more than one script or character set, but I am not a Unicode expert). I would like to be able to filter for names that contain only characters from, say, the Latin script or the US-ASCII character set, to narrow results one way, or alternatively search for names that include Hiragana OR Katakana. Even more complex would be Han AND NOT Hiragana AND NOT Katakana.
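A minimal sketch of how such filters could be evaluated, assuming Go (the language bitmagnet is written in) and a torrent name or file path as input. It uses the standard library's `unicode` range tables to collect the scripts present in a string and expresses the filters from this request as plain boolean predicates. All helper names (`scriptsIn`, `onlyLatin`, `isASCII`, `hasKana`, `hanWithoutKana`) are hypothetical, not part of bitmagnet. Note that Go's `unicode` tables assign each rune a single script and do not expose Unicode's Script_Extensions property, so characters shared between scripts are not modelled here.

```go
package main

import (
	"fmt"
	"unicode"
)

// trackedScripts lists the scripts this sketch looks for; extend as needed.
var trackedScripts = map[string]*unicode.RangeTable{
	"Latin":    unicode.Latin,
	"Han":      unicode.Han,
	"Hiragana": unicode.Hiragana,
	"Katakana": unicode.Katakana,
}

// scriptsIn returns the set of tracked scripts that occur in s.
// Script-neutral runes (digits, spaces, punctuation) match no table.
func scriptsIn(s string) map[string]bool {
	found := make(map[string]bool)
	for _, r := range s {
		for name, table := range trackedScripts {
			if unicode.Is(table, r) {
				found[name] = true
			}
		}
	}
	return found
}

// onlyLatin reports whether every rune is Latin or script-neutral
// (the Common and Inherited scripts).
func onlyLatin(s string) bool {
	for _, r := range s {
		if !unicode.In(r, unicode.Latin, unicode.Common, unicode.Inherited) {
			return false
		}
	}
	return true
}

// isASCII reports whether s fits in the US-ASCII character set.
func isASCII(s string) bool {
	for _, r := range s {
		if r > unicode.MaxASCII {
			return false
		}
	}
	return true
}

// hasKana implements "Hiragana OR Katakana".
func hasKana(s string) bool {
	sc := scriptsIn(s)
	return sc["Hiragana"] || sc["Katakana"]
}

// hanWithoutKana implements "Han AND NOT Hiragana AND NOT Katakana".
func hanWithoutKana(s string) bool {
	sc := scriptsIn(s)
	return sc["Han"] && !sc["Hiragana"] && !sc["Katakana"]
}

func main() {
	names := []string{
		"Big Buck Bunny (2008) 1080p",
		"Amélie (2001)",
		"千と千尋の神隠し",
		"霸王别姬",
	}
	for _, n := range names {
		fmt.Printf("%s: ascii=%v latin=%v kana=%v hanOnly=%v\n",
			n, isASCII(n), onlyLatin(n), hasKana(n), hanWithoutKana(n))
	}
}
```

Treating Common and Inherited runes as neutral matters for the "only Latin" case: without it, a name such as "Amélie (2001)" would be rejected because of its digits, spaces and parentheses.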

Additional context

Certain languages use characters from multiple scripts or character sets. For English content, it may work well to filter on a single script or character set (finding the right one will take trial and error), but Chinese content will contain Han without Hiragana or Katakana, while Japanese content will usually contain Hiragana or Katakana alongside other characters.
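The rule of thumb above can be sketched as a crude script-based heuristic, again in Go and only as an illustration. `guessCJK` is a hypothetical helper; it will misfire on, for example, Japanese titles written entirely in kanji, which it reports as Chinese.

```go
package main

import (
	"fmt"
	"unicode"
)

// guessCJK applies a rough script-based rule: any Hiragana or Katakana
// suggests Japanese ("ja"); Han with no kana suggests Chinese ("zh");
// otherwise it returns "" (no guess). Hangul is not checked, so Korean
// text mixing Hangul with Hanja would be reported as "zh".
func guessCJK(s string) string {
	var han, kana bool
	for _, r := range s {
		switch {
		case unicode.In(r, unicode.Hiragana, unicode.Katakana):
			kana = true
		case unicode.Is(unicode.Han, r):
			han = true
		}
	}
	switch {
	case kana:
		return "ja"
	case han:
		return "zh"
	default:
		return ""
	}
}

func main() {
	for _, name := range []string{"千と千尋の神隠し", "霸王别姬", "Spirited Away"} {
		fmt.Printf("%s -> %q\n", name, guessCJK(name))
	}
}
```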

mgdigital commented 4 months ago

Hi @bmfrosty, there's already an open issue that might do what you want in a slightly different way: https://github.com/bitmagnet-io/bitmagnet/issues/49. If the language filter were populated using the results from https://github.com/pemistahl/lingua-go, would this achieve what you're aiming for?

bmfrosty commented 4 months ago

Sounds like it. I didn't see it when I was looking. Feel free to close this request.

(Reply sent by email, in response to mgdigital's comment of Sat, Feb 24, 2024, 12:28 AM.)