bmfrosty closed this issue 4 months ago.
Hi @bmfrosty, there's already this open issue that might do what you want in a slightly different way: https://github.com/bitmagnet-io/bitmagnet/issues/49. If the language filter were populated using the results from https://github.com/pemistahl/lingua-go, would this achieve what you're aiming for?
Sounds like it. I didn't see it when I was looking. Feel free to close this request.
(Sent by email in reply to mgdigital's comment of Sat, Feb 24, 2024, 12:28 AM.)
Is your feature request related to a problem? Please describe
I'd like to be able to filter by the Unicode script(s) or IANA character set detected in file names and paths.
Describe the solution you'd like
Unicode assigns characters to scripts, and character sets such as US-ASCII each cover only part of Unicode (I believe some characters can belong to more than one script or character set, but I am not a Unicode expert). I would like to be able to restrict results to names that use only, say, Latin script or the US-ASCII character set, or alternatively to search for names that contain Hiragana OR Katakana. A more complex filter would be Han AND NOT Hiragana AND NOT Katakana.
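A minimal sketch of how such filters could work, written in Go with only the standard `unicode` package and assuming detection runs on already-decoded (UTF-8) names. The helpers `scriptsIn`, `isASCII`, `Filter`, `Has`, `Only`, `Not`, `And` and `Or` are hypothetical names for illustration, not part of bitmagnet. The idea is to collect the set of scripts found in a name once, then evaluate boolean filters over that set.

```go
package main

import (
	"fmt"
	"unicode"
)

// scriptsIn returns the set of Unicode script names (keys of unicode.Scripts,
// e.g. "Latin", "Han", "Hiragana", "Katakana") found in s. "Common" and
// "Inherited" characters (digits, spaces, punctuation, combining marks) are
// skipped so they do not count as a script of their own.
func scriptsIn(s string) map[string]bool {
	found := make(map[string]bool)
	for _, r := range s {
		for name, table := range unicode.Scripts {
			if name == "Common" || name == "Inherited" {
				continue
			}
			if unicode.Is(table, r) {
				found[name] = true
				break // each code point has exactly one Script value
			}
		}
	}
	return found
}

// isASCII reports whether every character of s is in US-ASCII (U+0000..U+007F).
func isASCII(s string) bool {
	for _, r := range s {
		if r > unicode.MaxASCII {
			return false
		}
	}
	return true
}

// Filter is a predicate over the set of scripts found in a name.
type Filter func(scripts map[string]bool) bool

// Has matches names containing at least one character of the given script.
func Has(script string) Filter {
	return func(s map[string]bool) bool { return s[script] }
}

// Only matches names whose scripts are all in the allowed list
// (a name with no script-specific characters also matches).
func Only(allowed ...string) Filter {
	ok := make(map[string]bool, len(allowed))
	for _, a := range allowed {
		ok[a] = true
	}
	return func(s map[string]bool) bool {
		for name := range s {
			if !ok[name] {
				return false
			}
		}
		return true
	}
}

func Not(f Filter) Filter { return func(s map[string]bool) bool { return !f(s) } }

func And(fs ...Filter) Filter {
	return func(s map[string]bool) bool {
		for _, f := range fs {
			if !f(s) {
				return false
			}
		}
		return true
	}
}

func Or(fs ...Filter) Filter {
	return func(s map[string]bool) bool {
		for _, f := range fs {
			if f(s) {
				return true
			}
		}
		return false
	}
}

func main() {
	hanWithoutKana := And(Has("Han"), Not(Has("Hiragana")), Not(Has("Katakana")))
	for _, name := range []string{"Big Buck Bunny (2008).mkv", "千と千尋の神隠し.mkv", "卧虎藏龙.mkv"} {
		scripts := scriptsIn(name)
		fmt.Println(name, scripts, "ascii:", isASCII(name), "han-without-kana:", hanWithoutKana(scripts))
	}
}
```

Here `Only("Latin")` covers the "only Latin script" case, `Or(Has("Hiragana"), Has("Katakana"))` the kana case, and `hanWithoutKana` the Han AND NOT Hiragana AND NOT Katakana case. `isASCII` only checks decoded code points; identifying the IANA character set of names stored in a legacy encoding would need a separate charset-detection step that this sketch does not attempt.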
Additional context
Some languages mix characters from several scripts or character sets. For English content, filtering for Latin script or US-ASCII alone may work well (this will take trial and error). Chinese content will contain Han but not Hiragana or Katakana, while Japanese content will usually contain Hiragana or Katakana alongside other scripts such as Han.
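The rule of thumb above can be sketched as a small script-based classifier, again in Go with only the standard `unicode` package. `guessCJK` is a hypothetical helper, not a bitmagnet function. It is a heuristic rather than real language detection (the lingua-go approach mentioned earlier in this thread would be more robust), and it will, for example, label a Japanese name written entirely in kanji as Chinese.

```go
package main

import (
	"fmt"
	"unicode"
)

// guessCJK is a rough, script-based guess (not real language detection):
// any Hiragana or Katakana suggests Japanese, Han without kana suggests
// Chinese, and "" means the name contains neither Han nor kana.
func guessCJK(name string) string {
	var han, kana bool
	for _, r := range name {
		switch {
		case unicode.In(r, unicode.Hiragana, unicode.Katakana):
			kana = true
		case unicode.Is(unicode.Han, r):
			han = true
		}
	}
	switch {
	case kana:
		return "Japanese"
	case han:
		return "Chinese"
	default:
		return ""
	}
}

func main() {
	for _, name := range []string{"千と千尋の神隠し.mkv", "卧虎藏龙.mkv", "Big Buck Bunny.mkv"} {
		fmt.Printf("%s -> %q\n", name, guessCJK(name))
	}
	// 千と千尋の神隠し.mkv -> "Japanese"
	// 卧虎藏龙.mkv -> "Chinese"
	// Big Buck Bunny.mkv -> ""
}
```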