enzo1982 / freac

The fre:ac audio converter project
https://www.freac.org/
GNU General Public License v2.0

Illegal Characters Replacement #455

Open Akczht opened 1 year ago

Akczht commented 1 year ago

I have a few albums whose file names contain '/', and a few others with other illegal Unicode characters. Right now fre:ac either removes such characters (e.g. the colon ':') or replaces them with '-' (as it does for '/').

This problem has come up before: full-width alternatives exist for cases where someone wants to use a special character in a file name.

Under the Output tab of fre:ac, an option could be added that, when enabled, automatically replaces all illegal characters with their full-width counterparts when writing output files.

Here is a list of all the full-width characters.

enzo1982 commented 1 year ago

Thank you for suggesting this!

I am considering making character replacements user-modifiable, so you could change the current replacements or add your own for characters not currently in the list.

Then there could also be the option to select pre-designed sets of replacements, including one that uses full-width characters.

Not sure when I can get to it, but I like the idea.

Akczht commented 1 year ago

Yes, a character replacement menu is a better implementation.

Barough commented 1 year ago

An Illegal Characters Replacement option like the one EAC has would be great to see in fre:ac.

Akczht commented 1 year ago

After thinking about it for a while: rather than adding a whole new section for illegal character replacement, it may be better to set up replacement rules under the already available character replacement option. Using the full-width replacements there would not only be reasonably accurate, but also give the best of both worlds. A suggestion that you might consider.

Lee-Carre commented 1 year ago

Picard has something similar.

I, personally, would much prefer a user-customisable replacement table/matrix, rather than it being hard-coded.

For example, replacing a colon with a semi-colon is fine: the two are (visually) almost indistinguishable and have a semantically similar meaning.

Be aware of the catch with non-ASCII characters in Unicode, since some systems count/limit the bytes in the character string, rather than the characters. In such cases, using the likes of characters from the Full-Width block, would incur a rather higher byte-count (generally, depending on which UTF you're using, but UTF-8 is typical) per-character. This would then limit the length of the allowed string to remain valid (and you'd need various checks for that, too).
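To make the byte-count concern concrete, here is a minimal Python sketch. It assumes UTF-8 and a 255-byte limit per file name (as on ext4, but an assumption for this example); `utf8_len` and `truncate_to_bytes` are hypothetical helpers, not part of fre:ac.

```python
def utf8_len(text: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_to_bytes(name: str, limit: int = 255) -> str:
    """Cut a name so its UTF-8 encoding fits in `limit` bytes,
    without leaving half of a multi-byte character at the end."""
    encoded = name.encode("utf-8")
    if len(encoded) <= limit:
        return name
    # errors="ignore" drops the partial character left by the byte cut.
    return encoded[:limit].decode("utf-8", errors="ignore")


ascii_colon = ":"            # U+003A COLON
fullwidth_colon = "\uff1a"   # U+FF1A FULLWIDTH COLON

print(utf8_len(ascii_colon))      # 1
print(utf8_len(fullwidth_colon))  # 3
```

So a name made entirely of full-width characters can hold only about a third as many characters as an ASCII name under the same byte limit.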

Besides good usability (configurable, with sensible defaults), enabling users to define their own replacements would allow them to handle their own use-cases (with whatever weird constraints that involves). Hard-coded replacements would mean having to account for all sorts of edge-cases, endlessly.
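As a rough illustration of "configurable, with sensible defaults", the sketch below merges a preset table with user overrides. The preset names, their contents, and `build_table` are all made up for this example and do not reflect how fre:ac stores its replacements.

```python
from typing import Dict, Optional

# Hypothetical preset tables; names and contents are illustrative only.
PRESETS: Dict[str, Dict[str, str]] = {
    "ascii": {"/": "-", ":": ";"},
    "fullwidth": {"/": "\uff0f", ":": "\uff1a"},
}


def build_table(preset: str,
                overrides: Optional[Dict[str, str]] = None) -> Dict[int, str]:
    """Merge a preset with user overrides into a table for str.translate."""
    mapping = dict(PRESETS[preset])   # copy so the preset itself is not modified
    mapping.update(overrides or {})   # user-defined entries win over the preset
    return str.maketrans(mapping)


table = build_table("ascii", overrides={":": " -"})
print("AC/DC: Live".translate(table))  # AC-DC - Live
```

A user who only wants to change one replacement overrides that single entry and inherits the rest of the preset.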

Different people have different preferences, too. They may simply want consistency with an existing set of files, regardless of what might otherwise be optimal/ideal.

I'll also mention Unicode Normalization (and the general concept that there are often multiple ways of generating what appears to be the same character/glyph) as something else to be aware/mindful of. Be careful; the Unicode rabbit-hole goes deep 😉.
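A short Python sketch of why normalization matters here, using only the standard `unicodedata` module. The second half shows a catch specific to full-width replacements: compatibility normalization (NFKC/NFKD) folds them back to their ASCII originals.

```python
import unicodedata

# "é" can be a single code point or "e" followed by a combining accent.
composed = "\u00e9"
decomposed = "e\u0301"
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# NFKC maps a full-width colon back to an ordinary (possibly disallowed) colon.
print(unicodedata.normalize("NFKC", "\uff1a") == ":")        # True
# Canonical normalization (NFC) leaves the full-width colon unchanged.
print(unicodedata.normalize("NFC", "\uff1a") == "\uff1a")    # True
```

Any tool in the chain that applies NFKC to file names would therefore undo a full-width replacement.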


Beyond avoiding disallowed characters (those which have special meaning, as metacharacters, to other software), consider also the case of correcting poor typography. A common example being to replace a double-hyphen (--) with a real dash (either an en-dash (–), or an em-dash (—)). Though, I accept that this may be beyond the scope of Fre:AC, and/or better done with a different tool.

Akczht commented 1 year ago

This approach can also work; my solution was meant for the case where the aforementioned is not implemented.

Akczht commented 8 months ago

If not a full-fledged menu for custom replacements, then at least these characters can be replaced with their full-width counterparts when the option "Allow Unicode Characters" is enabled, or another option such as "Forbidden Characters Replacements" could be added:

< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)

These are the replacements

：
；
！
？
＊
＜
＞
／
＼
｜
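A minimal Python sketch of the mapping proposed above, covering the nine forbidden characters listed. It relies on the fact that each printable ASCII character from U+0021 to U+007E has a full-width counterpart exactly 0xFEE0 code points higher; `to_fullwidth` is a hypothetical name, not an existing fre:ac function.

```python
# The nine forbidden characters from the list above.
FORBIDDEN = '<>:"/\\|?*'

# Full-width forms occupy U+FF01..U+FF5E, i.e. ASCII code point + 0xFEE0.
FULLWIDTH_TABLE = {ord(ch): chr(ord(ch) + 0xFEE0) for ch in FORBIDDEN}


def to_fullwidth(name: str) -> str:
    """Replace each forbidden character with its full-width counterpart."""
    return name.translate(FULLWIDTH_TABLE)


print(to_fullwidth("AC/DC: Who?"))
```

Characters outside the forbidden set, including spaces and letters, pass through unchanged.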