dharple / detox

Tames problematic filenames
BSD 3-Clause "New" or "Revised" License

utf_8 filter converts spaces to underscores #100

Closed: dave-kennedy closed this issue 1 year ago

dave-kennedy commented 1 year ago

I would expect the utf_8 filter to translate to UTF-8 and do nothing else, but it also replaces spaces with underscores:

$ echo 'foo bar' | inline-detox -s utf_8-only
foo_bar
dharple commented 1 year ago

Thanks for the feedback!

I assume you're running detox version 1. You can override the default behavior by creating a custom translation table. Depending on how you installed detox, there should be a file called unicode.tbl in /usr/share/detox or /usr/local/share/detox. Copy it to a new file, for example unicode-updated.tbl, and add a line like this:

0x20            " " # keep spaces

Then edit your detoxrc (probably /etc/detoxrc or /usr/local/etc/detoxrc) and update the utf_8-only sequence to use the new file:

sequence "utf_8-only" {
  utf_8 {
    file "unicode-updated.tbl";
  };
};

Alternatively, detox version 2 lets you omit the default value that is used when a character has no entry in a translation table. Leave 0x20 out of unicode-updated.tbl, make sure the file has no default line, and detox will leave the character alone.
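To make the difference between the two approaches concrete, here is a toy Python model of the lookup rule described above. It is not detox's actual code: it only assumes that a table entry wins, that a default (when present) replaces any character without an entry, and that with no default the character passes through unchanged. The helpers parse_entry and translate and the BASE table are hypothetical; BASE maps letters and digits to themselves as a stand-in for the rest of a real unicode.tbl, and parse_entry only handles lines shaped like the 0x20 example (hex code point, possibly quoted replacement, optional # comment).

```python
import shlex
import string
from typing import Dict, Optional, Tuple


def parse_entry(line: str) -> Tuple[int, str]:
    """Parse a table line such as: 0x20  " "  # keep spaces.
    Assumption: hex code point, then a (possibly quoted) replacement,
    then an optional # comment."""
    code, replacement = shlex.split(line, comments=True)
    return int(code, 16), replacement


def translate(s: str, table: Dict[int, str], default: Optional[str]) -> str:
    """Toy model of a table-driven filter: an entry wins; otherwise use the
    default if there is one; with no default, keep the character as-is."""
    out = []
    for ch in s:
        code = ord(ch)
        if code in table:
            out.append(table[code])
        elif default is not None:
            out.append(default)
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical stand-in for the rest of a real table:
# letters and digits map to themselves.
BASE = {ord(c): c for c in string.ascii_letters + string.digits}

# Version 1 style: a "_" default applies to anything missing, including 0x20.
print(translate("foo bar", BASE, "_"))  # foo_bar

# Version 1 with the extra 0x20 entry added to the copied table.
code, repl = parse_entry('0x20            " " # keep spaces')
updated = dict(BASE)
updated[code] = repl
print(translate("foo bar", updated, "_"))  # foo bar

# Version 2 style: no default line, so the missing 0x20 is left alone.
print(translate("foo bar", BASE, None))  # foo bar
```

Both fixes reach the same result by different routes: the version 1 approach adds an explicit entry so the default never applies to spaces, while the version 2 approach removes the default so unlisted characters survive.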