dharple / detox

Tames problematic filenames
BSD 3-Clause "New" or "Revised" License

max_length filter chops UTF-8 chars #55

Open dharple opened 3 years ago

dharple commented 3 years ago

The max_length filter does not respect multibyte characters: it can cut a filename in the middle of a multibyte UTF-8 sequence.

dharple commented 3 years ago

Reproduce with:

# ---------------------------------------------------------------------------

INPUT=$(printf "\u0201\u0202\u0203\u0204\u0205\u0206\u0207\u0208")
OUTPUT=$(printf "\u0201\u0202\u0203\u0204")
METHOD1=max-length-9

test_sequence "$DETOX" "$INPUT" "$OUTPUT" "$TABLEPATH" "$METHOD1"
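
For reference, each of U+0201 through U+0208 encodes to two bytes in UTF-8, so the input above is 16 bytes long and a 9-byte limit falls in the middle of the fifth character. The expected output is therefore the first four characters (8 bytes). Below is a minimal sketch of how a cut point could be moved back to a character boundary. The function name `utf8_safe_length` is hypothetical and is not taken from the detox source; it assumes the input is valid UTF-8 and NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of detox: returns the largest byte
 * length <= max_len at which the string can be cut without splitting
 * a UTF-8 sequence. Assumes s is valid, NUL-terminated UTF-8.
 */
size_t utf8_safe_length(const char *s, size_t max_len)
{
    size_t len = strlen(s);
    size_t cut;

    /* Nothing to truncate. */
    if (len <= max_len) {
        return len;
    }

    cut = max_len;

    /*
     * If the first byte that would be dropped is a continuation byte
     * (bit pattern 10xxxxxx), the cut lands inside a character, so
     * back up until the dropped part starts on a lead or ASCII byte.
     */
    while (cut > 0 && (((unsigned char) s[cut]) & 0xC0) == 0x80) {
        cut--;
    }

    return cut;
}
```

With the 16-byte input from the test case and a limit of 9, the byte at index 9 is a continuation byte, so the cut moves back to 8, which keeps exactly four characters.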