DrKain / subclean

A cross-platform CLI tool and node module to remove advertising from subtitles. Supports Bazarr and bulk cleaning!

MIT License

56 stars, 5 forks

[Feature Request] Support for other character encodings #8

Open DrKain opened 3 years ago

DrKain commented 3 years ago

Right now the tool fails when it tries to parse files in an unsupported character encoding, such as the test files listed below. For a viable solution, the tool should detect the character encoding and convert the file to UTF-8 when required.
The converted data should be written even if no nodes were modified. This removes the need to convert the same file again on every run when subclean is scheduled against an entire library.
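
A rough sketch of what detection and conversion could look like, using only the BOM to pick the encoding. The names `detectBom`, `decodeSubtitle`, `needsRewrite` and `toUtf8` are hypothetical and not part of subclean. It assumes a Node build with full ICU so that `TextDecoder` accepts `utf-16be`, and it treats files without a BOM as UTF-8, which will be wrong for legacy single-byte encodings.

```typescript
// Hypothetical helpers for illustration only; not subclean's actual API.
type DetectedEncoding = "utf-8" | "utf-16le" | "utf-16be";

interface Decoded {
    encoding: DetectedEncoding;
    hadBom: boolean;
    text: string;
}

// Inspect the leading bytes for a byte order mark.
// Assumption: no BOM means UTF-8. UTF-32 is not handled here.
function detectBom(bytes: Uint8Array): { encoding: DetectedEncoding; bomLength: number } {
    if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
        return { encoding: "utf-8", bomLength: 3 };
    }
    if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
        return { encoding: "utf-16be", bomLength: 2 }; // "UCS-2 BE BOM" in most editors
    }
    if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
        return { encoding: "utf-16le", bomLength: 2 };
    }
    return { encoding: "utf-8", bomLength: 0 };
}

// Decode the raw file bytes to a string, skipping the BOM explicitly.
function decodeSubtitle(bytes: Uint8Array): Decoded {
    const { encoding, bomLength } = detectBom(bytes);
    const text = new TextDecoder(encoding).decode(bytes.subarray(bomLength));
    return { encoding, hadBom: bomLength > 0, text };
}

// True when the file on disk is not already plain, BOM-less UTF-8,
// so it should be written back even if no nodes were removed.
function needsRewrite(decoded: Decoded): boolean {
    return decoded.hadBom || decoded.encoding !== "utf-8";
}

// Encode the (possibly cleaned) text as BOM-less UTF-8 for writing.
function toUtf8(text: string): Uint8Array {
    return new TextEncoder().encode(text);
}
```

With something like this, the write step would run when either nodes were removed or `needsRewrite` is true, so a converted file is not converted again on the next scheduled run. Files with no BOM in an encoding other than UTF-8 would still need a real detection library.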

See https://github.com/DrKain/subclean/issues/7#issuecomment-948572760 for a temporary workaround for the current problem.

Unfortunately this will require a dependency like utf8.

Test files:

UCS-2 BE BOM: subtitle.zip
UTF-8-BOM: subtitle.zip

DrKain commented 2 years ago

If you're using Bazarr, you can avoid this issue with the setting:

Settings → Subtitles → Post-Processing → Encode Subtitles To UTF8