As evidenced by #116, the "--language" flag isn't well known.
We should document its usage so that people using Docsplit with foreign character sets do not have their documents' UTF characters replaced by '?' by the TextCleaner.
We should also document the fact that "--language" only sets :clean to false when Docsplit is called via the command line. When using Docsplit as a library, :clean must be set to false explicitly in the options passed to Docsplit.extract_text.
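
Something like the following could go in the docs for the library case. It is only a sketch: `text_options` is a hypothetical helper (not part of Docsplit), the `:language` option key and the `'deu'` language code are my assumptions about what the command line passes through, and `document.pdf` / `text_out` are placeholder paths.

```ruby
# Hypothetical helper, not part of Docsplit: builds the options that the
# issue says "--language" implies on the command line, i.e. the language
# itself plus :clean => false so the TextCleaner is skipped.
def text_options(language, extra = {})
  { :language => language, :clean => false }.merge(extra)
end

# 'deu' and 'text_out' are placeholder values for illustration.
opts = text_options('deu', :output => 'text_out')

# Only call into Docsplit if the gem has been loaded and the placeholder
# file actually exists, so this snippet is safe to run on its own.
if defined?(Docsplit) && File.exist?('document.pdf')
  Docsplit.extract_text('document.pdf', opts)
end
```

The point is just that `:clean => false` has to be passed explicitly alongside the language, since the library call does not infer it.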