documentcloud / docsplit

Break Apart Documents into Images, Text, Pages and PDFs
http://documentcloud.github.io/docsplit/

Add section to documentation regarding the "--language" flag #117

Closed nathanstitt closed 9 years ago

nathanstitt commented 10 years ago

As evidenced by #116, the "--language" flag isn't well known.

We should document its usage so that people using Docsplit with non-English character sets do not have their documents' UTF characters replaced by '?' by the TextCleaner.

nathanstitt commented 10 years ago

We should also document the fact that "--language" only sets :clean to false when Docsplit is called via the command line. When using Docsplit as a library, :clean => false must be specified explicitly in the options passed to Docsplit.extract_text.
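
For the docs, something like the following could illustrate the library case. This is only a rough sketch: text_options_for is a hypothetical helper (not part of Docsplit), 'deu' and 'document.pdf' are placeholder values, and I'm assuming the :language, :clean and :output option keys behave as described above.

```ruby
# Hypothetical helper, not part of Docsplit: builds the options hash for a
# non-English document. It sets :clean => false explicitly, since (per this
# issue) only the command-line "--language" flag does that automatically.
def text_options_for(language, extra = {})
  { :language => language, :clean => false }.merge(extra)
end

# 'deu' is a placeholder language code; 'text' is a placeholder output dir.
opts = text_options_for('deu', :output => 'text')

# Guarded so the sketch still loads where the docsplit gem is not installed.
begin
  require 'docsplit'
rescue LoadError
  # docsplit is unavailable; only the options hash above is built.
end

# 'document.pdf' is a placeholder path; the call is skipped if it is absent.
if defined?(Docsplit) && File.exist?('document.pdf')
  Docsplit.extract_text(['document.pdf'], opts)
end
```

Without the explicit :clean => false, the TextCleaner would presumably still run on the extracted text, which is the behaviour reported in #116.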