Auto-detect encoding?

The documentation says:

--encoding: Encoding to use for reading text files [default: utf-8]

But different files have different encodings. A Chinese PDF is read correctly and its characters display properly, but a .txt file in the same folder that is encoded in GB2312 comes out garbled in both the search results and the file display.

It should probably default to detecting the encoding of each file independently and then converting the text internally to whatever the embedding expects (UTF-8?).

https://pypi.org/project/chardet/
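
To make the suggestion concrete, here is a rough sketch of what per-file detection could look like. It is only an illustration, not Semantra's actual code: the names decode_auto, read_text_auto and chardet_detector are made up for this example. It assumes the documented default (utf-8) is tried first and that chardet is only consulted when that strict decode fails. chardet.detect() returns a dict whose "encoding" entry may be None when it cannot decide.

```python
from pathlib import Path
from typing import Callable, Optional


def chardet_detector(raw: bytes) -> Optional[str]:
    """Guess an encoding with chardet (third-party: pip install chardet).

    Returns None if chardet is not installed or cannot decide.
    """
    try:
        import chardet
    except ImportError:
        return None
    return chardet.detect(raw)["encoding"]


def decode_auto(
    raw: bytes,
    default: str = "utf-8",
    detector: Callable[[bytes], Optional[str]] = chardet_detector,
) -> str:
    """Decode bytes to str, detecting the encoding only when needed."""
    # 1. Try the configured default strictly; most files will pass here.
    try:
        return raw.decode(default)
    except UnicodeDecodeError:
        pass

    # 2. Ask the detector for a guess and try that encoding.
    guess = detector(raw)
    if guess:
        try:
            return raw.decode(guess)
        except (UnicodeDecodeError, LookupError):
            pass  # wrong guess or unknown codec name

    # 3. Last resort: keep going, marking undecodable bytes with U+FFFD.
    return raw.decode(default, errors="replace")


def read_text_auto(path: str, default: str = "utf-8") -> str:
    """Read one file as text; each file is detected independently."""
    return decode_auto(Path(path).read_bytes(), default)
```

Since the result is a normal Python str, everything downstream (embedding, search results, file display) would see the same Unicode text regardless of what encoding the file had on disk. Detection on very short files can still guess wrong, so keeping --encoding as an explicit override seems worthwhile.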

freedmand / semantra

Auto-detect encoding? #50