The current version uses ISO-8859-1 as the encoding when downloading Gutenberg ebooks. However, this restricts the languages that this library can download (see here).
For example, the current version cannot download Chinese texts. The changes I have made fix that by changing the requests.get encoding to utf-8. Below is a script that downloads Journey to the West, a Chinese ebook, and saves it using the appropriate encoding.
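from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers

OUTFILE = "<insert outfile destination here>"

# Download ebook 23962 and strip the Project Gutenberg header and footer
text = strip_headers(load_etext(23962)).strip()

# Write the text as UTF-8 so the Chinese characters are preserved (Python 3)
with open(OUTFILE, 'w', encoding='utf-8') as f:
    f.write(text)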
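
To illustrate why the encoding matters, here is a minimal standalone sketch (not the library's code, and not part of this change) that decodes the same UTF-8 bytes with both encodings. The sample string is an arbitrary illustration and no network access is involved.

```python
# Illustration only: the same UTF-8 bytes decoded with two encodings.
sample = "西遊記"  # "Journey to the West" in Chinese, used as sample text
raw = sample.encode("utf-8")  # the bytes a UTF-8 source would deliver

# ISO-8859-1 maps every byte to exactly one character, so decoding never
# raises an error, but each multi-byte character becomes several wrong ones.
as_latin1 = raw.decode("iso-8859-1")

# Decoding with UTF-8 recovers the original text.
as_utf8 = raw.decode("utf-8")

print(len(as_latin1), len(as_utf8))  # 9 3
print(as_utf8 == sample)             # True
```

In requests, the decoding used by `response.text` is controlled by the `response.encoding` attribute, so a wrong value there fails silently in the same way rather than raising an exception.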