Closed: reallyallnamestaken closed this issue 1 year ago.
I found that the issue is caused by the web pages being UTF-8 encoded while nothing declares that encoding to the browser. As a workaround, I modified the code to write the needed charset declaration at the top of each cache file.
with open(f'{cache_dir}/{chapter_no}.html', 'w', encoding='utf-8') as f:
    # Declare the charset so the browser decodes the page as UTF-8 instead of guessing
    f.write('<meta charset="utf-8" />\n')
    f.write(content)
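For reference, here is the same workaround wrapped in a self-contained function. The name write_chapter_cache and its parameters are hypothetical stand-ins for whatever the script actually uses around this code; treat it as a sketch of the idea, not the script's real implementation.

```python
import os


def write_chapter_cache(cache_dir: str, chapter_no: int, content: str) -> str:
    """Write one chapter to the cache as UTF-8 and return the file path.

    Hypothetical helper mirroring the workaround above: a <meta charset>
    tag is written first so a browser opening the file decodes it as
    UTF-8 rather than falling back to a legacy encoding.
    """
    # Assumption: creating the cache directory here is convenient for the
    # sketch; the original snippet expects it to exist already.
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f'{chapter_no}.html')
    # The with-block closes the file even if a write raises.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('<meta charset="utf-8" />\n')
        f.write(content)
    return path
```

The charset declaration only has to appear near the start of the document for the browser to honor it, which is why writing it before the page content is enough.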
Hello, thank you for reporting this to me. I have applied your proposed fix to the script.
I have come across some books with what look like character encoding issues. They affect seemingly random pages (though always the same pages if I redo the download), and only in certain books.
Characters such as • or ÂÂÂ appear throughout these pages.
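A quick way to see where these characters likely come from, assuming the browser falls back to Windows-1252 when no charset is declared (a common default for Western locales, though the actual fallback depends on the browser and its settings): encode text as UTF-8, then decode the resulting bytes as cp1252. The function name misdecode and the sample strings are illustrative, not taken from the affected books.

```python
def misdecode(text: str) -> str:
    """Simulate a browser reading UTF-8 bytes as Windows-1252.

    Note: strict decoding raises UnicodeDecodeError for the few byte
    values cp1252 leaves undefined (e.g. 0x81, 0x8D).
    """
    return text.encode('utf-8').decode('cp1252')


# U+2022 BULLET is E2 80 A2 in UTF-8; cp1252 maps those bytes to â, €, ¢.
print(misdecode('\u2022'))            # •
# U+00A0 NO-BREAK SPACE is C2 A0; C2 shows up as Â, A0 stays a no-break space.
print(repr(misdecode('\u00a0' * 3)))  # 'Â\xa0Â\xa0Â\xa0'
```

Under this assumption, runs of Â would point to runs of no-break spaces in the source pages, and pages without any non-ASCII characters would look fine, which would explain why only some pages are affected.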