1. On Windows, Python's open() does not default to UTF-8; it falls back to the locale's preferred encoding (typically a legacy code page such as cp1252), which causes reading data from quite a few files to fail. This is an easy fix, though: we just need to pass encoding="utf-8" to every open() call that currently has no encoding specified.
2. Fugashi depends on a DLL, and the way Python handles DLLs on Windows changed in 3.8 (PATH is no longer searched when resolving the dependencies of extension modules; see os.add_dll_directory()). This issue is documented in polm/fugashi#33. It can currently be worked around by installing with Anaconda, and presumably with other virtual environments, but installs that don't use a virtual environment are broken. The short-term solution is probably to wrap the fugashi import in a try/except block and disable fugashi support if the import fails. Long term, it would be best to submit a PR to fugashi that fixes the issue, though that will require doing some homework on the new Python DLL handling.
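The short-term workaround for the fugashi import could look roughly like the sketch below. The names HAS_FUGASHI and tokenize_or_none are hypothetical and only here for illustration, and catching ImportError assumes the DLL load failure surfaces as an ImportError at import time, which is how failed extension-module loads are normally reported.

```python
# Guarded import: if fugashi (or the DLL it depends on) cannot be loaded,
# record that and carry on without it instead of crashing at startup.
try:
    import fugashi
    HAS_FUGASHI = True
except ImportError:
    fugashi = None
    HAS_FUGASHI = False


def tokenize_or_none(text):
    """Hypothetical helper: return token surfaces, or None if fugashi is disabled."""
    if not HAS_FUGASHI:
        return None
    # Tagger() also needs a dictionary such as unidic-lite to be installed.
    tagger = fugashi.Tagger()
    return [word.surface for word in tagger(text)]
```

Callers would then check for None (or HAS_FUGASHI) and skip the Japanese tokenization features when fugashi is unavailable.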
edit: First issue is solved.
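For reference, the fix for the first issue amounts to the pattern below. This is a minimal sketch, not the actual change: read_text_utf8 and the sample file are made up for illustration.

```python
import locale
import tempfile
from pathlib import Path


def read_text_utf8(path):
    """Hypothetical helper: read a text file as UTF-8 on every platform."""
    # Without encoding=..., open() uses the locale's preferred encoding,
    # which on Windows is usually a legacy code page rather than UTF-8.
    with open(path, encoding="utf-8") as f:
        return f.read()


# The encoding open() would fall back to if none were given.
fallback_encoding = locale.getpreferredencoding(False)

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("日本語のテキスト", encoding="utf-8")
    text = read_text_utf8(sample)
```

Printing fallback_encoding on an affected Windows machine is a quick way to confirm that the default is not UTF-8 there.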