Sotera / webpageclassifier

Categorizes a website, given its URL, as one of blog|wiki|news|forum|classified|shopping|undecided.
Apache License 2.0

Goldwords files fail on Cyrillic text. #17

Closed by ctwardy 7 years ago

ctwardy commented 7 years ago

Use utf8 instead of cp1252 in store_html().
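
A minimal sketch of why this change helps: cp1252 has no mapping for Cyrillic letters, so writing such text with that codec raises UnicodeEncodeError, while UTF-8 can encode it. The real signature of store_html() is not shown in this issue, so `store_html_sketch` and its parameters below are hypothetical stand-ins, not the project's code.

```python
import os
import tempfile


def store_html_sketch(text, path, encoding="utf-8"):
    """Hypothetical stand-in for store_html(): write text to path.

    The actual function in webpageclassifier may differ; only the
    choice of encoding is the point here.
    """
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


cyrillic = "Привет, мир"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")

    # cp1252 cannot represent Cyrillic characters, so the write fails.
    try:
        store_html_sketch(cyrillic, path, encoding="cp1252")
        cp1252_ok = True
    except UnicodeEncodeError:
        cp1252_ok = False

    # UTF-8 covers all of Unicode, so the same text is stored and read back intact.
    store_html_sketch(cyrillic, path, encoding="utf-8")
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()
```

Any file written this way should also be read back with `encoding="utf-8"`, since the platform default on Windows is often cp1252.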