google-research / url-nlp


Fix for #7: Encoding error when running with Python 3 #8

Closed: mihnita closed this 11 months ago

mihnita commented 1 year ago

This makes (almost) all strings that contain language text unicode.

To be very pedantic, we should probably also use unicode literals in .replace(u'"', u"'").replace(u"/", u" ") and in all .split(u"/") calls, but those literals don't contain non-ASCII text, so it's OK.
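
A minimal sketch of the point above, not code from this repo: normalize_name is a hypothetical helper and the sample strings are made up. It only illustrates that ASCII-only literals behave the same with or without the u prefix on Python 3, where every str is already unicode.

```python
# -*- coding: utf-8 -*-
# Sketch only: normalize_name is a hypothetical helper, not a function in url-nlp.


def normalize_name(name):
    """Replace double quotes with single quotes and slashes with spaces."""
    # The u prefix is accepted on Python 2 and on Python 3.3+.
    return name.replace(u'"', u"'").replace(u"/", u" ")


def normalize_name_plain(name):
    """Same logic with plain literals; they hold only ASCII characters."""
    return name.replace('"', "'").replace("/", " ")


# Non-ASCII language text is where the unicode literal actually matters.
sample = u'Fran\u00e7ais/"Qu\u00e9bec"'
cleaned = normalize_name(sample)

# Splitting on an ASCII separator, written as a unicode literal.
parts = u"fr/CA".split(u"/")
```

On Python 2 the plain-literal variant would mix byte strings with unicode input and rely on implicit ASCII decoding, which happens to work only because the literals are pure ASCII.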

icaswell commented 1 year ago

This change should no longer be needed after https://github.com/google-research/url-nlp/commit/6ce0aec9c20aa2369ffbe6aede2fd1006143f693

sebastianruder commented 11 months ago

Please open another PR if this (or another change) is still necessary.