These three commits should handle non-ASCII characters in TV-series (episode) titles, as reported in issue #13. There may be other cases that have not been reported or encountered. Also, without test cases, I am not sure this fully fixes the problem, since Unicode handling remains finicky.
As far as I can see, htmlParser.unescape uses the ascii codec by default, so any byte string you pass in needs to be decoded first.
The second commit is a quick fix for what seems to be a case that was missed in a previous UTF-8 fix. The third commit factors out the cleaning of 'special' characters in titles, so that the UTF-8 conversion for that situation is handled in a single location.
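
To illustrate the idea of decoding before unescaping, and of doing so in one place, here is a minimal sketch. It is not the code from these commits: the helper name clean_title and its encoding parameter are hypothetical, and it uses Python 3's html.unescape as a stand-in for the unescape call mentioned above. It assumes titles arrive as UTF-8 bytes.

```python
import html


def clean_title(raw, encoding="utf-8"):
    """Hypothetical helper: decode a title once, then unescape HTML entities.

    Assumes incoming byte strings are UTF-8 encoded; text that is already
    decoded is passed through unchanged.
    """
    if isinstance(raw, bytes):
        # Decode explicitly instead of relying on an implicit ASCII default.
        raw = raw.decode(encoding)
    return html.unescape(raw)


if __name__ == "__main__":
    print(clean_title("Pokémon &amp; Friends".encode("utf-8")))
```

Routing every title through one helper like this means a missed call site, such as the one the second commit addresses, is less likely to recur.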