bebo-dot-dev / m3u-epg-editor

a python m3u / epg optimizer

epg parsing error - 'charmap' codec can't encode character #79

Closed MagicOneFr closed 1 year ago

MagicOneFr commented 1 year ago

Hello, I'm trying to filter an epg and I get this error: epg creation failure: 'charmap' codec can't encode character '\u25c9' in position 104: character maps to <undefined>

original.zip

Is it possible to fix that please? Thank you.

bebo-dot-dev commented 1 year ago

Hi there, it may be fixable, but there's not enough to go on with what you've supplied so far.

On the face of it there's nothing wrong with your supplied xml file, and a quick test run of the script against it worked as expected here, with no errors seen.

MagicOneFr commented 1 year ago

Hello! Ok what would you need? It's difficult for me to provide the config file with the url I use because it contains my login and password for the iptv provider.

bebo-dot-dev commented 1 year ago

No problem, if you're able to supply the config file with URLs/passwords removed and the original m3u file, that would enable me to run the script in exactly the same way as you against your source data.

MagicOneFr commented 1 year ago

m3u.zip

Here it is. Thank you very much.

MagicOneFr commented 1 year ago

I've retried today and I get the same error:

2023-07-05T18:46:43.233726 creating channel element for m3u entry from tvg-name value FR - RTS UN FHD
2023-07-05T18:46:43.260751 epg creation failure: 'charmap' codec can't encode character '\u25c9' in position 104: character maps to <undefined>

bebo-dot-dev commented 1 year ago

Thanks will take a look asap

bebo-dot-dev commented 1 year ago

Hi again, I've run a test with your supplied json config, m3u and xml file and no errors were seen.

The only changes that I made to your json config for the test run were to repoint the m3uurl and epgurl values to the local files you supplied and to set log_enabled to true, which enables debug output to process.log.

I've attached the files back here so you can take a look at the outcome and the details that were recorded for the test run in the process.log file.

If these supplied files were expected to fail then it is strange for sure, and my gut tells me that what you're seeing is perhaps some sort of environmental or Windows OS problem.

issue79.zip

MagicOneFr commented 1 year ago

I'm using Windows 10 Pro 21H2 with French localisation, and Python 3.8.6 (64-bit). I can see this line in your log:

2023-07-07T16:54:42.467740 creating channel element for m3u entry from tvg-name value FR - RTS DEUX FHD ◉

There's a strange round character on this line, and in my log the script crashes just before adding this line. Is it possible that this character isn't correctly handled by my OS/Python? Is it possible to strip this character? Thank you very much.
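For anyone hitting the same message, the failure can be reproduced without the script. The sketch below is not code from m3u-epg-editor. It assumes a legacy Windows code page such as cp1252 (an assumption, since the code page actually in use on the reporter's machine is not stated in this thread) and uses two hypothetical helpers, `encodable` and `strip_unencodable`, to show why ◉ cannot be written and what stripping it could look like.

```python
# Sample value taken from the log line quoted above.
NAME = "FR - RTS DEUX FHD \u25c9"


def encodable(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def strip_unencodable(text, encoding):
    """Drop the characters that the given encoding cannot represent."""
    return text.encode(encoding, errors="ignore").decode(encoding)


if __name__ == "__main__":
    # cp1252 is an assumed stand-in for the console code page.
    print(encodable(NAME, "cp1252"))
    print(encodable(NAME, "utf-8"))
    print(repr(strip_unencodable(NAME, "cp1252")))
```

Stripping loses data, so it is probably best treated as a fallback; using `errors="replace"` instead of `errors="ignore"` would leave a visible `?` where the character was.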

bebo-dot-dev commented 1 year ago

Thank you, it's good to learn a little more about your OS and environment. Yes, I did notice the non-standard character ◉ in your source data; it exists in both your m3u and xml files.

Stripping this character (and perhaps others) out could be an option but before we follow that idea, can I ask why you have the force_epg option switched on and if you've tried with this option off?

The original idea for the existence of the force_epg option is outlined in https://github.com/bebo-dot-dev/m3u-epg-editor/issues/42

In short, the force_epg option was introduced some time ago as a feature to force the creation of channels into the newly written XML EPG file for all channels that exist in an m3u file. As far as I know this is quite a seldom used feature because there's a hope (a dream :)) that most source XML EPG files are reasonably complete and of decent quality when paired with a given m3u file.

I can see from your error report that the script is failing for you in an area of code only active when force_epg is switched on so it might be worth trying to switch this option off to see if you're able to generate an acceptable EPG file without error.

MagicOneFr commented 1 year ago

Same result with force_epg set to false. Thanks for the explanation about the option. I think that sometimes I have this kind of epg stored in the channel name. At the same time I tried Python 3.11: same result.

MagicOneFr commented 1 year ago

I have found a workaround. As described in this page, I added the two "set" commands to a batch file before calling the python script, and now there's no more error! According to the same page, there's also a way to fix the issue in the code.
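The same idea can be expressed from Python by launching a child process with the environment variables set. This is only a sketch: the variable names come from the Python documentation linked at the end of this thread, the values are assumptions (the exact "set" lines used in the batch file are not quoted here), and `utf8_env` and `run_child` are hypothetical helper names.

```python
import os
import subprocess
import sys


def utf8_env():
    """Copy the current environment and add the two Python I/O variables."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = "utf-8"      # encoding used for stdin/stdout/stderr
    env["PYTHONLEGACYWINDOWSSTDIO"] = "1"  # Windows only; assumed value, any non-empty string
    return env


def run_child(code):
    """Run a Python snippet in a child process and return its raw stdout bytes."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=utf8_env(),
        capture_output=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # The escape is expanded by the child, so no non-ASCII text crosses the command line.
    raw = run_child("print('FR - RTS DEUX FHD \\u25c9')")
    print(raw)  # printed as bytes so this line cannot itself hit the console encoding
```

With the variables set, the child writes the character as UTF-8 bytes instead of raising, which matches the behaviour reported with the batch file.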

MagicOneFr commented 1 year ago

https://stackoverflow.com/questions/27092833/unicodeencodeerror-charmap-codec-cant-encode-characters Sorry I forgot the link
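One in-code approach that is commonly used for this class of error is to reconfigure the standard streams at startup. The sketch below is not taken from m3u-epg-editor and is not necessarily the fix the linked page proposes. `force_utf8_stdio` is a hypothetical helper, and `reconfigure` requires Python 3.7 or later.

```python
import sys


def force_utf8_stdio(errors="replace"):
    """Switch stdout and stderr to UTF-8 where the stream supports it."""
    for stream in (sys.stdout, sys.stderr):
        # Streams replaced by a test harness or GUI host may lack reconfigure().
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8", errors=errors)


if __name__ == "__main__":
    force_utf8_stdio()
    print("FR - RTS DEUX FHD \u25c9")
```

Calling this once before any output means the script no longer depends on the shell's code page, although whether the glyph then displays correctly still depends on the terminal and font.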

bebo-dot-dev commented 1 year ago

Wow good find :)

The link you included explains that it is a Windows OS issue related to the Windows shell that you're using - it would appear that it doesn't by default support UTF-8.

Are you using the regular (old) Windows command prompt rather than a Powershell shell?

MagicOneFr commented 1 year ago

yes indeed.

bebo-dot-dev commented 1 year ago

OK this makes a little more sense to me now.

The issue you have encountered is a Unicode (UTF-8) related issue that was triggered by Unicode characters in your data combined with your Windows command prompt shell that for one reason or another doesn't support Unicode by default.

Reading around this subject, I believe that once upon a time the Windows command prompt didn't support Unicode at all. Microsoft applied a number of incremental changes throughout Windows 10 builds to get the Windows command prompt to a point where it is now supposed to support Unicode. There is clearly something in your system setup that means it doesn't support Unicode by default. Personally I suspect that it's related to your French locale and the effect that has on the code page in use within your command prompt, but this is a guess on my part.
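A guess like this can be checked by asking Python which encodings it has picked up from the environment. This is a minimal diagnostic sketch using only standard library calls; `encoding_report` is a hypothetical helper name, and the values it reports will differ between the old command prompt, PowerShell and Windows Terminal.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python is using in the current process."""
    return {
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        # Default used by open() in text mode when no encoding is given.
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```

If "stdout" or "preferred" reports a cp-prefixed code page rather than utf-8, characters outside that code page will raise the 'charmap' error when they are written.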

You might see different results in a Powershell prompt and you could get different results again with the new Windows Terminal (the one that is installable from the Windows Store).

I do appreciate you reporting this issue, it could be helpful if it crops up for someone else.

I think we can call this resolved if you're happy for this issue to be closed.

bebo-dot-dev commented 1 year ago

Feel free to reopen if need be.

For the record:

https://stackoverflow.com/a/63573649
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONIOENCODING
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONLEGACYWINDOWSSTDIO