KBNLresearch / iromlab

Loader software for automated imaging of optical media with Nimbie disc robot
Apache License 2.0

UnicodeEncodeError: 'charmap' codec can't encode character / character maps to <undefined> #55

Closed bitsgalore closed 7 years ago

bitsgalore commented 7 years ago

While processing the Sranan Rutu DVD (PPN=344588807), the following error was written to stdout:

--- Logging error ---
Traceback (most recent call last):
  File "C:\Python36\lib\logging\__init__.py", line 994, in emit
    stream.write(msg)
  File "C:\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0301' in position 93: character maps to <undefined>
Call stack:
  File "C:\Python36\lib\threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "C:\Python36\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\Python36\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "f:\johan\pythonCode\iromlab\iromlab\cdworker.py", line 508, in cdWorker
    success = processDisc(carrierData)
  File "f:\johan\pythonCode\iromlab\iromlab\cdworker.py", line 107, in processDisc
    logging.info(''.join(['Title: ',carrierData['title']]))
Message: "Title: Sranan Rutu : duizenden pagina's naslagmateriaal op één schijfje, uitgegeven ter gelegenheid van het 10-jarig bestaan van de Stichting voor Surinaamse Genealogie, 2001-2011 / Stichting voor Surinaamse Genealogie"
Arguments: ()

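The error seems to occur in the logger: the traceback shows it is raised when the logging handler writes the message to its stream, which encodes with cp1252.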

bitsgalore commented 7 years ago

Same error, different disc:

--- Logging error ---
Traceback (most recent call last):
  File "C:\Python36\lib\logging\__init__.py", line 994, in emit
    stream.write(msg)
  File "C:\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0300' in position 186: character maps to <undefined>
Call stack:
  File "C:\Python36\lib\threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "C:\Python36\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\Python36\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "f:\johan\pythonCode\iromlab\iromlab\cdworker.py", line 508, in cdWorker
    success = processDisc(carrierData)
  File "f:\johan\pythonCode\iromlab\iromlab\cdworker.py", line 107, in processDisc
    logging.info(''.join(['Title: ',carrierData['title']]))
Message: 'Title: The audio collection. III : more than 75 audio designs for home construction = mehr als 75 Audio-Selbstbauschaltungen = plus de 75 projets audio à réaliser soi-même = meer dan 75 audio-ontwerpen voor zelfbouw'
Arguments: ()

bitsgalore commented 7 years ago

Minimal script that replicates the issue:

https://gist.github.com/bitsgalore/a7b18aaf08ecbdb3391d5a9fa8d1ce9a

Apparently the error arises when strings containing characters that the log file's default encoding cannot represent (cp1252 here, per the traceback) are written to the log file. Writing to the GUI window works OK.
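
The encoding failure can also be shown without the logging module. The following is a minimal sketch and not the contents of the linked gist: it assumes the titles store their accents as combining characters (the tracebacks report U+0301 and U+0300), and both the sample string and the helper `can_encode` are made up for illustration.

```python
import unicodedata

# Hypothetical sample: each "é" is written as "e" + COMBINING ACUTE ACCENT
# (U+0301), the character reported in the first traceback.
title = "Title: e\u0301e\u0301n schijfje"

def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# cp1252 has no mapping for U+0301, so this check fails.
print(can_encode(title, "cp1252"))

# UTF-8 can represent every character in this string.
print(can_encode(title, "utf-8"))

# NFC normalization composes "e" + U+0301 into U+00E9, which cp1252 does have.
print(can_encode(unicodedata.normalize("NFC", title), "cp1252"))
```

This would explain why the message in the traceback displays as "één" while the codec complains about `'\u0301'`. Normalizing only helps for characters that cp1252 happens to contain, so an explicit UTF-8 encoding on the handler is the more general fix.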

A solution would be to pass an explicit encoding to the log handler:

http://stackoverflow.com/questions/10706547/add-encoding-parameter-to-logging-basicconfig

bitsgalore commented 7 years ago

This seems to work. Old code:

    logging.basicConfig(filename='test.log',
                        level=logging.INFO,
                        format='%(asctime)s - %(levelname)s - %(message)s')

Replace by this:

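    # FileHandler arguments: filename, mode, encoding.
    # Note: mode 'w' truncates the log on start; filename= alone appends by default.
    logging.basicConfig(handlers=[logging.FileHandler('test.log', 'w', 'utf-8')],
                        level=logging.INFO,
                        format='%(asctime)s - %(levelname)s - %(message)s')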
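
To check that a UTF-8 handler really accepts such titles, a self-contained sketch along these lines could be used. The temporary log path, the logger name `demo`, the helper `write_title_to_log`, and the sample title are all assumptions for illustration. It attaches the handler to a named logger instead of calling `basicConfig`, so it can be run repeatedly in one interpreter.

```python
import logging
import os
import tempfile

def write_title_to_log(title, log_path):
    """Log one title through a UTF-8 FileHandler and return the file's text."""
    handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
    logger = logging.getLogger("demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the record away from root handlers
    logger.addHandler(handler)
    try:
        logger.info("Title: " + title)
    finally:
        # Detach and close so the file is flushed before reading it back.
        logger.removeHandler(handler)
        handler.close()
    with open(log_path, encoding="utf-8") as f:
        return f.read()

log_path = os.path.join(tempfile.mkdtemp(), "test.log")
logged = write_title_to_log("e\u0301e\u0301n schijfje", log_path)

# The combining accent survives the round trip through the log file.
print("\u0301" in logged)
```

On Python 3.9 and later, `logging.basicConfig` also accepts an `encoding` argument directly. The tracebacks here come from Python 3.6, where passing a `FileHandler` through `handlers` is the way to set it.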

bitsgalore commented 7 years ago

Fixed: https://github.com/KBNLresearch/iromlab/commit/b2b7569d975f8ff65e2be3902a2f15a849530e25