NYPL-Simplified / server_core

Shared data model and utilities for Library Simplified server applications

Fix UnicodeDecodeError in Identifier.__repr__ and LicensePoolDeliveryMechanism.__repr__ #1240

Closed vbessonov closed 3 years ago

vbessonov commented 3 years ago

Description

This PR fixes Unicode problems in Identifier.__repr__ and LicensePoolDeliveryMechanism.__repr__.

Motivation and Context

There are two problems:

  1. Mixing Unicode strings and byte literals in Identifier.__repr__ (see https://www.azavea.com/blog/2014/03/24/solving-unicode-problems-in-python-2-7/). This works only when the byte strings contain nothing but ASCII characters (the first 128 code points), because Python 2 implicitly decodes them with the ASCII codec.

For example, this line works:

u'Test: %s' % 'a'

But this one throws UnicodeDecodeError:

u'Test: %s' % 'ą'

To make sure this code works, we have to use six.ensure_text (or the 'u' prefix for literals), which converts the byte string into a Unicode object using UTF-8 encoding.

u'Test: %s' % six.ensure_text('ą')

  2. Plugging Unicode strings into a byte string in LicensePoolDeliveryMechanism.__repr__. This also works only when the Unicode strings contain only ASCII characters; otherwise a UnicodeDecodeError is thrown when Python tries to implicitly convert Unicode objects into byte strings.
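The failure mode behind both problems can be reproduced on any Python version by performing the ASCII decode explicitly. The sketch below is illustrative only: `to_text` and `describe` are hypothetical helpers, not code from this PR. `to_text` is a simplified stand-in for six.ensure_text, assuming UTF-8 input.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical stand-in for six.ensure_text: decode bytes, pass text through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def describe(template, *args):
    """Hypothetical helper: convert the template and every argument to text first."""
    return to_text(template) % tuple(to_text(arg) for arg in args)


# u"\u0105" is the character 'ą'; UTF-8 encodes it as two non-ASCII bytes.
raw = u"\u0105".encode("utf-8")

# Python 2 decodes byte strings as ASCII when they meet a Unicode string.
# Doing the same decode explicitly shows why the mixed expression fails.
try:
    raw.decode("ascii")
    implicit_decode_failed = False
except UnicodeDecodeError:
    implicit_decode_failed = True

# Decoding explicitly as UTF-8 before formatting avoids the implicit step.
safe = u"Test: %s" % to_text(raw)

# The same idea applied when the template itself is a byte string.
mixed = describe(b"%s/%s", raw, u"abc")
```

Converting everything to text at the boundary keeps the formatting step free of implicit conversions, at the cost of one extra call per argument.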

NOTE: Excessive use of native_string and six.ensure_text may slow down the system a bit.

How Has This Been Tested?

Checklist: