Metadata output appears to be ASCII, with non-ASCII Unicode characters encoded as numeric character references. This is legal under the default UTF-8 encoding, since ASCII is a subset of it, but it is not what I, or I think most people, want or expect.
(This may be the same issue reported as #32. It was also reported to me by Columbia.edu, and I was able to reproduce it on both my OAI feed and theirs.)
I initially tried adding encoding="UTF-8" to the etree.tostring call in metadata.py. That worked under python3.x but failed under python2.x.
Adding encoding="unicode" instead appears to be the correct fix: it seems to work under both python2.x and python3.x.
Under python2.x, encoding="UTF-8" returns a <type 'str'> holding UTF-8 encoded bytes, which may then raise an error when coerced to <type 'unicode'>. encoding="unicode" returns <type 'unicode'>.
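The three behaviours described above can be reproduced with a small sketch. It makes several assumptions: it runs under python3.x only, it uses the standard library xml.etree.ElementTree as a stand-in for whichever etree metadata.py imports, and the element and the decodes_as_ascii helper are hypothetical, made up for illustration. The ASCII decode at the end only approximates the implicit str-to-unicode coercion that python2.x performs with its default ascii codec.

```python
import xml.etree.ElementTree as ET

# Hypothetical record, not taken from any real OAI feed.
elem = ET.Element("title")
elem.text = "caf\u00e9"  # "café"

# Default encoding: ASCII bytes, with the non-ASCII character
# written as a numeric character reference (&#233;).
as_ascii = ET.tostring(elem)

# encoding="UTF-8": still bytes, now holding UTF-8 encoded text.
as_utf8 = ET.tostring(elem, encoding="UTF-8")

# encoding="unicode": a text string rather than bytes.
as_text = ET.tostring(elem, encoding="unicode")


def decodes_as_ascii(data):
    """Return True if the bytes survive a decode with the ASCII codec."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


for label, value in [("default", as_ascii), ("UTF-8", as_utf8), ("unicode", as_text)]:
    print(label, type(value).__name__, repr(value))

# Only the UTF-8 bytes fail an ASCII decode, which mirrors the coercion error.
print(decodes_as_ascii(as_ascii), decodes_as_ascii(as_utf8))
```

If the project's etree is lxml rather than the standard library, the same three calls should be checked against lxml directly, since the two libraries do not behave identically in every detail.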
See: https://github.com/sdm7g/oai-harvest/blob/fix-pyoai/oaiharvest/metadata.py#L51-L53