metadata output not exactly in utf8 encoding...

The metadata output appears to be ASCII, with all other Unicode characters encoded as numeric character references. That is legal under the default UTF-8 encoding, since ASCII is a subset, but it is not what I, and I think most people, want or expect. (This may be the same issue reported as #32. It was also reported to me by Columbia.edu, and I was able to reproduce it on both my OAI feed and theirs.)

I initially tried adding encoding="UTF-8" to the etree.tostring call in metadata.py. That worked under Python 3.x but failed under Python 2.x.

Adding encoding="unicode" appears to be the correct fix: it works under both Python 2.x and Python 3.x.

Under Python 2.x, encoding="UTF-8" returns a <type "str"> (a byte string) containing UTF-8-encoded bytes, which may then raise an error when coerced to <type "unicode">. encoding="unicode" returns <type "unicode"> directly.
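To make the three behaviours concrete, here is a minimal sketch. It rests on a few assumptions: it uses the standard library's xml.etree.ElementTree under Python 3 as a stand-in for the etree module used in metadata.py, and the `title` element and its text are made up for illustration. The explicit ASCII decode at the end only mimics the implicit str-to-unicode coercion that Python 2 performs; it is not the code path in oai-harvest.

```python
import xml.etree.ElementTree as etree  # stand-in for the project's etree (assumption)

# Hypothetical record containing one non-ASCII character (U+00E9).
record = etree.Element("title")
record.text = "caf\u00e9"

# Default: ASCII bytes, with non-ASCII characters written as numeric references.
ascii_out = etree.tostring(record)

# encoding="UTF-8": still bytes, but the character is stored as UTF-8 bytes.
utf8_out = etree.tostring(record, encoding="UTF-8")

# encoding="unicode": a text string, with no bytes-to-text step left to go wrong.
text_out = etree.tostring(record, encoding="unicode")

# Python 2 coerces str to unicode implicitly with the ASCII codec.
# Doing the same decode explicitly shows why the UTF-8 bytes fail there.
try:
    utf8_out.decode("ascii")
    coercion_failed = False
except UnicodeDecodeError:
    coercion_failed = True

for label, value in [("default", ascii_out), ("UTF-8", utf8_out), ("unicode", text_out)]:
    print(label, type(value).__name__, repr(value))
print("ASCII coercion of UTF-8 bytes failed:", coercion_failed)
```

Only the encoding="unicode" call hands back text, which is why it avoids both the character references and the coercion error.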

See: https://github.com/sdm7g/oai-harvest/blob/fix-pyoai/oaiharvest/metadata.py#L51-L53

bloomonkey / oai-harvest, issue #33