Phil1717 closed this issue 7 years ago
Hi Phil, apologies for the slow reply. I've not been ... Are you able to share the full stack trace so that I can track down the offending portion of the code? Thanks, John
I traced it down to line 174 in harvest.py.
I figure that, as it is, it should fail every time it encounters non-ASCII (UTF-8) characters in the fetched data.
I tried rebuilding the project with the codecs module imported and using:
with codecs.open(fp, 'w', 'UTF-8') as fh:
    fh.write(metadata)
But got this error: LookupError: setuptools-scm was unable to detect version for '/home/phil/Downloads/oai-harvest-develop'.
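For reference, the change attempted above (writing the fetched metadata with an explicit UTF-8 encoding) can be sketched on its own, independent of the build problem. This is only a minimal sketch, not the project's actual code: `write_metadata`, `record`, and the temporary path are hypothetical names, and it assumes the metadata is already a text (Unicode) string rather than raw bytes.

```python
import io
import os
import tempfile


def write_metadata(path, metadata):
    """Write a text (Unicode) record to `path`, encoded as UTF-8.

    Hypothetical helper for illustration; not part of oai-harvest.
    """
    # io.open takes an explicit encoding on both Python 2 and Python 3,
    # so the platform default codec (e.g. ASCII) is never used.
    with io.open(path, "w", encoding="utf-8") as fh:
        fh.write(metadata)


# Made-up record containing u'\xfc' (u with diaeresis).
record = u"<dc:title>Methoden f\u00fcr Daten</dc:title>"
path = os.path.join(tempfile.mkdtemp(), "record.xml")
write_metadata(path, record)
```

Reading the file back with the same encoding should return the record unchanged, umlaut included.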
I ended up coding a rudimentary OAI-PMH client with urllib, though it's less than perfect. I would love to go back to using yours if you end up adding this UTF-8 fix.
Are you able to share the OAI-PMH source that showed up the error?
Of course: http://api.openaire.eu/oai_pmh
Good morning John,
I am using your project to fetch OAI-PMH data and have run into a problem. It manages to pull about 200k entries and then consistently fails on a single one with this error:
ERROR 'ascii' codec can't encode character u'\xfc' in position 10: ordinal not in range(128)
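This kind of error can be reproduced in isolation by encoding any text that contains a non-ASCII character with the ASCII codec. The snippet below is only an illustrative sketch: the `title` value is made up, and it does not show where the harvester itself performs the encoding.

```python
# Made-up value; the character u'\xfc' sits at index 10.
title = u"Methoden f\u00fcr Daten"

try:
    title.encode("ascii")  # ASCII only covers code points 0-127
    failure = None
except UnicodeEncodeError as exc:
    failure = exc

# The exception records the codec, the offending position and the reason.
print(failure)
```

Encoding the same string with `title.encode("utf-8")` succeeds, because UTF-8 can represent every Unicode character.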
I don't see any options for dealing with encoding issues. Filtering the data out beforehand would be counterproductive, but I would happily have the script skip these offending entries if they can't be transliterated.
Do you have any ideas?
Thank you for your time and your project, Phil
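The fallback described in this post (transliterate where possible, skip the entry otherwise) could look roughly like the sketch below. It is an assumption-laden illustration, not a feature of the harvester: `to_ascii` and `ascii_records` are hypothetical helpers, and the transliteration simply strips combining accents after Unicode decomposition, which is lossy.

```python
import unicodedata


def to_ascii(text):
    """Best-effort transliteration to ASCII (hypothetical helper).

    Returns None when the text still contains non-ASCII characters
    after accents have been stripped.
    """
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = u"".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    try:
        return stripped.encode("ascii").decode("ascii")
    except UnicodeEncodeError:
        return None


def ascii_records(records):
    """Transliterate each record, counting those that must be skipped."""
    kept, skipped = [], 0
    for record in records:
        converted = to_ascii(record)
        if converted is None:
            skipped += 1
        else:
            kept.append(converted)
    return kept, skipped


# Made-up sample values: an umlaut, plain ASCII, and Japanese text.
kept, skipped = ascii_records([u"M\u00fcller", u"plain", u"\u65e5\u672c\u8a9e"])
```

Writing the output as UTF-8 avoids the need for this step altogether and preserves the original characters, so skipping or transliterating is probably best kept as a last resort.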