IQSS / dataverse-sample-data

Scripts and sample data for demo purposes

(Sample Data Broken for Some Users) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128) #21

Open atc0005 opened 4 years ago

atc0005 commented 4 years ago

As noted on GlobalDataverseCommunityConsortium/dataverse-ansible#38, I encountered the following error first when running the Ansible playbook from that repo, then again when following the steps in this repo's README file.

Snippet of the output just prior and then the error message:

Creating dataverse ubiquity-press.json in dataverse :root
{'name': 'Ubiquity Press Dataverse', 'alias': 'ubiquity-press', 'dataverseContacts': [{'contactEmail': 'ubiquity-press@mailinator.com'}], 'affiliation': '', 'description': 'Ubiquity Press is an open access publisher of peer-reviewed, academic journals. Our flexible publishing model makes journals affordable, and enables researchers around the world to find and access the information they need, without barriers. The following gives an overview of how we work. More information can be found in a recent interview with Chronicle of Higher Education: <a href="http://chronicle.com/blogs/profhacker/ubiquity/43312" rel="nofollow" target="_blank">"Open Access Ahoy: An Interview with Ubiquity Press"</a>.', 'dataverseType': 'JOURNALS'}
Dataverse ubiquity-press created.
<Response [201]>
Dataverse ubiquity-press published.
<Response [200]>
Creating dataverse jopd.json in dataverse ubiquity-press
{'name': 'Journal of Open Psychology Data (JOPD) Dataverse', 'alias': 'jopd', 'dataverseContacts': [{'contactEmail': 'jopd@mailinator.com'}], 'affiliation': 'Ubiquity Press', 'description': 'Datasets from data papers published in the Journal of Open Psychology Data (JOPD).', 'dataverseType': 'JOURNALS'}
Dataverse jopd created.
<Response [201]>
Dataverse jopd published.
<Response [200]>
Creating dataset flynn-effect-in-estonia.json in dataverse jopd
Traceback (most recent call last):
  File "create_sample_data.py", line 56, in <module>
    metadata = json.load(f)
  File "/usr/lib64/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128)

The environment is a CentOS 7 x64 LXD container. I attempted to replicate within a local CentOS 7 x64 VM, but my (unfortunately remote) VMware Workstation environment is acting up. I'll attempt to further replicate in a non-LXD environment when I have more time.
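For anyone trying to replicate this, here is a minimal sketch of what the traceback appears to show: a JSON file containing UTF-8 punctuation being decoded as ASCII. The file contents and the `try_load` helper are hypothetical stand-ins, not taken from this repo, and the ASCII encoding is forced explicitly to mimic what `open()` does when the locale's preferred encoding is ASCII (for example under a C/POSIX locale on Python 3.6).

```python
import json
import locale
import os
import tempfile

# Encoding that open() falls back to when no encoding= argument is given.
# Under a C/POSIX locale on Python 3.6 this is typically an ASCII codec.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Hypothetical stand-in for a sample-data file: JSON with a curly
# apostrophe (U+2019), which UTF-8 encodes as the bytes e2 80 99.
sample = {"description": "Estonia\u2019s data"}
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".json", delete=False
) as tmp:
    json.dump(sample, tmp, ensure_ascii=False)
    path = tmp.name


def try_load(path, encoding):
    """Return (data, None) on success or (None, error) on a decode failure."""
    try:
        with open(path, encoding=encoding) as f:
            return json.load(f), None
    except UnicodeDecodeError as exc:
        return None, exc


# Forcing ASCII mimics a container whose locale is not UTF-8.
ascii_result, ascii_error = try_load(path, "ascii")
os.remove(path)
print(ascii_error)
```

If `default_encoding` reports an ASCII codec (it may show up as `ANSI_X3.4-1968`) in the affected container, that would be consistent with the error above; I have not confirmed that this is the cause here.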

qqmyers commented 4 years ago

FWIW, I think adding encoding='utf-8' to the open() calls right before the json.load statements would work. That said, I just had a similar situation in dataverse-metrics, where it turned out I could read the Unicode text in Python 3 but not in Python 2, so there must also be some environment variable (or module?) that can be set, which would explain why others haven't seen this.
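A rough sketch of that change, assuming the script opens each JSON file with a plain `open()` before calling `json.load`. The helper name `load_json_utf8` is hypothetical and not something that exists in create_sample_data.py:

```python
import json


def load_json_utf8(path):
    """Load a JSON file as UTF-8 regardless of the process locale.

    Hypothetical helper: passing encoding= explicitly means open() no
    longer depends on the locale's preferred encoding.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

On the environment side, Python 3.6 takes the default encoding for `open()` from the locale, so running the script under a UTF-8 locale (for example `LANG=en_US.UTF-8`, if that locale is installed) should have a similar effect. As far as I know, `PYTHONIOENCODING` only affects stdin/stdout/stderr and would not help with files opened this way.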

pdurbin commented 4 years ago

@qqmyers thanks for the tip about the Python version.

@atc0005 which version of Python was used above, please?

donsizemore commented 4 years ago

@pdurbin he first hit the bug using dataverse-ansible, which installs 3.6: https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible/blob/master/tasks/sampledata.yml#L16

atc0005 commented 4 years ago

@pdurbin: which version of Python was used above, please?

What @donsizemore said. Please let me know if you need more info.

djbrooke commented 3 years ago

Thanks all for the details here. I'm going to get this into a sprint so that we can get it fixed.

djbrooke commented 3 years ago

atc0005 commented 3 years ago

  • This could be a Python version mismatch; consider asking/telling people to use Python 3

If it helps, I believe that I was using Python 3.6 at the time I encountered the issue. The error snippet in the OP suggests this, but it's been long enough since my attempt to load the sample data that I don't recall for sure.