I was trying to run python 1_extract_mimic3.py and I encountered this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 18312: invalid continuation byte.
File "/Users/xinyuezhang/BlendedICU/database_processing/dataprocessor.py", line 36, in __init__
self.ohdsi_med = self._read_json(self.med_file)
Environment
Code: latest version
Environment: latest version specified by env.yml
Possible Reason
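The JSON file (medications_v10.json) that the function tries to load does not appear to be encoded in UTF-8. The decoding happens when the file is opened in text mode, before json.load parses it, so the file has to match the encoding used by open() (UTF-8 here, according to the error message).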
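To check this diagnosis, the offending byte can be located directly. The following is a minimal sketch; the helper name find_utf8_error and its context parameter are hypothetical and not part of BlendedICU.

```python
from pathlib import Path


def find_utf8_error(path, context=20):
    """Return (offset, surrounding bytes) for the first invalid UTF-8 byte.

    Returns None when the whole file decodes cleanly as UTF-8.
    """
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the byte offset reported in the error message
        start = max(err.start - context, 0)
        return err.start, raw[start:err.start + context]
    return None
```

Calling find_utf8_error on medications_v10.json should return the same position as the traceback, together with the neighbouring bytes, which makes it easier to see which character is involved.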
Possible Solution
Solution 1
Re-encode medications_v10.json as UTF-8.
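A minimal sketch of such a one-off conversion, assuming the file is currently in ISO-8859-1 (an assumption; the actual source encoding should be confirmed first). The function name reencode_to_utf8 and its parameters are hypothetical.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="iso-8859-1"):
    """Decode src with source_encoding and write the same text to dst as UTF-8."""
    raw = Path(src).read_bytes()
    # Work on bytes so that line endings are left untouched
    Path(dst).write_bytes(raw.decode(source_encoding).encode("utf-8"))
```

Writing to a separate destination file keeps the original intact in case the assumed source encoding turns out to be wrong.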
Solution 2
Revise the _read_json function to include automatic encoding detection using the chardet library. The function first opens the file in binary mode ('rb') and lets chardet detect the encoding. It then re-opens the file in text mode with the detected encoding to parse the JSON.
import json
import chardet

def _read_json(self, pth):
    # Detect the encoding from the raw bytes
    with open(pth, 'rb') as file:
        detected_encoding = chardet.detect(file.read())['encoding']
    # Now read the file with the detected encoding
    with open(pth, 'r', encoding=detected_encoding) as file:
        return json.load(file)
For some reason I have no trouble running json.load(file) when the file is encoded in ISO-8859-1... Anyways I like solution 2 and just committed it.
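One possible explanation, offered as an assumption rather than something verified on either machine: when open() is called without an encoding argument, Python uses a platform-dependent default, so the same file can load on one system and fail on another. The sketch below prints that default and shows that a byte sequence containing 0xef decodes as ISO-8859-1 but not as UTF-8. The helper decodes_with and the sample bytes are made up for illustration.

```python
import locale


def decodes_with(data, encoding):
    """Return True if data can be decoded with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: 0xef is the character "ï" in ISO-8859-1
sample = b'{"name": "na\xefve"}'

# Encoding that open() uses when no encoding= argument is passed
print("default text encoding:", locale.getpreferredencoding(False))
print("utf-8:", decodes_with(sample, "utf-8"))            # False
print("iso-8859-1:", decodes_with(sample, "iso-8859-1"))  # True
```

If the first line prints something other than UTF-8 on the machine where the file loads without trouble, that would be consistent with this explanation.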
Thank you for your interest in our work!
Hi, thanks very much for your work and your code!