Loading an OWL file with Ontology() raises the exception below. Note the warning stating that "Windows-1252" was assumed. If I edit io.py and change the line that sets the encoding so that it forces "utf-8", the file loads just fine. Is there another way to set the encoding of the file I'm loading, without patching the library?
/var/folders/4b/gklb4t292nq0vyg08x59gjzc0000gn/T/ipykernel_29628/2923417831.py:1: UnicodeWarning: unsound encoding, assuming Windows-1252 (73% confidence)
Ontology('/Users/yoshiki/Downloads/opmi-merged.owl')
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
Cell In[188], line 1
----> 1 Ontology('/Users/yoshiki/Downloads/opmi-merged.owl')
File ~/miniconda3/envs/db/lib/python3.11/site-packages/pronto/ontology.py:283, in Ontology.__init__(self, handle, import_depth, timeout, threads)
281 for cls in BaseParser.__subclasses__():
282 if cls.can_parse(typing.cast(str, self.path), buffer):
--> 283 cls(self).parse_from(_handle) # type: ignore
284 break
285 else:
File ~/miniconda3/envs/db/lib/python3.11/site-packages/pronto/parsers/rdfxml.py:84, in RdfXMLParser.parse_from(self, handle, threads)
82 def parse_from(self, handle, threads=None):
83 # Load the XML document into an XML Element tree
---> 84 tree: etree.ElementTree = etree.parse(handle)
86 # Load metadata from the `owl:Ontology` element
87 owl_ontology = tree.find(_NS["owl"]["Ontology"])
File ~/miniconda3/envs/db/lib/python3.11/xml/etree/ElementTree.py:1218, in parse(source, parser)
1209 """Parse XML document into element tree.
1210
1211 *source* is a filename or file object containing XML data,
(...)
1215
1216 """
1217 tree = ElementTree()
-> 1218 tree.parse(source, parser)
1219 return tree
File ~/miniconda3/envs/db/lib/python3.11/xml/etree/ElementTree.py:580, in ElementTree.parse(self, source, parser)
574 parser = XMLParser()
575 if hasattr(parser, '_parse_whole'):
576 # The default XMLParser, when it comes from an accelerator,
577 # can define an internal _parse_whole API for efficiency.
578 # It can be used to parse the whole source without feeding
579 # it with chunks.
--> 580 self._root = parser._parse_whole(source)
581 return self._root
582 while True:
File ~/miniconda3/envs/db/lib/python3.11/site-packages/pronto/utils/io.py:24, in BufferedReader.read(self, size)
22 def read(self, size: Optional[int] = -1) -> bytes:
23 try:
---> 24 return super(BufferedReader, self).read(size)
25 except ValueError:
26 if typing.cast(io.BufferedReader, self.closed):
File ~/miniconda3/envs/db/lib/python3.11/site-packages/pronto/utils/io.py:60, in EncodedFile.readinto(self, buffer)
59 def readinto(self, buffer: ByteString) -> int:
---> 60 chunk = self.read(len(buffer) // 2)
61 typing.cast(bytearray, buffer)[: len(chunk)] = chunk
62 return len(chunk)
File ~/miniconda3/envs/db/lib/python3.11/site-packages/pronto/utils/io.py:56, in EncodedFile.read(self, size)
55 def read(self, size: Optional[int] = -1) -> bytes:
---> 56 chunk = super().read(-1 if size is None else size)
57 return chunk.replace(b"\r\n", b"\n")
File <frozen codecs>:814, in read(self, size)
File <frozen codecs>:507, in read(self, size, chars, firstline)
File ~/miniconda3/envs/db/lib/python3.11/encodings/cp1252.py:15, in Codec.decode(self, input, errors)
14 def decode(self,input,errors='strict'):
---> 15 return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 29335: character maps to <undefined>
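For context, here is a minimal sketch of why byte 0x9d fails under Windows-1252 but is fine under UTF-8. I have not checked which character sits at position 29335 in the file; the right double quotation mark (U+201D) is used here only as an assumed example of a character whose UTF-8 encoding ends in 0x9d.

```python
# Assumed example character: U+201D (right double quotation mark).
# Its UTF-8 encoding is the three bytes E2 80 9D.
text = "\u201d"
utf8_bytes = text.encode("utf-8")


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly with `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


print(utf8_bytes)                        # the raw bytes, ending in 0x9d
print(decodes_as(utf8_bytes, "utf-8"))   # valid UTF-8
print(decodes_as(utf8_bytes, "cp1252"))  # 0x9d is unassigned in Python's cp1252 table
```

So a UTF-8 file containing such a character cannot be read with the cp1252 codec in strict mode, which matches the error above.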
To reproduce, download the OWL-formatted OPMI ontology from BioPortal.
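One workaround I can think of that avoids editing io.py is to re-encode the file as pure ASCII before loading it, replacing every non-ASCII character with an XML numeric character reference so that encoding detection has nothing to guess. This is only a sketch: the helper name to_ascii_xml is hypothetical, it assumes the file really is UTF-8, it assumes non-ASCII characters appear only in text and attribute values (character references are not allowed in element names), and it assumes Ontology accepts a binary file handle for its handle parameter.

```python
import io


def to_ascii_xml(data: bytes, source_encoding: str = "utf-8-sig") -> bytes:
    """Re-encode XML bytes as ASCII, using numeric character references.

    Hypothetical helper, not part of pronto. "utf-8-sig" decodes plain
    UTF-8 and also drops a leading byte order mark if one is present.
    """
    return data.decode(source_encoding).encode("ascii", errors="xmlcharrefreplace")


# Small in-memory sample containing curly quotes (U+201C and U+201D).
sample = "<a>\u201cx\u201d</a>".encode("utf-8")
converted = to_ascii_xml(sample)
print(converted)

# Wrap the converted bytes in a binary handle.
handle = io.BytesIO(converted)
# Assumption: the handle could then be passed on, e.g. Ontology(handle).
```

For the real file, the bytes would come from open(path, "rb").read() instead of the in-memory sample.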