fieldmuseum / EMu-xml-to-json

A script to convert EMu XML exports to JSON
Other

XML validation #2

Closed: peteherbst closed this issue 2 years ago

peteherbst commented 3 years ago

Please take a look at these resources:

magpiedin commented 2 years ago

[from Pete]

If you haven't found a solution yet for the encoding issue with the XML...

my_str = "hello world"
my_str_as_bytes = my_str.encode("utf-8")  # str -> bytes
assert isinstance(my_str_as_bytes, bytes)  # ensure it is a byte representation
my_decoded_str = my_str_as_bytes.decode("utf-8")  # bytes -> str
assert isinstance(my_decoded_str, str)  # ensure it is a string representation

Encode to bytes, then decode back into a UTF-8 string?

In Python 3, str holds Unicode text, and str.encode() / bytes.decode() default to UTF-8 (which is good). https://docs.python.org/3/howto/unicode.html#the-string-type
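Applied to the XML file itself, that round trip might look like the minimal sketch below. It assumes the export is meant to be UTF-8; `parse_as_utf8` is a hypothetical helper, not part of this repo. Decoding with `errors="replace"` turns each undecodable byte into U+FFFD instead of letting the parser stop on it, at the cost of silently losing those bytes. It will not fix other well-formedness problems, such as a bare `&`.

```python
import xml.etree.ElementTree as ET


def parse_as_utf8(raw: bytes) -> ET.Element:
    """Decode raw XML bytes as UTF-8, re-encode, and parse (hypothetical helper).

    Assumption: the export is intended to be UTF-8. Any byte sequence that is
    not valid UTF-8 becomes U+FFFD, so it can no longer break the parse.
    """
    text = raw.decode("utf-8", errors="replace")  # bytes -> str, bad bytes replaced
    return ET.fromstring(text.encode("utf-8"))  # str -> clean UTF-8 bytes -> tree


# 0xE9 is a Latin-1 "é", which is not valid UTF-8 on its own; parsing these
# bytes directly with ET.fromstring raises ET.ParseError.
sample = b'<?xml version="1.0" encoding="UTF-8"?><record>caf\xe9</record>'
root = parse_as_utf8(sample)
```

Here `root.text` comes back as `"caf\ufffd"`: the record survives, but the original character is lost, so checking for U+FFFD afterwards is one way to flag records that need attention.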

magpiedin commented 2 years ago

Currently:

2021-07-13 08:11:50.340441 - File Input: /home/data/2021-6-24/xml50704-195636.000000 : Error : not well-formed (invalid token): line 2579, column 50
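A check that writes a log line in the same shape could be sketched with the standard library's ElementTree, whose `ParseError` message already ends in "line N, column M". `validate_xml` and the temporary demo file are hypothetical; the format string only imitates the line above, and the project's actual script may build it differently.

```python
import datetime
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Optional


def validate_xml(path: str) -> Optional[str]:
    """Return None if the file is well-formed, else a one-line error log entry.

    Hypothetical helper: the layout mimics the log line quoted in this thread.
    """
    try:
        ET.parse(path)  # raises ET.ParseError at the first well-formedness error
        return None
    except ET.ParseError as err:
        # str(err) looks like "not well-formed (invalid token): line 2, column 10"
        return f"{datetime.datetime.now()} - File Input: {path} : Error : {err}"


# Demo: a bare "&" on line 2 makes the file not well-formed.
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as fh:
    fh.write(b"<record>\n  <name>a & b</name>\n</record>\n")
    bad_path = fh.name

message = validate_xml(bad_path)
os.remove(bad_path)
print(message)
```

Because `ET.parse` stops at the first error, each run reports one problem per file, which matches the one-line-per-file style of the log above.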

(Decision: that error message suffices.)