dogsheep / healthkit-to-sqlite

Convert an Apple Healthkit export zip to a SQLite database
https://datasette.io/tools/healthkit-to-sqlite
Apache License 2.0

DOC: xml.etree.ElementTree.ParseError due to healthkit version 12 #24

Open mmngreco opened 1 year ago

mmngreco commented 1 year ago

Hi @simonw

I hope this issue is OK. The idea is to document how to solve this problem so that other users like me can save some time.

Following the instructions in the README.md, I ran into this error:

(venv) mgreco@pop-os apple-health master* (23:44|0s)
$ healthkit-to-sqlite apple_health_export/export.xml healthkit.db --xml
Importing from HealthKit  [------------------------------------]    0%
Traceback (most recent call last):
  File "/home/mgreco/github/mmngreco/apple-health/venv/bin/healthkit-to-sqlite", line 33, in <module>
    sys.exit(load_entry_point('healthkit-to-sqlite', 'console_scripts', 'healthkit-to-sqlite')())
  File "/home/mgreco/github/mmngreco/apple-health/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/mgreco/github/mmngreco/apple-health/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/mgreco/github/mmngreco/apple-health/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mgreco/github/mmngreco/apple-health/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/mgreco/github/mmngreco/apple-health/.deps/healthkit-to-sqlite/healthkit_to_sqlite/cli.py", line 57, in cli
    convert_xml_to_sqlite(fp, db, progress_callback=bar.update, zipfile=zf)
  File "/home/mgreco/github/mmngreco/apple-health/.deps/healthkit-to-sqlite/healthkit_to_sqlite/utils.py", line 25, in convert_xml_to_sqlite
    for tag, el in find_all_tags(
  File "/home/mgreco/github/mmngreco/apple-health/.deps/healthkit-to-sqlite/healthkit_to_sqlite/utils.py", line 12, in find_all_tags
    for event, el in parser.read_events():
  File "/home/mgreco/github/mmngreco/apple-health/venv/lib/python3.10/xml/etree/ElementTree.py", line 1324, in read_events
    raise event
  File "/home/mgreco/github/mmngreco/apple-health/venv/lib/python3.10/xml/etree/ElementTree.py", line 1296, in feed
    self._parser.feed(data)
xml.etree.ElementTree.ParseError: syntax error: line 156, column 0
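
To see what the parser is choking on, it can help to print the lines around the reported position (line 156 here). A minimal sketch; the helper name `show_context` is hypothetical and not part of healthkit-to-sqlite:

```python
from itertools import islice


def show_context(path, line_no, radius=3):
    """Return (line_number, text) pairs around a 1-based line number."""
    start = max(line_no - radius, 1)
    # errors="replace" so a stray byte does not stop us from looking at the file
    with open(path, encoding="utf-8", errors="replace") as f:
        window = islice(f, start - 1, line_no + radius)
        return [(n, text.rstrip("\n")) for n, text in enumerate(window, start=start)]


if __name__ == "__main__":
    # Path and line number taken from the traceback above
    for n, text in show_context("apple_health_export/export.xml", 156):
        print(f"{n:>6}: {text}")
```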

After some debugging and searching online, I found this useful thread: https://discussions.apple.com/thread/254202523 (etresoft is the real hero). In short, the XML produced by the Health app (HealthKit version 12) has some bugs, but fortunately they can be fixed with a couple of commands:

  1. Unzip the export and change into the extracted folder that contains export.xml.
  2. Create a patch.txt with the following content:

    --- export.xml  2022-09-18 15:17:09.000000000 -0400
    +++ export-fixed.xml    2022-09-18 16:37:08.000000000 -0400
    @@ -15,6 +15,7 @@
       HKCharacteristicTypeIdentifierBiologicalSex       CDATA #REQUIRED
       HKCharacteristicTypeIdentifierBloodType           CDATA #REQUIRED
       HKCharacteristicTypeIdentifierFitzpatrickSkinType CDATA #REQUIRED
    +  HKCharacteristicTypeIdentifierCardioFitnessMedicationsUse CDATA #IMPLIED
     >
     <!ELEMENT Record ((MetadataEntry|HeartRateVariabilityMetadataList)*)>
     <!ATTLIST Record
    @@ -39,7 +40,7 @@
       startDate     CDATA #REQUIRED
       endDate       CDATA #REQUIRED
     >
    -<!ELEMENT Workout ((MetadataEntry|WorkoutEvent|WorkoutRoute)*)>
    +<!ELEMENT Workout ((MetadataEntry|WorkoutEvent|WorkoutRoute|WorkoutStatistics)*)>
     <!ATTLIST Workout
       workoutActivityType   CDATA #REQUIRED
       duration              CDATA #IMPLIED
    @@ -63,7 +64,7 @@
       duration             CDATA #IMPLIED
       durationUnit         CDATA #IMPLIED
     >
    -<!ELEMENT WorkoutEvent EMPTY>
    +<!ELEMENT WorkoutEvent (MetadataEntry?)>
     <!ATTLIST WorkoutEvent
       type                 CDATA #REQUIRED
       date                 CDATA #REQUIRED
    @@ -79,6 +80,7 @@
       minimum              CDATA #IMPLIED
       maximum              CDATA #IMPLIED
       sum                  CDATA #IMPLIED
    +  unit                 CDATA #IMPLIED
     >
     <!ELEMENT WorkoutRoute ((MetadataEntry|FileReference)*)>
     <!ATTLIST WorkoutRoute
    @@ -153,6 +155,7 @@
       dateIssued       CDATA #REQUIRED
       expirationDate   CDATA #REQUIRED
       brand            CDATA #IMPLIED
    +>
     <!ELEMENT RightEye EMPTY>
     <!ATTLIST RightEye
       sphere           CDATA #IMPLIED
    @@ -203,13 +206,6 @@
       diameter         CDATA #IMPLIED
       diameterUnit     CDATA #IMPLIED
     >
    -  device           CDATA #IMPLIED
    -<!ELEMENT MetadataEntry EMPTY>
    -<!ATTLIST MetadataEntry
    -  key              CDATA #IMPLIED
    -  value            CDATA #IMPLIED
    ->
    ->
     ]>
     <HealthData>
      <ExportDate/>
  3. Apply the patch with the command: patch < patch.txt
  4. Fix the endDate attributes with the command: sed 's/startDate/endDate/2' export.xml > export-fixed.xml
  5. Run the import again: healthkit-to-sqlite export-fixed.xml healthkit.db --xml
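
For anyone without sed available, step 4 can be approximated in Python. This is a minimal sketch that assumes the goal is exactly what the sed command does: on each line, replace the second occurrence of startDate with endDate and leave everything else untouched. The function names are hypothetical.

```python
NEEDLE = "startDate"
REPLACEMENT = "endDate"


def fix_second_start_date(line: str) -> str:
    """Mimic sed 's/startDate/endDate/2' for a single line."""
    first = line.find(NEEDLE)
    if first == -1:
        return line
    second = line.find(NEEDLE, first + len(NEEDLE))
    if second == -1:
        # Only one occurrence on this line: nothing to fix
        return line
    return line[:second] + REPLACEMENT + line[second + len(NEEDLE):]


def fix_file(src: str, dst: str) -> None:
    """Stream src to dst line by line so a large export is never fully in memory."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(fix_second_start_date(line))


if __name__ == "__main__":
    fix_file("export.xml", "export-fixed.xml")
```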
Mjboothaus commented 1 year ago

Thanks for reporting this and providing a solution -- I was puzzled by this error when I revisited my walking data and ran into this issue. I haven't tried the fix yet.

Mjboothaus commented 1 year ago

@simonw - maybe add some error handling to trap malformed XML (from Apple engineers), so that the tool suggests there is a problem with the export.zip rather than showing odd-looking Python errors :)
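
A rough sketch of what that error handling could look like, using only the standard library. The names `MalformedExportError` and `iter_events` are hypothetical and this is not how healthkit-to-sqlite is actually structured. It only assumes, based on the traceback above, that the tool reads events from an `XMLPullParser`:

```python
import xml.etree.ElementTree as ET


class MalformedExportError(Exception):
    """Hypothetical error type carrying a user-facing message."""


def iter_events(chunks):
    """Yield (event, element) pairs, turning ParseError into a clearer message."""
    parser = ET.XMLPullParser(("start", "end"))
    try:
        for chunk in chunks:
            parser.feed(chunk)
            yield from parser.read_events()
        parser.close()
        yield from parser.read_events()
    except ET.ParseError as e:
        line, column = e.position
        raise MalformedExportError(
            f"The export does not look like well-formed XML "
            f"(line {line}, column {column}). The export file itself may be "
            f"broken rather than this tool."
        ) from e
```

A command-line wrapper could catch `MalformedExportError` and print its message instead of a full traceback.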