dogsheep / healthkit-to-sqlite

Convert an Apple Healthkit export zip to a SQLite database
https://datasette.io/tools/healthkit-to-sqlite
Apache License 2.0
200 stars 10 forks source link

UNIQUE constraint failed: workouts.id #14

Open n8henrie opened 3 years ago

n8henrie commented 3 years ago

I'm getting an error on my initial attempt to import data:

$ healthkit-to-sqlite 20201119\ healthkit\ export.zip healthkit.db
Importing from HealthKit  [###################################-]   98%  00:00:01
Traceback (most recent call last):
  File "venv/bin/healthkit-to-sqlite", line 8, in <module>
    sys.exit(cli())
  File "venv/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "venv/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "venv/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "venv/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "venv/lib/python3.9/site-packages/healthkit_to_sqlite/cli.py", line 57, in cli
    convert_xml_to_sqlite(fp, db, progress_callback=bar.update, zipfile=zf)
  File "venv/lib/python3.9/site-packages/healthkit_to_sqlite/utils.py", line 34, in convert_xml_to_sqlite
    workout_to_db(el, db, zipfile)
  File "venv/lib/python3.9/site-packages/healthkit_to_sqlite/utils.py", line 57, in workout_to_db
    pk = db["workouts"].insert(record, alter=True, hash_id="id").last_pk
  File "venv/lib/python3.9/site-packages/sqlite_utils/db.py", line 1660, in insert
    return self.insert_all(
  File "venv/lib/python3.9/site-packages/sqlite_utils/db.py", line 1778, in insert_all
    self.insert_chunk(
  File "venv/lib/python3.9/site-packages/sqlite_utils/db.py", line 1588, in insert_chunk
    result = self.db.execute(query, params)
  File "venv/lib/python3.9/site-packages/sqlite_utils/db.py", line 213, in execute
    return self.conn.execute(sql, parameters)
sqlite3.IntegrityError: UNIQUE constraint failed: workouts.id
n8henrie commented 3 years ago

This PR allows my import to succeed.

It looks like some events don't have an id, but do have HKExternalUUID (which gets turned into metadata_HKExternalUUID), so I use this as a fallback.

If a record has neither of these, I changed it to just print the record (for debugging) and return.

For some odd reason this ran fine at first, and now (after removing the generated db and trying again) I'm getting a different error (duplicate column name).

Looks like it may have run when I had two successive runs without remembering to delete the db in between. Will try to refactor.

n8henrie commented 3 years ago

Ok, new PR works. I'm not git enough so I just force-pushed over the old one.

I still end up with a lot of activities that are missing an id and therefore skipped (since this is used as the primary key). For example:

{'workoutActivityType': 'HKWorkoutActivityTypeRunning', 'duration': '35.31666666666667', 'durationUnit': 'min', 'totalDistance': '4.010870267636999', 'totalDistanceUnit': 'mi', 'totalEnergyBurned': '660.3516235351562', 'totalEnergyBurnedUnit': 'Cal', 'sourceName': 'Strava', 'sourceVersion': '22810', 'creationDate': '2020-07-16 13:38:26 -0700', 'startDate': '2020-07-16 06:38:26 -0700', 'endDate': '2020-07-16 07:13:45 -0700'}

I also end up with some unhappy characters (in the skipped events), such as: 'sourceName': 'Nathan’s Apple\xa0Watch',.

But it's successfully making it through the file, and the resulting db opens in datasette, so I'd call that progress.

lchski commented 2 years ago

I have this issue, too, with a fresh export. None of my Workout entries in export.xml have an id key, though the sample export.xml in the tests folder doesn’t either, so I don’t think this is the culprit. Indeed, it seems @simonw is using the hash_id function from sqlite_utils, which creates a column (id, in this case) based on a hash of the row’s contents.

When I run the script, a workouts table is created, with one entry: my first workout. No workout_points table is created, as I’d expect from utils.py. I then get essentially the same error as noted in this thread:

Traceback (most recent call last):
  File "/Users/lchski/.pyenv/versions/3.10.3/bin/healthkit-to-sqlite", line 8, in <module>
    sys.exit(cli())
  File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/healthkit_to_sqlite/cli.py", line 57, in cli
    convert_xml_to_sqlite(fp, db, progress_callback=bar.update, zipfile=zf)
  File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/healthkit_to_sqlite/utils.py", line 34, in convert_xml_to_sqlite
    workout_to_db(el, db, zipfile)
  File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/healthkit_to_sqlite/utils.py", line 57, in workout_to_db
    pk = db["workouts"].insert(record, alter=True, hash_id="id").last_pk
  File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/sqlite_utils/db.py", line 2822, in insert
    return self.insert_all(
  File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/sqlite_utils/db.py", line 2950, in insert_all
    self.insert_chunk(
  File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/sqlite_utils/db.py", line 2715, in insert_chunk
    result = self.db.execute(query, params)
  File "/Users/lchski/.pyenv/versions/3.10.3/lib/python3.10/site-packages/sqlite_utils/db.py", line 458, in execute
    return self.conn.execute(sql, parameters)
sqlite3.IntegrityError: UNIQUE constraint failed: workouts.id

Are there maybe duplicate workouts in the data, which’d cause multiple rows to share the same id? It’s strange, though, that no workout_points is created at all. Export created from iOS 15.3.1.

lchski commented 2 years ago

Update: this appears to be because of running the command twice without clearing the DB in between. Tries to insert a Workout that already exists, causing a collision on the (auto-generated) id column. Had a different error with a clean DB, likely due to the workout points format; will make a new issue for that.

philipp-heinrich commented 1 year ago

@simonw any chance to get this fixed soon?