acoustid / mbdata

MusicBrainz SQLAlchemy Models
MIT License

Import fails #49

Closed: lincoln017 closed this issue 1 year ago

lincoln017 commented 2 years ago

I'm trying to build a standalone database (no replication) from the June 1 extracts and get a psycopg2.errors.BadCopyFileFormat error; see the output and error messages below. I assume this is related to the schema upgrade last month, but it's not clear how or where to retrieve the latest, correct SQL. I had assumed an update to the mbdata package would accompany the schema changes, but no luck. I'm not running the server or Docker; I just want to play around with the database. Any pointers or advice much appreciated. Thanks.

mbslave import mbdump.tar.bz2
INFO:mbdata.replication:Importing data from mbdump.tar.bz2
INFO:mbdata.replication:Loading alternative_release_type to musicbrainz.alternative_release_type
INFO:mbdata.replication:Loading area to musicbrainz.area
INFO:mbdata.replication:Loading area_alias to musicbrainz.area_alias
INFO:mbdata.replication:Loading area_alias_type to musicbrainz.area_alias_type
INFO:mbdata.replication:Loading area_gid_redirect to musicbrainz.area_gid_redirect
INFO:mbdata.replication:Loading area_type to musicbrainz.area_type
INFO:mbdata.replication:Loading artist to musicbrainz.artist
INFO:mbdata.replication:Loading artist_alias to musicbrainz.artist_alias
INFO:mbdata.replication:Loading artist_alias_type to musicbrainz.artist_alias_type
INFO:mbdata.replication:Loading artist_credit to musicbrainz.artist_credit
Traceback (most recent call last):
  File "/Users/me/dev/miniconda3/bin/mbslave", line 8, in <module>
    sys.exit(main())
  File "/Users/me/dev/miniconda3/lib/python3.8/site-packages/mbdata/replication.py", line 607, in main
    args.func(config, args)
  File "/Users/me/dev/miniconda3/lib/python3.8/site-packages/mbdata/replication.py", line 256, in mbslave_import_main
    load_tar(filename, db, config, config.schemas.ignored_schemas, config.tables.ignored_tables)
  File "/Users/me/dev/miniconda3/lib/python3.8/site-packages/mbdata/replication.py", line 248, in load_tar
    cursor.copy_from(tar.extractfile(member), fulltable)
psycopg2.errors.BadCopyFileFormat: extra data after last expected column
CONTEXT: COPY artist_credit, line 1: "2152096 The Chats 1 202 2018-01-26 11:59:06.33519+00 0 33fbf1e4-4768-30cc-a5c6-1c72f4f45826"

amCap1712 commented 2 years ago

This is likely related to the Q2 2022 upgrade of the MB schema, which adds a gid column to the artist_credit table. Each row in the new dump therefore carries one more field than the artist_credit table created by the released mbdata, which is what "extra data after last expected column" is complaining about. The changes to support the new schema in mbdata were merged recently, but a release containing them is not yet available on PyPI. Until a new release is published, you can try installing from source, something like:

pip install mbdata@git+https://github.com/acoustid/mbdata.git@bbe303865e4cec3f83a65ce29f0d3468c729173e
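To confirm that a column-count mismatch is the cause before reinstalling anything, one option is to count the fields in the first row of the dump member and compare that with the columns the local table has. The following is only a rough sketch: the helper names are hypothetical, the member path mbdump/artist_credit and the pre-upgrade column list are assumptions rather than something taken from mbdata, and it relies on the dump rows being tab-separated, as COPY's default text format expects.

```python
import io
import tarfile

# Assumed pre-upgrade columns of artist_credit (illustration only, not read
# from mbdata); the Q2 2022 schema upgrade adds a gid column.
OLD_ARTIST_CREDIT_COLUMNS = [
    "id", "name", "artist_count", "ref_count", "created", "edits_pending",
]


def first_row_field_count(tar, member_name):
    """Return the number of tab-separated fields in the first row of a dump member."""
    member = tar.extractfile(member_name)
    if member is None:
        raise ValueError(f"{member_name} is not a regular file in the archive")
    first_line = member.readline().decode("utf-8").rstrip("\n")
    return len(first_line.split("\t"))


def describe_mismatch(table, field_count, columns):
    """Say how the dump row width compares with the local table's columns."""
    if field_count > len(columns):
        return (f"{table}: dump has {field_count} fields but the table has "
                f"{len(columns)} columns; the local schema looks older than the dump")
    if field_count < len(columns):
        return (f"{table}: dump has {field_count} fields but the table has "
                f"{len(columns)} columns; the local schema looks newer than the dump")
    return f"{table}: field count matches ({field_count})"
```

With a real dump this might be called as `tarfile.open("mbdump.tar.bz2", "r:bz2")` followed by `first_row_field_count(tar, "mbdump/artist_credit")`; note that locating a member inside a bz2-compressed archive means decompressing everything before it, so this can take a while.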

KazimirPodolski commented 2 years ago

This way is probably better:

pip install https://github.com/acoustid/mbdata/archive/bbe303865e4cec3f83a65ce29f0d3468c729173e.zip

as you don't need git in your Docker container (if applicable) and you don't need to clone the whole repo (as advised here).
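For scripted setups, the two install targets mentioned in this thread can be derived from a commit hash. This is a small sketch only; install_specs and pip_command are hypothetical helper names, and the URL patterns are simply the ones quoted in the comments above.

```python
import sys


def install_specs(commit, repo="acoustid/mbdata", package="mbdata"):
    """Build both pip install targets for a pinned commit of a GitHub repo."""
    return {
        # Zip archive of the commit: pip downloads it directly, no git needed.
        "archive": f"https://github.com/{repo}/archive/{commit}.zip",
        # VCS requirement: pip clones the repo, so git must be installed.
        "vcs": f"{package}@git+https://github.com/{repo}.git@{commit}",
    }


def pip_command(target):
    """Return an argument list that installs target with the current interpreter's pip."""
    return [sys.executable, "-m", "pip", "install", target]
```

Passing the result of pip_command to subprocess.run would perform the install; the archive form is the one that works in environments without git.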