acoustid / mbdata

MusicBrainz SQLAlchemy Models
MIT License
97 stars 23 forks source link

Build instructions that actually work. #43

Closed echoxxzz closed 1 year ago

echoxxzz commented 2 years ago

Make sure you're using Postgre 12 or later.

  1. Complete the first 3 steps to install mbdata and setup your mbslave.conf file.

    The SQL scripts in the mbdata repo haven't been updated in over 6 months so you need to either clone the Musicbrainz git repo or download the latest Musicbrainz zip file from Github and copy and/or exact the contents of the musicbrainz-server-master/admin/sql folder and replace the files in the mbdata/sql folder.

    If you used pipx the mbdata/sql folder will be located somewhere like the following:

    ~/.local/pipx/venvs/mbdata/lib/python3.x/site-packages/mbdata/sql

    If needed, use find to locate your specific folder location.

    The remainder of the steps where taken verbatim from the Musicbrainz InitDb.pl script. This will create a complete Musicbrainz slave database will all tables, indexes, etc. You may want to customize as needed if say for instance you're not importing wikidocs or other non-essential data dumps.

  2. Create the Database:

    sudo su - postgres createuser musicbrainz createdb -l C -E UNICODE -T template0 -O musicbrainz musicbrainz

    MB now use DateTime->now to populate 'TIMESTAMP WITH TIME ZONE' columns in their code.

    DateTime->now outputs 'floating' UTC by default, but doesn't encode any timezone info in its output, so the

    database must have its timezone set to UTC in order to correctly interpret those values.

    psql musicbrainz -c 'ALTER DATABASE musicbrainz SET timezone TO 'UTC';'

    psql musicbrainz -c 'CREATE EXTENSION IF NOT EXISTS cube WITH SCHEMA public;' psql musicbrainz -c 'CREATE EXTENSION IF NOT EXISTS earthdistance WITH SCHEMA public;' psql musicbrainz -c 'CREATE EXTENSION IF NOT EXISTS unaccent WITH SCHEMA public;'

    exit out of sudo back to the account you setup for mbdata

  3. Prepare empty schemas:

    echo 'CREATE SCHEMA musicbrainz;' | mbslave psql -S echo 'CREATE SCHEMA statistics;' | mbslave psql -S echo 'CREATE SCHEMA cover_art_archive;' | mbslave psql -S echo 'CREATE SCHEMA wikidocs;' | mbslave psql -S echo 'CREATE SCHEMA documentation;' | mbslave psql -S echo 'CREATE SCHEMA event_art_archive;' | mbslave psql -S echo 'CREATE SCHEMA json_dump;' | mbslave psql -S echo 'CREATE SCHEMA report;' | mbslave psql -S echo 'CREATE SCHEMA sitemaps;' | mbslave psql -S

  4. Create tables structures:

    The first script will give an error that the extensions already exist because we already added them in step 1. The

    extensions have to be created in step 1 as you need to be a superuser account to create extensions. We just need

    to run the Extensions.sql script to add the musicbrainz.ll_to_earth() function so index creation won't fail as reported

    in this issue ticket.

    mbslave psql -f Extensions.sql
    mbslave psql -f CreateCollations.sql mbslave psql -f CreateTables.sql mbslave psql -f caa/CreateTables.sql mbslave psql -f eaa/CreateTables.sql mbslave psql -f documentation/CreateTables.sql mbslave psql -f json_dump/CreateTables.sql mbslave psql -f report/CreateTables.sql mbslave psql -f sitemaps/CreateTables.sql mbslave psql -f statistics/CreateTables.sql mbslave psql -f wikidocs/CreateTables.sql

  5. Import the data dumps. Minimally you need the following two dump files:

    mbslave import mbdump.tar.bz2 mbdump-derived.tar.bz2

  6. Create the primary keys:

    mbslave psql -f CreatePrimaryKeys.sql mbslave psql -f caa/CreatePrimaryKeys.sql mbslave psql -f documentation/CreatePrimaryKeys.sql mbslave psql -f eaa/CreatePrimaryKeys.sql mbslave psql -f statistics/CreatePrimaryKeys.sql mbslave psql -f wikidocs/CreatePrimaryKeys.sql

  7. Create functions

    mbslave psql -f CreateSearchConfiguration.sql mbslave psql -f CreateFunctions.sql mbslave psql -f caa/CreateFunctions.sql mbslave psql -f eaa/CreateFunctions.sql mbslave psql -f CreateSlaveOnlyFunctions.sql

  8. Create indexes

    mbslave psql -f CreateIndexes.sql mbslave psql -f caa/CreateIndexes.sql mbslave psql -f eaa/CreateIndexes.sql mbslave psql -f json_dump/CreateIndexes.sql mbslave psql -f sitemaps/CreateIndexes.sql mbslave psql -f statistics/CreateIndexes.sql mbslave psql -f CreateSlaveIndexes.sql

  9. Set initial sequence values

    mbslave psql -f SetSequences.sql mbslave psql -f statistics/SetSequences.sql

  10. Create views and triggers and search indexes

    mbslave psql -f CreateViews.sql mbslave psql -f caa/CreateViews.sql mbslave psql -f eaa/CreateViews.sql mbslave psql -f CreateSlaveOnlyTriggers.sql mbslave psql -f CreateSearchIndexes.sql

gorgobacka commented 2 years ago

Thanks for this description. Unfortunately, I get a lot of "Skipping... table not found" issues while importing the data. I'm not sure, what I'm doing wrong.

echoxxzz commented 2 years ago

Are you sure got the latest SQL scripts from the Musicbrainz GIT repo and copied them to the appropriate folder replacing the ones that come with mbdata?

gorgobacka commented 2 years ago

I did it. But maybe I copied it to the wrong folder. I will check again.

echoxxzz commented 2 years ago

To be honest you don't really need to use the mbdata command to run the sql scripts. Just use psql and specify the database name as the last paramenter or as a input to -d.

e.g If your database is called Musicbrianz: psql -f CreatePrimaryKeys.sql Musicbrainz

Try that using the latest sql scripts then the import should work.

gorgobacka commented 2 years ago

@echoxxzz I think, I was using a too old postgresql version (9.5). I upgraded and not it seems to work,

echoxxzz commented 2 years ago

According to the Install.MD file on Github it requires 12 or later. I updated my instructions. Glad everything worked out for you.

jeleffto commented 2 years ago

When I run "mbslave psql -f CreateFunctions.sql" I get the following error:

psql:/var/folders/gd/ymqmmv7n1pnbrl303865ynbh0000gn/T/tmp3d4outcm.sql:23: ERROR: function array_append(anyarray, anyelement) does not exist

Any way around this? I did some digging and it seems like an issue with Postgresql 14+

kpmcc commented 2 years ago

deleted (no longer relevant)

amCap1712 commented 2 years ago

@jeleffto The MB SQL scripts were updated recently to make them compatible with PG 14. See https://github.com/metabrainz/musicbrainz-server/commit/38861c768d427970e9088cefc76604b05ee11bab. If you try installing mbdata from the latest master or update the scripts later manually, the error should go away.

echoxxzz commented 2 years ago

It seems like mbdata isn't being maintained anymore. Everything it does can be done using the Musicbrainz repo and it works on all supported versions of Postgresql.

Most of this info can be found in the Musicrainz Install.md file but I will go through the steps I used to setup a Musicbrainz mirrored database server with replication on Ubuntu Server 22.04.1 LTS.

Since I'm only using this server for Musicbrainz I didn't use local::lib for Perl modules and I trusted all local logins to Postgres as it just makes the Install and Mirror script easier to run.

  1. Install Postgresql:

    sudo apt install postgresql-14 postgresql-server-dev-14 postgresql-contrib

  2. Edit the /etc/postgresql/14/main/pg_hba.conf and change this line from:

    local all postgres peer

    to:

    local all all trust

    Restart the Postgresql server:

    sudo systemctl restart postgresql

    Verify you can login and list the system databases:

    psql -U postgres -l

  3. Install the build tools and dependencies:

    sudo apt install build-essential cpanminus libdb-dev libexpat1-dev zlib1g-dev moreutils pkg-config

  4. Clone the Musicbrainz repository:

    git clone https://github.com/metabrainz/musicbrainz-server.git cd musicbrainz-server

  5. Install the Perl dependencies:

    sudo cpanm --installdeps --notest .

    You should end up with 307 distributions installed.

  6. Copy and edit the sample DBDefs.pm:

    cp lib/DBDefs.pm.sample lib/DBDefs.pm nano lib/DBDefs.pm

    • Since we're configuring a Musicbrainz mirror server uncomment _REPLICATIONTYPE and change it from _RTSTANDALONE to RT_MIRROR.
    • Modify the READWRITE and READONLY database connection parameters. They must match as READWRITE is used during the intial import and READONLY is used during packet replication.
    • Uncomment and edit REPLICATION_ACCESS_TOKEN and enter your Musicbrainz API Key.
    • By default the replication script will try to use Redis for caching. Either setup a Redis server or you can disable Redis caching by uncommenting the sub CACHE_MANAGER_OPTIONS section that uses the Cache::Null class. You need to uncomment all 13 lines.
  7. Download the latest Musicbrainz database dump files. Minimal you need mbdump.tar.bz2 & mbdump-derived.tar.bz2.

  8. Create the database and import the dump files:

    ./admin/InitDb.pl --createdb --echo --import mbdump*.tar.bz2

  9. Create a cron job to update the database hourly. Musicbrainz ships with a shell script called called mirror.sh located in the admin/cron folder. To run the script ever hour enter the following:

    0 /home/username/musicbrainz-server/admin/cron/mirror.sh >/dev/null 2>&1

    into

    crontab -e

  10. Enjoy.