avniproject / avni-infra

Other
0 stars 0 forks source link

Migrate Superset persistence from sqlLite to Postgres #36

Open himeshr opened 3 months ago

himeshr commented 3 months ago

Migrate Superset persistence from sqlLite to Postgres DB on Prod RDS.

Options to migrate:

  1. Setup empty psql db for version 4.0.1 and Copy over data in CSV format, seems most feasible. Refer https://medium.com/@aaronbannin/migrating-superset-to-postgres-63d2c96c5102
  2. Convert sqlLite file to psql. Is tedious and its not guaranteed that the output superset DB would actually be usable by superset app correctly at all times. Refer https://stackoverflow.com/questions/4581727/how-to-convert-sqlite-sql-dump-file-to-postgresql
  3. Using import export Reference
vedfordev commented 2 months ago

As discussed with @himeshr Followed 3rd approach

Currently working on assets export

  1. go to api file
    cd /home/superset/superset/lib/python3.8/site-packages/superset/importexport
  2. change api.py file. comment attachment_filename variable and download_name=filename, in send_file function
  3. call /api/v1/assets/export/ and get data

Outcome and further process

  1. we are getting (databases, datasets, charts, dashboards, saved queries) in export
  2. need to find way to get users, roles & row level security
vedfordev commented 2 months ago

Steps to do :

  1. create db with superset 2.0.1
  2. migrate data with pgloader
  3. create docker for superset 4.0.1

Issue identify in pgloader

  1. giving error to convert binary(14) to uuid solution:

    • go to sql lite db and find the tables which have binary(14) uuid
      
      -- to get column which is used in uuid

    SELECT tbl_name FROM sqlite_master WHERE type='table' AND EXISTS ( SELECT 1 FROM pragma_table_info(tbl_name) WHERE name = 'uuid' );

    
    - created below config file
    ```conf
    LOAD DATABASE
     FROM sqlite://superset.db
     INTO postgresql://superset_user:superset_password@localhost:5435/superset2

WITH data only, reset sequences, truncate, concurrency = 1, batch rows = 500, prefetch rows = 1000

CAST column sql_metrics.uuid to uuid drop typemod using byte-vector-to-hexstring, column clusters.uuid to uuid drop typemod using byte-vector-to-hexstring, column columns.uuid to uuid drop typemod using byte-vector-to-hexstring, column metrics.uuid to uuid drop typemod using byte-vector-to-hexstring, column dashboard_email_schedules.uuid to uuid drop typemod using byte-vector-to-hexstring, column slice_email_schedules.uuid to uuid drop typemod using byte-vector-to-hexstring, column saved_query.uuid to uuid drop typemod using byte-vector-to-hexstring, column report_execution_log.uuid to uuid drop typemod using byte-vector-to-hexstring, column table_columns.uuid to uuid drop typemod using byte-vector-to-hexstring, column dashboards.uuid to uuid drop typemod using byte-vector-to-hexstring, column datasources.uuid to uuid drop typemod using byte-vector-to-hexstring, column dbs.uuid to uuid drop typemod using byte-vector-to-hexstring, column slices.uuid to uuid drop typemod using byte-vector-to-hexstring, column tables.uuid to uuid drop typemod using byte-vector-to-hexstring, column embedded_dashboards.uuid to uuid drop typemod using byte-vector-to-hexstring, column sl_datasets.uuid to uuid drop typemod using byte-vector-to-hexstring, column sl_tables.uuid to uuid drop typemod using byte-vector-to-hexstring, column sl_columns.uuid to uuid drop typemod using byte-vector-to-hexstring, column key_value.uuid to uuid drop typemod using byte-vector-to-hexstring

SET work_mem TO '128MB', maintenance_work_mem TO '512MB';

----
1. create network for superset and postgres and run postgres
```docker
   docker network create superset_network
   docker run -d \
    -p 5435:5432 \
    --name postgres_db \
    --network superset_network \
    -e POSTGRES_USER=superset_user \
    -e POSTGRES_PASSWORD=superset_password \
    -e POSTGRES_DB=superset_db \
    postgres
  1. create db superset2

  2. create container of superset 2.0.1 and with below command

    docker run -d -p 8088:8088  --network superset_network  -e "SUPERSET_SECRET_KEY=abc" --name superset_2.0.1 apache/superset:2.0.1
  3. go to container and add path to postgres db and restart the container

    docker exec -it  -u root superset_2.0.1 bash
    echo "SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://superset_user:superset_password@postgres_db:5432/superset2'" > superset_config.py
    SUPERSET_CONFIG_PATH=/app/superset_config.py
  4. go to superset container and run below command

    superset db upgrade
    superset init

6.run command for pgloader and check the error

pgloader pgloaderdataonly.conf > error.txt
vedfordev commented 2 months ago

move card to qa ready. implementation team will do sanity test. @himeshr @pkundu