Qovery / Replibyte

Seed your development database with real data ⚡️
https://www.replibyte.com
GNU General Public License v3.0
4.16k stars 128 forks source link

Unable to transform data on a simple use case #227

Closed cterence closed 2 years ago

cterence commented 2 years ago

Hello, I have been testing Replibyte and its features of data transformation, however, I have yet to successfully transform data, here is my setup :

Replibyte version : 0.9.7

docker-compose.yml

services:
  replibyte:
    container_name: replibyte
    image: ghcr.io/qovery/replibyte:0.9.7
    entrypoint: /script.sh
    volumes:
      - ./config:/config
      - ./script.sh:/script.sh
    depends_on:
      - postgresql-prod
      - postgresql-staging

  postgresql-prod:
    container_name: postgresql-prod
    image: postgres:11
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=replibyte
    ports:
      - 5433:5432
    volumes:
      - ./pgsql-sample-data/dellstore2-normal-1.0:/docker-entrypoint-initdb.d

  postgresql-staging:
    container_name: postgresql-staging
    image: postgres:11
    ports:
      - 5432:5432
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=replibyte

script.sh

#!/bin/bash

set -e

mkdir /datastore
sleep 5
echo -e "Beginning dump creation"
./replibyte --config /config/replibyte.yaml dump create
echo -e "Beginning restore"
./replibyte --config /config/replibyte.yaml dump restore remote --value latest

replibyte.yaml in the /config directory

source:
  connection_uri: postgres://postgres:postgres@postgresql-prod:5432/replibyte
  transformers:
    - database: replibyte
      table: categories
      columns:
        - name: categoryname
          transformer_name: random
    - database: replibyte
      table: products
      columns:
        - name: title
          transformer_name: random
        - name: actor
          transformer_name: email
    - database: replibyte
      table: customers
      columns:
        - name: email
          transformer_name: email

datastore:
  local_disk:
    dir: /datastore

destination:
  connection_uri: postgres://postgres:postgres@postgresql-staging:5432/replibyte

When I run docker compose up, the sample database is fed into the prod postgresql database and replibyte makes the dump and restores it to the staging database immediately.

Here are the script's logs :

replibyte           | Beginning dump creation
replibyte           | Dump created successfully!
replibyte           | Beginning restore
replibyte           | Restore successful!

When I select rows from the staging database, the columns from the tables supposed to be transformed are still the same as the prod database.

For example :

❯ PGPASSWORD=postgres psql -h localhost -p 5433 -U postgres -d replibyte -c 'select * from products limit 5'
 prod_id | category |       title        |      actor       | price | special | common_prod_id 
---------+----------+--------------------+------------------+-------+---------+----------------
       1 |       14 | ACADEMY ACADEMY    | PENELOPE GUINESS | 25.99 |       0 |           1976
       2 |        6 | ACADEMY ACE        | EWAN RICKMAN     | 20.99 |       0 |           6289
       3 |        6 | ACADEMY ADAPTATION | VIVIEN KAHN      | 28.99 |       0 |           7173
       4 |        3 | ACADEMY AFFAIR     | ALAN MARX        | 14.99 |       0 |           8042
       5 |        3 | ACADEMY AFRICAN    | CARRIE HANNAH    | 11.99 |       1 |           2183
(5 rows)

❯ PGPASSWORD=postgres psql -h localhost -p 5432 -U postgres -d replibyte -c 'select * from products limit 5'
 prod_id | category |       title        |      actor       | price | special | common_prod_id 
---------+----------+--------------------+------------------+-------+---------+----------------
       1 |       14 | ACADEMY ACADEMY    | PENELOPE GUINESS | 25.99 |       0 |           1976
       2 |        6 | ACADEMY ACE        | EWAN RICKMAN     | 20.99 |       0 |           6289
       3 |        6 | ACADEMY ADAPTATION | VIVIEN KAHN      | 28.99 |       0 |           7173
       4 |        3 | ACADEMY AFFAIR     | ALAN MARX        | 14.99 |       0 |           8042
       5 |        3 | ACADEMY AFRICAN    | CARRIE HANNAH    | 11.99 |       1 |           2183
(5 rows)

where port 5433 is the prod database and port 5432 is the staging database, with supposedly transformed rows. The title and actor columns should be different if my replibyte config is correct.

The sample data can be found here.

Is there something I missed here ?

evoxmusic commented 2 years ago

Hi @cterence , I see that you are using the restore remote command while you use a local data store. Can you try by using the restore local command?

cterence commented 2 years ago

Hi @evoxmusic, I've tried doing so and there is no change, the data is not transformed when I query it in the postgresql docker container.

If it could help, here are some details :

cterence commented 2 years ago

Hi @evoxmusic, great news because I found the problem on my part. The replibyte.yaml config was wrong because in the database field of my transformers, I was putting the database name instead of the PostgreSQL schema name (which was public in my test case). With this, the transformers features works as intended.

However, two questions arise from this :

In any case, I'm closing the issue because my problem is solved.