dolthub / doltgresql

DoltgreSQL - Version Controlled PostgreSQL
Apache License 2.0
1.07k stars 23 forks source link

Feature Request: Replication capability #75

Open deathemperor opened 9 months ago

deathemperor commented 9 months ago

Copied from original request:

I really want to share our intended use case with doltgres just so you know how interested I am in this project: we do claim management and processing for insurers and data changes are part of our core data structures. At any given time we need to allow claim handlers to view those data to make decisions as well as others to view audit logs. Such use case may not be so unique, except we're using GraphQL with Hasura (been using it in our org for 3 years now) which enables instant GraphQL from postgres. With the combination of those, we'll build something like this:

create table employees (
    id int8,
    last_name text,
    first_name text,
    primary key(id));

insert into employees values 
    (0, 'Sehn', 'Tim'), 
    (1, 'Hendriks', 'Brian'), 
    (2, 'Son','Aaron'), 
    (3, 'Fitzgerald', 'Brian');

select * from employees where first_name='Brian';
+------+------------+------------+
| id   | last_name  | first_name |
+------+------------+------------+
|    1 | Hendriks   | Brian      |
|    3 | Fitzgerald | Brian      |
+------+------------+------------+

select * from dolt_diff_employees;
+--------------+---------------+-------+-----------+----------------+----------------+-----------------+---------+----------------------------------+-------------------------+-----------+
| to_last_name | to_first_name | to_id | to_commit | to_commit_date | from_last_name | from_first_name | from_id | from_commit                      | from_commit_date        | diff_type |
+--------------+---------------+-------+-----------+----------------+----------------+-----------------+---------+----------------------------------+-------------------------+-----------+
| Sehn         | Tim           |     0 | WORKING   | NULL           | NULL           | NULL            |    NULL | ne182jemgrlm8jnjmoubfqsstlfi1s98 | 2022-06-07 16:35:49.277 | added     |
+--------------+---------------+-------+-----------+----------------+----------------+-----------------+---------+----------------------------------+-------------------------+-----------+

call dolt_checkout('-b','modifications');
update employees SET first_name='Timothy' where first_name='Tim';
call dolt_commit('-am', 'Modifications on a branch');

select * from dolt_diff('main', 'modifications', 'employees');
+--------------+---------------+-------+---------------+-------------------------+----------------+-----------------+---------+-------------+-------------------------+-----------+
| to_last_name | to_first_name | to_id | to_commit     | to_commit_date          | from_last_name | from_first_name | from_id | from_commit | from_commit_date        | diff_type |
+--------------+---------------+-------+---------------+-------------------------+----------------+-----------------+---------+-------------+-------------------------+-----------+
| Sehn         | Timothy       |     0 | modifications | 2022-06-07 16:41:49.847 | Sehn           | Tim             |       0 | main        | 2022-06-07 16:39:32.066 | modified  |
| Wilkins      | Daylon        |     4 | modifications | 2022-06-07 16:41:49.847 | NULL           | NULL            |    NULL | main        | 2022-06-07 16:39:32.066 | added     |
+--------------+---------------+-------+---------------+-------------------------+----------------+-----------------+---------+-------------+-------------------------+-----------+

Graphql-join this data with the employees table, I'll get this Graph:

query test {
    employees {  // array of objects
        id
        last_name
        first_name
        dolt_diff_employees_modifications { // array of objects
            to_last_name
            to_first_name
            to_id
            to_commit
            to_commit_date
            from_last_name
            from_first_name
            from_id
            from_commit
            from_commit_date
            diff_type
        }
        dolt_diff_employees_other_modifications { // array of objects
            to_last_name
            to_first_name
            to_id
            to_commit
            to_commit_date
            from_last_name
            from_first_name
            from_id
            from_commit
            from_commit_date
            diff_type
        }
    }
}
zachmu commented 9 months ago

Hi @deathemperor,

Can you clarify what you're looking for here? Do you want DoltgreSQL to consume replication events from a Postgres server, or something else?

deathemperor commented 9 months ago

@zachmu if I understand correctly, the replication feature running on MySQL in Dolt right now enables Dolt to capture changes while the main database instance functions as is. Which is the use cases of already-production businesses. Please correct me if I'm wrong.

I want DoltgreSQL to act the same, a replication instance to capture changes. I'm not sure I understand what replication events means to be honest.

Let's try explaining it using my example:

zachmu commented 9 months ago

Sorry for the late reply here.

We're planning on building the ability for doltgres to act as a replica of a postgres database, but the doltgres replica won't be accessible from the postgres primary. So:

I join tables from 2 databases to build a related data that can be queried: with each employee, what changes were made to first and last name in the last 6 months.

I don't think this will work -- the postgres instance won't be able to see the replica to query it and do the join. But you'll be able to do the historical query on the doltgres replica itself, without involving the postgres primary.

deathemperor commented 8 months ago

I don't think this will work -- the postgres instance won't be able to see the replica to query it and do the join. But you'll be able to do the historical query on the doltgres replica itself, without involving the postgres primary.

That's totally understandable. I tried to explain it in the basic terms possible (RDBMS) in my example. However, our real use case involve Hasura and GraphQL which enables data relation between any data sources so in this case I'm 99% sure it will work as Doltgres is Postgres compatible. Sure there may be this and that issue but I believe the basics are already awesome.

deathemperor commented 5 days ago

@zachmu @fulghum any chance I can get an update on this?

zachmu commented 4 days ago

Hi @deathemperor,

Sorry for the lack of progress on this, that's my fault. We've been heads down making bug fixes and enhancements to the core doltgres server experience and haven't been paying much attention to the replication feature.

I'm going to get you an updated release with additional logging today so we can try to figure out where the error you're seeing is coming from. I'll update this thread when it's out.

zachmu commented 3 days ago

Please try again with the latest release:

https://github.com/dolthub/doltgresql/releases/tag/v0.12.0

It has more lenient error handling for sending replication standby messages (the error you were seeing in the other bug), as well as a lot of other bug fixes for the server in general.

Let me know how it works.