ankane / pgsync

Sync data from one Postgres database to another
MIT License
3.19k stars 201 forks

Ideas #93

Open ankane opened 4 years ago

ankane commented 4 years ago

Ideas

probablykabari commented 4 years ago

Is there already an ability to limit rows without using a group? If not, that would be good (and I'd attempt a PR).

ankane commented 4 years ago

Hey @RipTheJacker, you should be able to do:

pgsync table "limit 1000"

probablykabari commented 4 years ago

@ankane I was thinking of it as part of the config, essentially the same as the group config but one level higher. But maybe this way is a better practice. It's unclear in the docs: does the groups config observe the exclude config? I see it works across all tables when using pgsync * "limit N", which is alright.
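For later readers: if I read the pgsync README correctly, a config-level limit can also be expressed as a group, since groups map table names to SQL clauses. A minimal sketch in .pgsync.yml (the table names here are hypothetical):

```yml
groups:
  sample:
    posts: "limit 1000"
    comments: "limit 1000"
```

Then `pgsync sample` would sync just those tables with the limit applied.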

caioariede commented 3 years ago

@ankane I'm wondering if it's possible to mix the usage of --in-batches with --defer-constraints-v2. I have two different large tables that refer to each other and I'm not sure how to address that. It seems --in-batches is restricted to a single table but in order to make the scenario I have work, I'd need the two tables to be synced within the same session. Please let me know if you have any thoughts/suggestions. I'd love to contribute.

ankane commented 3 years ago

Hey @caioariede, I'm not sure how you could combine the two, but feel free to fork and experiment.

geoffharcourt commented 2 years ago

Is it possible to set defer_constraints in the YAML? We effectively have to use it every time, and being able to set it by default in our config would help us clean up some scripts and CLI usage.

alextakitani commented 1 year ago

I needed to replace names, emails, and addresses.

I was experimenting with Faker; do you think it's a good idea?
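As an alternative to Faker for this use case: pgsync's config supports built-in data rules for masking columns during a sync. A minimal sketch, assuming the rule names documented in the pgsync README (the column names here are hypothetical):

```yml
data_rules:
  email: unique_email        # replace emails with unique fake addresses
  last_name: random_letter   # replace values with a random letter
  users.auth_token:
    value: secret            # replace every value with a static string
```

Since the rules run as part of the sync itself, this avoids a separate anonymization pass.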

caioariede commented 1 year ago

@alextakitani one thing to keep in mind with libraries like Faker is that they can get pretty slow if you're relying on random data. With small datasets that can be okay though.

alextakitani commented 1 year ago

Indeed @caioariede, that was my experience.

I'm using Faker inside Postgres: https://gitlab.com/dalibo/postgresql_faker

But not everyone can install extensions in production (that's my case).

So I ended up downloading a backup, restoring it locally, and then generating a sync with fake data.

Natgho commented 1 year ago

What if there were an option to "skip" the offending row? For example, if there were a way to skip a row that fails with the error "insert or update on table "xxxx" violates foreign key constraint "xxxx"" and continue synchronization? @ankane

geoffharcourt commented 1 year ago

@Natgho once that error happens, the wrapping transaction is aborted and can't continue, so I think that would be hard to do
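Some context on why: in Postgres, once any statement in a transaction fails, the whole transaction enters an aborted state until it is rolled back, so skipping bad rows generally requires a savepoint per row, which defeats the speed of a bulk COPY. A sketch of the general mechanism (hypothetical table and values, not pgsync's actual behavior):

```sql
BEGIN;

SAVEPOINT before_row;
-- this insert may violate a foreign key constraint
INSERT INTO posts (id, user_id) VALUES (1, 999);
-- if it fails, every later statement errors until we roll back to the savepoint:
ROLLBACK TO SAVEPOINT before_row;
-- the transaction can now continue with the next row

COMMIT;
```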

onwardmk commented 4 months ago

I noticed that the tables are copied sequentially. If --defer-constraints is enabled, could multiple copy commands run in parallel to speed up the sync?