Adopt the same process as https://greenmask.io?

jensenbox commented 4 months ago

It seems that while the backup file still contains the unsanitized data, their process is significantly faster.

Any chance of adopting their methodology instead of changing the data while it is in flight? Theirs is to mutate the data once it lands in the destination database.

evoxmusic commented 1 month ago

Hi @jensenbox, that looks quite interesting. I think we can fix the performance issues by reworking the lexer and parser so they have a low memory footprint. I've got some ideas, but it's a matter of finding the time. Did you try GreenMask? Is it much faster?

vchervanev commented 2 weeks ago

@evoxmusic As I understand it, their solution avoids SQL parsing entirely because the data payloads come from the Postgres COPY command. For a transformation it only needs to split the input line, and each value is then ready to be deserialized and transformed.
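To make the "split the input string" point concrete, here is a minimal Python sketch of transforming one row in COPY text format. It is not Greenmask's or Replibyte's code. The names `decode_field`, `encode_field`, and `transform_row` are made up for illustration, and only a subset of the COPY backslash escapes is handled (no octal or hex sequences).

```python
from typing import Callable, Dict, Optional

# Subset of the backslash escapes used by the COPY text format (assumption:
# octal and hex escapes are not needed for this illustration).
_UNESCAPE = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v", "\\": "\\"}
_ESCAPE = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def decode_field(raw: str) -> Optional[str]:
    """Turn one raw COPY field into a Python value; \\N means NULL."""
    if raw == r"\N":
        return None
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Unknown escapes fall back to the escaped character itself.
            out.append(_UNESCAPE.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def encode_field(value: Optional[str]) -> str:
    """Serialize a value back into a COPY text field."""
    if value is None:
        return r"\N"
    return "".join(_ESCAPE.get(ch, ch) for ch in value)


def transform_row(line: str, transformers: Dict[int, Callable[[str], str]]) -> str:
    """Apply per-column transformers (keyed by column index) to one COPY line."""
    # Splitting on a raw tab is safe: tabs inside values are escaped as \t.
    fields = [decode_field(f) for f in line.rstrip("\n").split("\t")]
    for idx, fn in transformers.items():
        if fields[idx] is not None:  # leave NULLs untouched
            fields[idx] = fn(fields[idx])
    return "\t".join(encode_field(f) for f in fields)
```

For example, `transform_row("1\talice@example.com\t\\N\n", {1: lambda v: "masked"})` rewrites only the second column and leaves the NULL marker as it was. No SQL tokenizer is involved at any point.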

Also, they use a 3-step approach:

1. pg_dump --schema-only --section=pre-data & restore -- create empty tables with no indexes, triggers, etc.
2. custom COPY-based export & restore -- arguably the fastest possible way to restore: low parsing overhead and the lowest possible insert overhead.
3. pg_dump --section=post-data & restore -- finalize the import by restoring indexes, constraints, foreign keys(?), etc.
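The three steps above could be sketched roughly as follows. This is an assumption-laden outline, not Greenmask's actual implementation: `build_plan`, the file names `pre.sql` and `post.sql`, and the connection strings are hypothetical, and the function only builds the command lines without running them. The `pg_dump` and `psql` options used (`--section`, `--dbname`, `--file`, `--command`) are standard.

```python
from typing import List


def build_plan(src: str, dst: str, tables: List[str]) -> List[List[str]]:
    """Return the argv lists for a pre-data / COPY / post-data restore.

    src and dst are connection strings; tables are the tables to copy.
    Nothing is executed here; each entry could be passed to subprocess.run.
    """
    plan = [
        # Step 1: dump and restore table definitions only (no indexes yet).
        ["pg_dump", "--section=pre-data", f"--dbname={src}", "--file=pre.sql"],
        ["psql", f"--dbname={dst}", "--file=pre.sql"],
    ]
    for table in tables:
        # Step 2: stream rows out and back in with COPY. In a real pipeline the
        # export's stdout would pass through the transformer and then be piped
        # into the import's stdin.
        plan.append(["psql", f"--dbname={src}", f"--command=COPY {table} TO STDOUT"])
        plan.append(["psql", f"--dbname={dst}", f"--command=COPY {table} FROM STDIN"])
    plan += [
        # Step 3: build indexes, constraints, and triggers after the data is in.
        ["pg_dump", "--section=post-data", f"--dbname={src}", "--file=post.sql"],
        ["psql", f"--dbname={dst}", "--file=post.sql"],
    ]
    return plan
```

Loading rows before step 3 means indexes are built once over the full table instead of being updated on every inserted row, which is where most of the restore-time saving would be expected to come from.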

Qovery / Replibyte #302