Open gumaerc opened 2 years ago
Hi @gumaerc , can you provide me the complete query that makes failing Replibyte? Then I can reproduce it and provide a fix on the parser. Thank you 🙏🏽
@evoxmusic I'll have to do some setup to reproduce this as we've pivoted to simply using pg_dump
/ pg_restore
for the time being. We run jobs like this in an instance of Concourse, which is a CI/CD runner like Github actions that runs everything in Docker containers that it throws away after the build so I don't have any saved logs.
I'll make another issue for this potentially, but another reason we ultimately decided to abandon using Replibyte is that we ran into a severe memory leak issue. I was testing all of this using a locally running instance of Concourse. I did some manual adjustment of the data to remove the escape characters that were tripping up Replibyte, and when I ran the dump again it would get to the second chunk of 100MB and then RAM usage would skyrocket and consume all 32GB of RAM in under 1 minute. I'm not sure how best to provide debug output for that scenario. I can maybe look into getting a copy of our database dump with PII stripped out for debugging. We're still interested in possibly using Replibyte if we can as we're interested in being able to hook in and subset the data for local development, but these issues got in the way unfortunately.
This happened to me as well, so I had to do a little search in the DB to find bad data but it would be awesome if we could fix this to have the parser working correctly.
I can provide some test inputs (Might be related to this problem which involves single quotes in the database and backslashes):
SQL Dump (This is a valid output from pg_dump):
CREATE TABLE public."Attachments" (
id uuid NOT NULL,
text character varying(255) NOT NULL,
);
INSERT INTO public."Attachments" (id, text) VALUES ('8e6c5d2b-0b93-4152-a1d2-2f339ae16aab', 'this should not break');
INSERT INTO public."Attachments" (id, text) VALUES ('8e6c5d2b-0b93-4152-a1d2-2f339ae16aac', 'this should break if we escape a quote like this child\''s');
Replibyte config file:
source:
transformers:
- database: public
table: Attachments
columns:
- name: url
transformer_name: random
datastore:
local_disk:
dir: ./storage
Then if I run:
cat test.sql | replibyte -c test-replibyte.yaml dump create -i -s postgresql
I get the output:
Dump created successfully!
But the sanitized file doesn't have the last row:
replibyte -c test-replibyte.yaml dump restore local -i postgres -v latest -o > sanitized-dump.sql
So, I think the parser is not working properly when we have single quotes escaped with backslashes: \'
. When I removed the backslash from the database everything started working correctly.
Input where I can reproduce the error from this issue (Also came from pg_dump
but I have changed some values):
CREATE TABLE public."Attachments" (
id uuid NOT NULL,
text character varying(255) NOT NULL,
);
INSERT INTO public."Attachments" (id, text) VALUES (1, 'this should not break');
INSERT INTO public."Attachments" (id, text) VALUES (2, 'Chris and his beautiful bride Laura.\');
INSERT INTO public."Attachments" (id, text) VALUES (3, 'Whatever! :)');
INSERT INTO public."Attachments" (id, text) VALUES (4, 'Wow what a game ;)');
cat test.sql | env RUST_BACKTRACE=1 replibyte -c test-replibyte.yaml dump create -i -s postgresql
failing query: 'INSERT INTO public."Attachments" (id, text) VALUES (3552, 'Chris and his beautiful bride Laura.\');
INSERT INTO public."Attachments" (id, text) VALUES (3588, 'Whatever! :)');
INSERT INTO public."Attachments" (id, text) VALUES (3598, 'Wow what a game ;'
thread 'main' panicked at 'TokenizerError { message: "Unterminated string literal", line: 5, col: 22 }', dump-parser/src/postgres/mod.rs:790:13
stack backtrace:
⠁
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: dump_parser::postgres::get_tokens_from_query_str
3: replibyte::source::postgres::read_and_transform::{{closure}}
4: dump_parser::utils::list_sql_queries_from_dump_reader
5: replibyte::source::postgres::read_and_transform
6: <replibyte::source::postgres_stdin::PostgresStdin as replibyte::source::Source>::read
7: <replibyte::tasks::full_dump::FullDumpTask<S> as replibyte::tasks::Task>::run
8: replibyte::commands::dump::run
9: replibyte::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
If I remove the \
or the ;
or the :)
from the input it works correctly ...
Hey guys looks like replibyte has issue with escaping apostrophe, I've made fix for that, maybe it fixes also your problems.
Hi there. I'm attempting to use Replibyte to set up replication of our production database to our test environment. The database in question is attached to a custom built CMS web application and the bulk of the data are text fields that store markdown. Running a replibyte dump against this database produces the following error output:
This data is a markdown code block with a Java example in it. The offending line contains
callout("printf", "Indeed! \'tis 15!\n");
. When I remove the backslash before the single quote and save the data, the Replibyte dump completes successfully. This is valid text data for a text field, so I'm not sure why this would be failing. Any ideas?