Closed tendres closed 6 years ago
Well redshift uses the PostgreSQL protocol, so just use PostgreSQL://
for the connection string. pgloader auto-detects when it's talking to a redshift server rather than a PostgreSQL server. Also, you need to use the current git master version of pgloader (build from fresh sources) to have working Redshift support.
Ah good - thanks. Built from fresh sources.
pgloader version "3.5.f8460c1"
compiled with SBCL 1.3.1.debian
Now getting the following error from Redshift:
ERROR Database error 42883: function string_agg(name, "unknown") does not exist
HINT: No function matches the given name and argument types. You may need to add explicit type casts.
QUERY: -- params: including
-- filter-list-to-where-clause for including
-- excluding
-- filter-list-to-where-clause for excluding
select n.nspname,
i.relname,
i.oid,
rn.nspname,
r.relname,
indisprimary,
indisunique,
(select string_agg(attname, ',')
from pg_attribute
where attrelid = r.oid
and array[attnum::integer] <@ indkey::integer[]
) as cols,
pg_get_indexdef(indexrelid),
c.conname,
pg_get_constraintdef(c.oid)
from pg_index x
join pg_class i ON i.oid = x.indexrelid
join pg_class r ON r.oid = x.indrelid
join pg_namespace n ON n.oid = i.relnamespace
join pg_namespace rn ON rn.oid = r.relnamespace
left join pg_depend d on d.classid = 'pg_class'::regclass
and d.objid = i.oid
and d.refclassid = 'pg_constraint'::regclass
and d.deptype = 'i'
left join pg_constraint c ON c.oid = d.refobjid
where n.nspname !~ '^pg_' and n.nspname <> 'information_schema'
order by n.nspname, r.relname
When using:
LOAD DATABASE
FROM postgresql://USER:PASSWORD@ENDPOINT:PORT/DB_NAME
INTO postgresql://USER:PASSWORD@ENDPOINT:PORT/DB_NAME
WITH on error stop,
concurrency = 1,
workers = 8,
prefetch rows = 20000
SET PostgreSQL PARAMETERS
search_path to 'some_schema'
INCLUDING ONLY TABLE NAMES MATCHING 'some_table' IN SCHEMA 'some_schema';
Maybe LISTAGG instead of string_agg for Redshift? LISTAGG Function
Forgot to mention the Redshift instance is still up with your same credentials if you need a place to test.
Thanks for the notice about the testing environment, then I realised I could probably easily fix the problem at hand... and I think I did. Please try again from a fresh build.
@dimitri So close! Now an error about the function identity
on the postgres side:
2018-11-08T16:02:19.468000Z ERROR Database error 42883: function identity(integer, integer, text) does not exist
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
QUERY: CREATE TABLE "staging"."work"
(
"id" integer default "identity"(137710, 0, '1,1'::text),
"source_id" integer default NULL,
"data_source_id" integer default NULL,
"name" text default NULL,
"created_at" timestamp without time zone default CURRENT_TIMESTAMP
);
2018-11-08T16:02:19.471000Z FATAL Failed to create the schema, see above.
2018-11-08T16:02:19.475000Z LOG report summary reset
table name errors rows bytes total time
----------------- --------- --------- --------- --------------
fetch meta data 0 3 0.156s
Create Schemas 0 0 0.002s
Create SQL Types 0 0 0.009s
Create tables 0 0 0.000s
----------------- --------- --------- --------- --------------
----------------- --------- --------- --------- --------------
The DDL from Redshift and RDS look identical other than auto-incremental stuff. Redshift:
CREATE TABLE "staging"."work" (
"id" int4 NOT NULL DEFAULT "identity"(137710, 0, '1,1'::text),...
RDS:
CREATE TABLE "staging"."work" (
"id" int4 NOT NULL DEFAULT nextval('"staging".work_id_seq'::regclass),...
This table staging.work
isn't part of the test case you gave me access to. Can you add it, so that I can reproduce and fix the error? Thanks.
I created a table staging.work
containing 1000 records and a view to look at the DDL of any existing tables.
By running:
SELECT
ddl
FROM
admin.v_generate_tbl_ddl
WHERE
schemaname = 'staging'
AND tablename = 'work';
returns
CREATE TABLE IF NOT EXISTS staging."work"
(
id INTEGER NOT NULL DEFAULT "identity"(347358, 0, ('1,1'::character varying)::text) ENCODE lzo
,source_id INTEGER NOT NULL ENCODE lzo
,data_source_id INTEGER NOT NULL ENCODE lzo
,name VARCHAR(4000) NOT NULL ENCODE lzo
,created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT getdate() ENCODE lzo
,PRIMARY KEY (id)
,UNIQUE (source_id, data_source_id)
)
DISTSTYLE EVEN
;
After some experimentation I found that the '1,1'
in the identity
line refers to the start value (seed) and step value respectfully. I think 347358
is the table's oid - but not sure about ,0,
.
I do not see a reference to a next value
for a sequence.
Hope this helps and thank you again Dimitri.
Now that I have a table using an identity default value from Redshift, I could fix the casting process to switch to using serial/bigserial instead. Note that I didn't implement parsing the seed and step values at this point. It's all considered the same as a MySQL _autoincrement “extra” hint. Hope that works for you, otherwise, as usual, please comment here.
This. Is. Amazing. Thank you Dimitri!
I'm creating a copy script that copies tables between clusters via s3. However, I have a fear, that the table_id would cause issues, so I decided to created a regex to match it. Matching group 4 contains the starting value, and the incerement, you can pytgon re.groups(4) to get the values.
In my usecase, if match found it will remove/replace it with this, else it will leave the original value _id BIGINT NOT NULL DEFAULT defaultidentity(1004356, 0, '100,1'::text) ENCODE az64 -> id BIGINT NOT NULL identity(100, 1)
here's my regex:
(default)?(\"|\_)identity[\"]?\([0-9]{3,}\,\s[0-9]+\,\s(\()?(\'[0-9]+\,[0-9]+\')?\:\:.+\)
I am looking to migrate some tables from Redshift to Postgres and have been unable to get the syntax correct - is there any documentation I am missing? When I use
redshift
pgloader appears to believe it is an sqlite connection? Looking for a robust example to follow - any help appreciated![ ] pgloader --version
[ ] did you test a fresh compile from the source tree? Yes
[ ] did you search for other similar issues? Yes
[ ] how can I reproduce the bug? I am running pgloader on an EC2 instance attempting to move tables from Amazon Redshift to Postgres RDS instances using the below.