cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

sql: support pg_dump #20296

Open donpdonp opened 6 years ago

donpdonp commented 6 years ago

FEATURE REQUEST

One big plus for trying CockroachDB was its compatibility with Postgres. I assumed pg_dump would work; instead it errors out.

$ pg_dump -h db1 -p 26257 database
pg_dump: [archiver (db)] query failed: ERROR:  unknown function: pg_catalog.pg_is_in_recovery()
pg_dump: [archiver (db)] query was: SELECT pg_catalog.pg_is_in_recovery()

I've also tried 'cockroach dump > out.sql', but the SQL is not Postgres-compatible (id STRING(128), for example). I hand-edited the schema to try to get this route to work, but loading the data itself hits schema constraints: a foreign key field means table B must come before table A, but the SQL dump file loads table A and then B. pg_restore is supposed to have a workaround for this, which is why I'd like pg_dump to work.

I believe pg_is_in_recovery is relatively new in Postgres, so perhaps this only recently stopped working.

Jira issue: CRDB-5936

jordanlewis commented 6 years ago

Looks like pg_is_in_recovery has been around since the beginning of the 9.0 series at least, so it's likely that pg_dump has never worked - I don't think we've ever tried it.

I don't think supporting pg_dump is a particularly high priority for us - better would likely be to improve cockroach dump so it orders its output based on a topological sorting of the tables with respect to their constraints. cc @mjibson any thoughts on that?

That being said, if it's just a matter of implementing a couple of extra builtins, it's no big deal. I have a feeling that pg_dump will use some advanced Postgres features that we don't have much hope of supporting, though.
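The topological ordering suggested above can be sketched with Kahn's algorithm. This is an illustrative sketch only, not CockroachDB's actual dump code; the table names and foreign-key map are invented.

```go
// Hypothetical sketch of ordering dump output by foreign-key
// dependencies (Kahn's algorithm).
package main

import "fmt"

// topoSort orders tables so that every table appears after the tables
// it references via foreign keys. The result is incomplete if the
// graph has a cycle (e.g. a self-referencing table).
func topoSort(tables []string, fks map[string][]string) []string {
	indeg := make(map[string]int, len(tables))
	dependents := make(map[string][]string)
	for _, t := range tables {
		indeg[t] = 0
	}
	for child, parents := range fks {
		for _, p := range parents {
			indeg[child]++
			dependents[p] = append(dependents[p], child)
		}
	}
	var queue, order []string
	for _, t := range tables {
		if indeg[t] == 0 {
			queue = append(queue, t)
		}
	}
	for len(queue) > 0 {
		t := queue[0]
		queue = queue[1:]
		order = append(order, t)
		for _, c := range dependents[t] {
			indeg[c]--
			if indeg[c] == 0 {
				queue = append(queue, c)
			}
		}
	}
	return order
}

func main() {
	// Table B has a foreign key into table A, as in the report above,
	// so A must be dumped first.
	fmt.Println(topoSort([]string{"b", "a"}, map[string][]string{"b": {"a"}}))
	// → [a b]
}
```

As the comment notes, self-referencing tables form a one-node cycle, which is exactly the case maddyblue flags below as unresolved.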

jseldess commented 6 years ago

cc @dianasaur323, @awoods187

dianasaur323 commented 6 years ago

Hi @donpdonp, thanks for the feature request! I agree with @jordanlewis that we should focus more on improving cockroach dump to output PostgreSQL-compatible formats (@mjibson, thoughts?). One possibility is to disable constraint checking in PostgreSQL before loading the data, but it looks like they don't have a very user-friendly configuration toggle for that.
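For reference, the closest Postgres-side toggle is the session_replication_role setting, which disables the internal triggers that enforce foreign keys (superuser only); pg_restore's --disable-triggers flag is the per-table equivalent. A minimal, hypothetical sketch of wrapping a restore script with it:

```go
// Sketch: wrap a restore script so foreign-key enforcement is skipped
// while the data loads, then restored. withFKChecksDisabled is a
// made-up helper, not part of any real tool.
package main

import "fmt"

func withFKChecksDisabled(restoreScript string) string {
	return "SET session_replication_role = replica;\n" +
		restoreScript + "\n" +
		"SET session_replication_role = DEFAULT;\n"
}

func main() {
	// Child-table rows can now load before their parent rows exist.
	fmt.Print(withFKChecksDisabled("INSERT INTO b VALUES (1, 1);"))
}
```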

If we were able to do a better job of pushing out PostgreSQL-compatible dumps, would that be an acceptable solution for you? Also, do you mind me asking why you want to go from CRDB to PostgreSQL? Is this for testing purposes, or will it be a permanent setup in your environment?

jordanlewis commented 6 years ago

For future reference, here's pg_dump's source code: https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_dump.c#L1061

Looks like it uses quite a few advanced features that we don't support yet.

And that's only the setting initialization part. Seems likely that during the actual dump phase this tool will rely on more arcane features that we don't support yet.

maddyblue commented 6 years ago

What version of the cockroach client are you using when you run cockroach dump? As of 1.1 it should output tables in an order that supports foreign keys (except for self references, which may or may not work).

I think it makes sense to attempt to support more interop between cockroach dump and pg_dump (in both directions), but it's unclear what form that will take. It could mean supporting more SQL syntax or writing/finding a program that can do the translation.

donpdonp commented 6 years ago

@mjibson ah yes, I'm on 1.0.6. I managed to work around the load ordering by dumping/restoring one table at a time in the right order.

Using Postgres-compliant SQL for the dump output would be a big help. STRING(x) instead of VARCHAR(x), and FAMILY(..), were the hangups I ran into.
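A rough sketch of the kind of translation this would take, covering only the two hangups named above. This is illustrative only; a real translator would need to parse the SQL rather than use regexes (quoted family names like FAMILY "primary" (...) are not handled here).

```go
// Illustrative sketch: rewrite two CockroachDB-specific constructs
// into Postgres-accepted forms with regexes.
package main

import (
	"fmt"
	"regexp"
)

var (
	// STRING or STRING(n) -> VARCHAR / VARCHAR(n)
	stringType = regexp.MustCompile(`\bSTRING(\(\d+\))?`)
	// Drop column-family clauses, which Postgres does not understand.
	familyDef = regexp.MustCompile(`,?\s*FAMILY\s+\w+\s*\([^)]*\)`)
)

func toPostgres(ddl string) string {
	out := stringType.ReplaceAllString(ddl, "VARCHAR$1")
	return familyDef.ReplaceAllString(out, "")
}

func main() {
	ddl := `CREATE TABLE users (id STRING(128) PRIMARY KEY, bio STRING, FAMILY f1 (id, bio))`
	fmt.Println(toPostgres(ddl))
	// → CREATE TABLE users (id VARCHAR(128) PRIMARY KEY, bio VARCHAR)
}
```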

@dianasaur323 I moved away from CRDB because it would consume more than 800 MB of RAM under low load for a 25 MB dataset.

dianasaur323 commented 6 years ago

@donpdonp thanks for the feedback! We are sad to see you go, although that RAM usage seems a bit crazy. If you have time and wouldn't mind sharing, what did your RAM usage / workload look like? Even if we aren't a fit now, we should see if we can improve that behavior.

jordanlewis commented 5 years ago

This doesn't work because of missing support for implicit column tableoid:

pg_dump: [archiver (db)] query failed: ERROR:  column "x.tableoid" does not exist
pg_dump: [archiver (db)] query was: SELECT x.tableoid, x.oid, x.extname, n.nspname, x.extrelocatable, x.extversion, x.extconfig, x.extcondition FROM pg_extension x JOIN pg_namespace n ON n.oid = x.extnamespace

paulcarlucci commented 5 years ago

Perl DBI's table_info also requires tableoid.

fmax commented 4 years ago

I also get these errors when trying to use CockroachDB as the PostgreSQL backend for the monitoring application Cachet:

Initializing Cachet container ... 
Attempting to connect to database ... 

ERROR:  unknown function: to_regclass() 
Table sessions does not exist! ... 
Initializing Cachet database ... 
Clearing settings cache... 
Settings cache cleared! 
Backing up database... 
Dumping database and uploading... 
pg_dump: [archiver (db)] query failed: ERROR:  column "x.tableoid" does not exist 
pg_dump: [archiver (db)] query was: SELECT x.tableoid, x.oid, x.extname, n.nspname, x.extrelocatable, x.extversion, x.extconfig, x.extcondition FROM pg_extension x JOIN pg_namespace n ON n.oid = x.extnamespace 

First I did a pg_dump of the whole Cachet PostgreSQL database. Then I removed AS integer from the CREATE SEQUENCE commands to get the dump imported successfully. But when starting the application, the errors above occur.

dcecile commented 3 years ago

The Dbmate migration tool uses pg_dump to export the schema to disk:

https://github.com/amacneil/dbmate/blob/main/pkg/driver/postgres/postgres.go#L176

It also fails with the x.tableoid error:

Error: pg_dump: error: query failed: ERROR:  column "x.tableoid" does not exist
pg_dump: error: query was: SELECT x.tableoid, x.oid, x.extname, n.nspname, x.extrelocatable, x.extversion, x.extconfig, x.extcondition FROM pg_extension x JOIN pg_namespace n ON n.oid = x.extnamespace

idc77 commented 2 years ago

ent, a Go ORM in the broadest sense, apparently uses to_regclass() somewhere in its generated code.

When trying to create the database schema with pgx:

// setupCmd represents the setup command. (Fragment; assumes the
// generated ent client plus cobra, context, database/sql, fmt,
// entsql = "entgo.io/ent/dialect/sql", dialect = "entgo.io/ent/dialect",
// and the pgx stdlib driver `_ "github.com/jackc/pgx/v4/stdlib"` are imported.)
var setupCmd = &cobra.Command{
    Use:   "setup",
    Short: "A brief description of your command",
    RunE: func(cmd *cobra.Command, args []string) error {
        fmt.Println("setup called")
        // db, e := sql.Open("pgx", config.DatabaseURIPostgres)
        db, e := sql.Open("pgx", config.DatabaseURICockroach) // "postgresql://root@127.0.0.1:26257/blogs"
        if e != nil {
            return e
        }
        drv := entsql.OpenDB(dialect.Postgres, db)
        client := ent.NewClient(ent.Driver(drv))
        defer client.Close()
        ctx := context.Background()
        if e := client.Schema.Create(ctx, schema.WithAtlas(true)); e != nil {
            return e
        }
        return nil
    },
}

I get

Error: sql/schema: ERROR: unknown function: to_regclass() (SQLSTATE 42883)

With the pq driver:

        client, e := ent.Open(dialect.Postgres, config.DatabaseURIProd)
        if e != nil {
            return e
        }
        defer client.Close()
        ctx := context.Background()
        if e := client.Schema.Create(ctx, schema.WithAtlas(true)); e != nil {
            return e
        }
        return nil
Error: sql/schema: pq: unknown function: to_regclass()

pinging @ent for reference

rafiss commented 2 years ago

Thanks @idc77 ! I filed https://github.com/cockroachdb/cockroach/issues/77838 to track that specifically, so this issue stays focused on pg_dump.

otan commented 2 years ago

I took a look at pg_dump for Postgres v14.2 during breather week; working branch: https://github.com/cockroachdb/cockroach/pull/77879

Here's what's missing:

Currently stuck on a decorrelated subquery error:

SELECT oid, tableoid, pol.polrelid, pol.polname, pol.polcmd, pol.polpermissive, CASE WHEN pol.polroles = '{0}' THEN NULL ELSE    pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(rolname) from pg_catalog.pg_roles WHERE oid = ANY(pol.polroles)), ', ') END AS polroles, pg_catalog.pg_get_expr(pol.polqual, pol.polrelid) AS polqual, pg_catalog.pg_get_expr(pol.polwithcheck, pol.polrelid) AS polwithcheck FROM pg_catalog.pg_policy pol;

ERROR: could not decorrelate subquery

otan commented 2 years ago

Looks like in PG15 table OIDs are statically assigned (src/backend/catalog/system_fk_info.h) and pg_dump relies on this hardcoding; no chance of making this work anymore :(

obeleh commented 1 year ago

Does that mean there won't be any support coming for < PG15 either?