citusdata / pg_cron

Run periodic jobs in PostgreSQL

Hard pg_cron 1.5+ dependency on pg_catalog breaks restores of dumps created when using older extension versions #274

Open pcnc opened 1 year ago

pcnc commented 1 year ago

Summary

We've noticed that restoring Postgres dumps created with older versions of pg_cron (1.4.2 in our case) against a database where pg_cron 1.5+ is installed breaks due to a newly introduced hard, and undocumented, requirement that the extension be created in the pg_catalog schema.

Description

pg_dump generates the extension-related section of its database dumps in the following format, explicitly stating the schema where the extension was enabled at dump-time:

--
-- Name: pg_cron; Type: EXTENSION; Schema: -; Owner: -
--

CREATE EXTENSION IF NOT EXISTS pg_cron WITH SCHEMA public;

This dump was created from a Postgres install running pg_cron 1.4.2, with the extension enabled in the public schema by running CREATE EXTENSION pg_cron;. As the user's search_path was set to the public schema, the extension defaulted to it during creation.

When restoring the above dump against a database service running pg_cron 1.5+, the following error is encountered:

psql:db_cluster-01-06-2023@06-34-36.backup:203: ERROR:  extension "pg_cron" must be installed in schema "pg_catalog"

This, in turn, has the following effect:

Exagone313 commented 1 year ago

There is a workaround with the options -l (lower case L) and -L of pg_restore:
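A sketch of that approach (file names are illustrative, and the exact TOC entry labels may differ between pg_dump versions):

# 1. List the archive's table of contents
pg_restore -l db_cluster.backup > db_cluster.list

# 2. Comment out the pg_cron extension entries by prefixing them with ';'
sed -i -E 's/^(.*EXTENSION.*pg_cron.*)$/;\1/' db_cluster.list

# 3. Restore only the entries still active in the edited list
pg_restore -L db_cluster.list -d target_db db_cluster.backup

# 4. Create the extension manually, letting 1.5+ pick its required schema
psql -d target_db -c 'CREATE EXTENSION IF NOT EXISTS pg_cron;'

This works because pg_restore treats lines in the -L list file that start with ';' as comments and skips them during restore.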

cocowalla commented 1 year ago

Just got hit by this too 😞 1.4.2 worked fine, so I'm going to stick with it.

pcnc commented 1 year ago

@Exagone313 - thanks for the workaround! This'll surely help out others who are impacted by the same issue.

Unfortunately, when dealing with very large backups, or backups created using pg_dumpall, the approach is either computationally intensive or not viable, which leaves us unable to upgrade to pg_cron 1.5+ across the platform.

@marcocitus - sorry to ping you like this, but is there any other workaround that could be implemented, if the hard requirement to create pg_cron in the pg_catalog schema is unlikely to change?

mattaylor commented 1 year ago

Are there any updates to this issue?

hanefi commented 1 year ago

@pcnc do you also run into #113? That is a limitation in pg_dump & pg_restore that already breaks pg_cron due to the policies that the extension creates.

After some discussion on #289, it feels safer to me if we keep pg_catalog as the default schema in the control file.

philip-harvey commented 1 year ago

I'm confused as to why https://github.com/citusdata/pg_cron/pull/289 was closed without being merged. It's a breaking and undocumented change. Can anyone shed any light on this?

JelteF commented 1 year ago

Slightly changing the output from pg_dump should allow you to work around this issue. With something like this you should be able to automate that (untested command):

pg_dump | sed 's/^CREATE EXTENSION IF NOT EXISTS pg_cron WITH SCHEMA public;$/CREATE EXTENSION IF NOT EXISTS pg_cron;/'

But then you'll likely still hit: https://github.com/citusdata/pg_cron/issues/113

philip-harvey commented 1 year ago

This is a breaking change for a bunch of reasons, not just for restores.

JelteF commented 11 months ago

@philip-harvey can you explain in what other ways it is a breaking change for you?

philip-harvey commented 11 months ago

> @philip-harvey can you explain in what other ways it is a breaking change for you?

At a minimum, all code that specifies or specified a schema for pg_cron is now broken; in our case it's Terraform code. We also have a ton of databases, and in all of them pg_cron is installed into a specific schema. We can no longer create new instances that match that configuration, so we can't perform any testing.

idugalic commented 11 months ago

> @Exagone313 - thanks for the workaround! This'll surely help out others who are impacted by the same issue.
>
> Unfortunately, when dealing with very large backups, or backups created using pg_dumpall, the approach is either computationally intensive or not viable, which leaves us unable to upgrade to pg_cron 1.5+ across the platform.
>
> @marcocitus - sorry to ping you like this, but is there any other workaround that could be implemented, if the hard requirement to create pg_cron in the pg_catalog schema is unlikely to change?

What is the plan for this issue, please?

Is there any other workaround that could be implemented, if the hard requirement to create pg_cron in the pg_catalog schema is unlikely to be changed?

I am using Supabase and need sub-minute cron jobs. Unfortunately, Supabase cannot use pg_cron 1.5+ :(

marcoslot commented 11 months ago

> At a minimum, all code that specifies or specified a schema for pg_cron is now broken; in our case it's Terraform code.

I think that's fair.

Perhaps we should use a ProcessUtility hook to fix CREATE EXTENSION statements that use the old schema?

idugalic commented 8 months ago

Are there any updates to this issue?

keithf4 commented 7 months ago

Also ran into this forced pg_catalog schema issue in the pgq extension. This seems like a very bad practice to start implementing in extensions. Looking at the SQL, the objects are still being created in a cron schema anyway, so this is very confusing when one does a \dx or a catalog lookup: PG thinks the extension is in pg_catalog, but all the objects are still in cron.

https://github.com/citusdata/pg_cron/blob/main/pg_cron.sql#L15

If you remove the code that forcefully creates the schema and instead just set the schema label in the control file to cron, then it creates the desired schema by default as long as the SCHEMA flag is not set (see the control-file sketch below). It doesn't let the user specify their own schema then, but if that's the desired result, that's fine.

https://github.com/pgq/pgq/issues/22#issuecomment-2045868135
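For illustration only (this is not the shipped pg_cron control file; the comment and version are taken from the \dx output below, and the module path is an assumed placeholder), the change described above would amount to a control file along these lines:

# pg_cron.control -- illustrative fragment, not the actual file
comment = 'Job scheduler for PostgreSQL'
default_version = '1.6'
module_pathname = '$libdir/pg_cron'
relocatable = false
# Pin the extension's objects to the cron schema instead of pg_catalog;
# CREATE EXTENSION then rejects a conflicting user-supplied SCHEMA clause.
schema = cron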

keithf4 commented 7 months ago

Saw the other open issue talking about allowing custom schemas; the reason given for not doing this is security concerns.

https://github.com/citusdata/pg_cron/issues/225

If the pg_cron code is written properly (fully schema-qualifying all object references), the security concerns around custom schemas can be avoided. I know this because I did it in pg_partman, even correcting a very serious CVE about this in the past.

Pretty sure any user code touching pg_catalog like this would be looked down upon by core.
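As a hypothetical illustration of that pattern (cron.job_count is an invented helper, not part of pg_cron; cron.job is the extension's real job table), a function remains safe outside pg_catalog when every reference is schema-qualified and its search_path is pinned:

-- Invented helper, not shipped with pg_cron: all references are qualified and
-- search_path is pinned, so an object planted earlier in the caller's
-- search_path cannot be substituted for cron.job or pg_catalog.count().
CREATE FUNCTION cron.job_count()
RETURNS bigint
LANGUAGE sql
STABLE
SECURITY DEFINER
SET search_path = pg_catalog, pg_temp
AS $$
    SELECT count(*) FROM cron.job;
$$;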

keithf4 commented 7 months ago

Just to show with the current version: because all object creation is schema-qualified to cron, the extension reports that it is installed in pg_catalog, but everything is actually forced into cron.

keith=# \dx
                                      List of installed extensions
    Name    | Version |    Schema     |                           Description                           
------------+---------+---------------+-----------------------------------------------------------------
 pg_cron    | 1.6     | pg_catalog    | Job scheduler for PostgreSQL
 pg_partman | 5.1.0   | partman       | Extension to manage partitioned tables by time or ID
 pgtap      | 1.2.0   | public        | Unit testing for PostgreSQL
 plpgsql    | 1.0     | pg_catalog    | PL/pgSQL procedural language
(5 rows)

keith=# \dx+ pg_cron
                    Objects in extension "pg_cron"
                          Object description                          
----------------------------------------------------------------------
 function cron.alter_job(bigint,text,text,text,text,boolean)
 function cron.job_cache_invalidate()
 function cron.schedule_in_database(text,text,text,text,text,boolean)
 function cron.schedule(text,text)
 function cron.schedule(text,text,text)
 function cron.unschedule(bigint)
 function cron.unschedule(text)
 schema cron
 sequence cron.jobid_seq
 sequence cron.runid_seq
 table cron.job
 table cron.job_run_details

polobo commented 7 months ago

FWIW, I filed a bug report against PostgreSQL to fix ALTER EXTENSION ... SET SCHEMA so that the DBA can update an installed extension to the pg_catalog schema reference once v1.5+ is installed on the machine. Right now the original user-specified schema is retained, so even after upgrading, pg_dump still emits the now-invalid and incompatible CREATE EXTENSION line. The workarounds are to drop and re-create the extension, or to manually alter the relevant catalog entry as superuser under catalog update mode.

https://www.postgresql.org/message-id/CAKFQuwa1cb9y6OW9rSmXFGH69J3ZEQOm3+Od4Ang4qi74YHcTw@mail.gmail.com
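A minimal SQL sketch of the drop-and-recreate path (the backup table and the example job are placeholders; adapt them to your own jobs):

-- Caution: dropping the extension also drops cron.job, so keep a copy of the
-- job definitions outside the extension first.
CREATE TABLE public.pg_cron_jobs_backup AS
SELECT jobname, schedule, command FROM cron.job;

-- Re-create the extension so it is registered under pg_catalog.
DROP EXTENSION pg_cron;
CREATE EXTENSION pg_cron;

-- Re-schedule the saved jobs, for example:
-- SELECT cron.schedule('nightly-vacuum', '0 3 * * *', 'VACUUM');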