credativ / informix_fdw

Foreign Data Wrapper for Informix Databases
Other
28 stars 5 forks source link

Postgres server crash due to error in ifx_fdw.c #31

Closed herwigg closed 10 months ago

herwigg commented 3 years ago

Hi, We have encountered a PG server crash due to an error in ifx_fdw.c The problem occurs when deleting from a a table that has a before trigger on it, that trigger deletes from a (informix) foreign table as well.

CREATE TABLE metadblucht.c_eenheid (
    enh_id serial NOT NULL,
    dat_wijziging timestamp NOT NULL,
    gebr_wijziging varchar(255) NULL,
    CONSTRAINT pk_enh PRIMARY KEY (enh_id)
);
Trigger and function:
create trigger td_c_eenheid before
delete
    on
    metadblucht.c_eenheid for each row execute function pd_c_eenheid();
create trigger ti_c_eenheid before
insert
    on
    metadblucht.c_eenheid for each row execute function pi_c_eenheid();
create trigger tu_c_eenheid before
update
    on
    metadblucht.c_eenheid for each row execute function pu_c_eenheid();

   CREATE OR REPLACE FUNCTION metadblucht.pd_c_eenheid()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
            BEGIN
                delete from informix.t_water_eenheid
                where enh_id = new.enh_id;
            END;
        $function$
;
foreign table: 
CREATE FOREIGN TABLE informix.t_water_eenheid (
    enh_id int4 NOT NULL
)
SERVER o_oltp_app_water
OPTIONS (table 'c_eenheid_metadblucht', database 'water', client_locale
'en_US.utf8', db_locale 'nl_be.utf8');

The trigger should be an after trigger, but I do not think that is to the point for this issue. The statement causing the server crash: start transaction; delete from metadblucht.c_eenheid where enh_id = 40; Querying the foreign table works fine and directly deleting from the foreign table as well. This is the relevant section in the PG log file:

__2021-04-21 11:54:30 CEST LOG:  server process (PID 3933) was terminated by
signal 11: Segmentation fault
__2021-04-21 11:54:30 CEST DETAIL:  Failed process was running: delete from
metadblucht.c_eenheid where enh_id = 40
__2021-04-21 11:54:30 CEST LOG:  terminating any other active server
processes
gebouwenbeheerdba_gebouwenbeheer_2021-04-21 11:54:30 CEST WARNING: 
terminating connection because of crash of another server process
gebouwenbeheerdba_gebouwenbeheer_2021-04-21 11:54:30 CEST DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
gebouwenbeheerdba_gebouwenbeheer_2021-04-21 11:54:30 CEST HINT:  In a moment
you should be able to reconnect to the database and repeat your command.
oari_ari_2021-04-21 11:54:30 CEST WARNING:  terminating connection because
of crash of another server process
oari_ari_2021-04-21 11:54:30 CEST DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
oari_ari_2021-04-21 11:54:30 CEST HINT:  In a moment you should be able to
reconnect to the database and repeat your command.

Can you have a look ? Thanks. Herwig

psoo commented 2 years ago

Sorry for the really long delay, $reasons prevented me to follow here closely. I'll look into this.

psoo commented 1 year ago

Hi @herwigg ,

Sry for being so late back to the party, $reasons had it again that i couldn't find time to put my hands on these things here.

You are referencing NEW.enhid, which causes the crash. This is wrong in DELETE events, you should refer to OLD.enhid instead. Also you must be very careful in returning the correct row, in your case you miss to RETURN OLD;. For insert this is NEW, updates can either return OLD or NEW (triggers on INSERT...ON CONFLICT have somehow a special behavior), see

https://www.postgresql.org/docs/current/trigger-definition.html

for more details, please. If you specify the trigger correctly, it works.

Though, i haven't figured out yet why this causes the foreign data wrapper to struggle so bad in this specific case.

psoo commented 1 year ago

Looks like this is related to how statements in plpgsql are implicitly planned and so related to prepared statements. Note this behavior when executing a explicit prepared statement repeatedly:

SET client_min_messages TO ERROR;

insert into informix.inttest select t.id FROM generate_series(1, 10) AS t(id);

PREPARE del_inttest(bigint) AS DELETE FROM informix.inttest WHERE f1 = $1;
EXECUTE del_inttest (1);

EXECUTE del_inttest (2);

EXECUTE del_inttest (3);

EXECUTE del_inttest (4);

EXECUTE del_inttest (5);

EXECUTE del_inttest (6);

EXECUTE del_inttest (7);
FEHLER:  informix FDW error: "Not in transaction. "
DETAIL:  SQLSTATE IX000 (SQLCODE=1)

The foreign data wrapper enters an invalid state, which the error message indicates. But setting the plan_cache_mode parameter to force_custom_plan gets you back into business, the EXECUTE starts to work again...

The problem here is that when in plan_cache_mode=auto or plan_cache_mode=force_generic_plan we weren't called after reaching the plan cache limit with ifxPlanForeignModify() anymore, instead ifxBeginModify() is directly called, sending the whole foreign data wrapper caching initialization off into the desert. I have a feeling that this is accountable for the bug in the trigger above, too.

psoo commented 1 year ago

Fixes merged into master.

Pls note that starting from now on i do releases only at

https://github.com/psoo/informix_fdw/releases

psoo commented 10 months ago

I am closing this as the related issue seems fixed.