strange behaviour of "UPDATE" with id_encode()

dataselfservice commented 1 year ago

Hi,

I'm running a kubernetes instance of 15.4-alpine3.18, with modified image IMG_POSTGRES_TAG added with pg_hashids via following Dockerfile:

FROM postgres:15.4-alpine3.18                                                                                                                                                                                                                 

RUN apk add --no-cache --virtual .build-deps build-base postgresql-dev clang15 llvm15 ; \
    wget -qO- https://github.com/iCyberon/pg_hashids/archive/refs/tags/v1.2.1.tar.gz | tar xzf - -C /tmp && \
    make -C /tmp/pg_hashids-1.2.1 && \
    make -C /tmp/pg_hashids-1.2.1 install && \
    rm -rf /tmp/pg_hashids-1.2.1 && \
    apk del .build-deps

Deployed to the cluster with:

helm install postgres --set image.repository=postgres,image.tag=${IMG_POSTGRES_TAG},primary.service.clusterIP=${POSTGRES_CLUSTER_IP},primary.persistence.existingClaim=postgres oci://registry-1.docker.io/bitnamicharts/postgresql

The issue is that a trigger function which generates hashes during insert, miss-behaves and generates a wrong hash ending with a space.

Reproduce it by running the following SQL:

BEGIN;
DROP SCHEMA IF EXISTS test CASCADE;
CREATE SCHEMA test;
CREATE TABLE test.test (
    id bigint NOT NULL,
    data text,
    ids text,
    ids1 text,
    ids2 text
);
CREATE SEQUENCE test.test_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;
ALTER SEQUENCE test.test_id_seq OWNED BY test.test.id;
ALTER TABLE ONLY test.test
    ALTER COLUMN id SET DEFAULT nextval('test.test_id_seq'::regclass);
CREATE FUNCTION test.myfunc ()
    RETURNS TRIGGER
    LANGUAGE plpgsql
    AS $$
BEGIN
    UPDATE
        test.test
    SET
        ids = id_encode (NEW.id, 'hash1', 5, 'abcdefghijklmnopqrstuvwxyz'),
        ids1 = id_encode (NEW.id, 'hash2', 12, 'abcdefghijklmnopqrstuvwxyz'),
        ids2 = id_encode (NEW.id, 'hash3', 12, 'abcdefghijklmnopqrstuvwxyz')
    WHERE
        id = NEW.id;
    RETURN new;
END;
$$;
CREATE FUNCTION test.myfunc_fix ()
    RETURNS TRIGGER
    LANGUAGE plpgsql
    AS $$
BEGIN
    UPDATE
        test.test
    SET
        ids = id_encode (NEW.id, 'hash1', 5, 'abcdefghijklmnopqrstuvwxyz')
    WHERE
        id = NEW.id;
    UPDATE
        test.test
    SET
        ids1 = id_encode (NEW.id, 'hash2', 12, 'abcdefghijklmnopqrstuvwxyz'),
        ids2 = id_encode (NEW.id, 'hash3', 12, 'abcdefghijklmnopqrstuvwxyz')
    WHERE
        id = NEW.id;
    RETURN new;
END;
$$;
CREATE TRIGGER test_myfunc
    AFTER INSERT ON test.test
    FOR EACH ROW
    EXECUTE FUNCTION test.myfunc ();
\copy test.test to stdout CSV FORCE QUOTE *;
INSERT INTO test.test (data)
    VALUES ('a'),
    ('b');
\copy test.test to stdout CSV FORCE QUOTE *;
DROP TRIGGER test_myfunc ON test.test;
CREATE TRIGGER test_myfunc_fix
    AFTER INSERT ON test.test
    FOR EACH ROW
    EXECUTE FUNCTION test.myfunc_fix ();
\copy test.test to stdout CSV FORCE QUOTE *;
INSERT INTO test.test (data)
    VALUES ('a1'),
    ('b1');
\copy test.test to stdout CSV FORCE QUOTE *;
COMMIT

Run it with: cat test.sql | psq -U testdb.

Output looks like:

WARNING:  database "testdb" has no actual collation version, but a version was recorded
BEGIN
NOTICE:  drop cascades to 3 other objects
DETAIL:  drop cascades to table test.test
drop cascades to function test.myfunc()
drop cascades to function test.myfunc_fix()
DROP SCHEMA
CREATE SCHEMA
CREATE TABLE
CREATE SEQUENCE
ALTER SEQUENCE
ALTER TABLE
CREATE FUNCTION
CREATE FUNCTION
CREATE TRIGGER
INSERT 0 2
"1","a","zmnb ","wzejgmprmnpa","yzvmbrydnogd"
"2","b","pylw ","xabdomywmenl","zmbexnawnpaq"
DROP TRIGGER
CREATE TRIGGER
"1","a","zmnb ","wzejgmprmnpa","yzvmbrydnogd"
"2","b","pylw ","xabdomywmenl","zmbexnawnpaq"
INSERT 0 2
"1","a","zmnb ","wzejgmprmnpa","yzvmbrydnogd"
"2","b","pylw ","xabdomywmenl","zmbexnawnpaq"
"3","a1","zmpdm","jlyezmrwvakd","ybjwprgzndxl"
"4","b1","dmepy","ydqlpvjrvxnz","jwmvprxmnkea"
COMMIT

The issue is the space in the end of "zmnb ". Utilizing myfunc_fix() which basically runs separated UPDATE, works i.e. generates hashes without the trailing space. But I am not at all satisfied and I would like to understand what is the root cause?!?!

I noticed the WARNING: database "testdb" has no actual collation version, but a version was recorded and I am not sure how that is realted (I do not really know what is the impact of that.

Any help or ideas?

Cheers, DataSelfService team

ImreSamu commented 1 year ago

I noticed the WARNING: database "testdb" has no actual collation version, but a version was recorded and I am not sure how that is realted (I do not really know what is the impact of that.

With the Alpine version, you probably need to use the ICU locales setting. Or you might need to switch to the simpler Debian. At least it's documented:

https://github.com/docker-library/docs/tree/master/postgres#locale-customization
- _" Also of note, Alpine-based variants starting with Postgres 15 support ICU locales. Previous Postgres versions based on alpine do not support locales; see "Character sets and locale" in the musl documentation for more details. You can set locales in the Alpine-based images with POSTGRES_INITDB_ARGS to set a different locale. ...."_

Another note: When generating or using hashes, it's advisable to use the more stable "C" Collation, which is also good for performance.

If the previous settings don't resolve the issue, then it would be advisable to report it at https://github.com/iCyberon/pg_hashids

dataselfservice commented 1 year ago

Hi @ImreSamu , thank you for the feedback, I'll make some tests and get back here.

What is best image to run into Kubernetes and is the Debian equally good for Kubernetes cluster purposes?

ImreSamu commented 1 year ago

What is best image to run into Kubernetes and is the Debian equally good for Kubernetes cluster purposes?

Unfortunately, I can't assist with that question as I don't use Kubernetes. You might find more comprehensive and helpful answers on StackOverflow: https://stackoverflow.com/search?q=postgresql+kubernetes

However, what I would recommend is never using a PostgreSQL extension in production without testing it; especially under Alpine Linux. There's a reason why the PostGIS Alpine Dockerfile includes a comprehensive test during the build process - to detect issues as early as possible.

https://github.com/postgis/docker-postgis/blob/c8f793223c/13-3.4/alpine/Dockerfile

dataselfservice commented 1 year ago

I made some test and wierd enough: the WARNING: database "testdb" has no actual collation version was something related to how the init (db creating, extension etc) was done. By cleaning the persistent volume (FS) and rebuilding (on kubernetes and on local docker) it disappeared. BUT the pg_hashid issue persists, likely unrelated to the collation stuff. I'll report to https://github.com/iCyberon/pg_hashids.

dataselfservice commented 1 year ago

additional note: the problem is exactly the same using postgres:15.4, with following Dockerfile:

FROM postgres:15.4

RUN apt update && apt install -y build-essential postgresql-server-dev-15 wget && \
    wget -qO- https://github.com/iCyberon/pg_hashids/archive/refs/tags/v1.2.1.tar.gz | tar xzf - -C /tmp && \
    make -C /tmp/pg_hashids-1.2.1 && \
    make -C /tmp/pg_hashids-1.2.1 install && \
    rm -rf /tmp/pg_hashids-1.2.1 && \
    apt purge -y build-essential postgresql-server-dev-15 && apt autoremove -y

dataselfservice commented 1 year ago

Hi,

from postgresql side, what is the difference btw myfunc and myfunc_fixin particular what is PG doing when performing multipleSET(myfunc) and oneSETat the time (myfunc_fix`)?

dataselfservice commented 1 year ago

pg_hashids was not udpated to latest. This solved the problem:

RUN apk add --no-cache --virtual .build-deps build-base postgresql-dev clang15 llvm15 ; \
    wget -qO- https://github.com/iCyberon/pg_hashids/archive/cd0e1b31d52b394a0df64079406a14a4f7387cd6.tar.gz | tar xzf - -C /tmp && \
    make -C /tmp/pg_hashids-cd0e1b31d52b394a0df64079406a14a4f7387cd6 && \
    make -C /tmp/pg_hashids-cd0e1b31d52b394a0df64079406a14a4f7387cd6 install && \
    rm -rf /tmp/pg_hashids-cd0e1b31d52b394a0df64079406a14a4f7387cd6 && \
    apk del .build-deps

docker-library / postgres

strange behaviour of "UPDATE" with id_encode() #1122