aws / pg_tle

Framework for building trusted language extensions for PostgreSQL
Apache License 2.0
333 stars 31 forks source link

size limitation on custom type. #263

Closed thiru-baffle closed 7 months ago

thiru-baffle commented 8 months ago

Description

we have created the custom type. And attempting to update/insert data to the table of custom column via COPY command.
PSQLException: ERROR: row is too big: size 1088976, maximum size 8160

Steps to reproduce

  1. created the custom type.
  2. create a table using custom type as a column.
  3. COPY TO export the file to the table of the custom column.
  4. The data should be of size ~1 MB.

Expected outcome

data should be inserted properly.

Actual outcome

PSQLException: ERROR: row is too big: size 1088976, maximum size 8160

Analysis

same data can be converted from text -> bytea using the 'infunc' of the custom type.

If applicable, please provide logs that demonstrate the issue. Please remove any sensitive information from the logs.

adamguo0 commented 8 months ago

Hi @thiru-baffle, can you post some sample code to reproduce the issue?

adamguo0 commented 8 months ago

I believe we would need custom datatypes to support TOAST for this to work: https://www.postgresql.org/docs/current/storage-toast.html

Only certain data types support TOAST — there is no need to impose the overhead on data types that cannot produce large field values. To support TOAST, a data type must have a variable-length (varlena) representation, in which, ordinarily, the first four-byte word of any stored value contains the total length of the value in bytes (including itself). TOAST does not constrain the rest of the data type's representation.
thiru-baffle commented 8 months ago

[Uploading text.csv…]() create extension if not exists pgcrypto; create extension if not exists citext; create schema if not exists baffle_udfs; create extension if not exists pg_tle;

drop function if exists baffle_udfs.bafflectr2_in(input text) cascade; create or replace function baffle_udfs.bafflectr2_in(input text) returns bytea immutable strict language 'plpgsql' as $$ declare hex_text text; declare hex_bytes bytea; begin if starts_with(input, '\x') then hex_text := substr(input, 3); else hex_text := input; end if; hex_bytes := decode(hex_text, 'hex'); return hex_bytes; end; $$;

drop function if exists baffle_udfs.bafflectr2_out(input bytea) cascade; create or replace function baffle_udfs.bafflectr2_out(input bytea) returns text immutable strict language 'plpgsql' as $$ begin return '\x' || encode(input, 'hex'); end; $$;

do $anonfn0$ begin perform pgtle.create_shell_type('baffle_udfs', 'bafflectr2'); perform pgtle.create_base_type('baffle_udfs', 'bafflectr2', 'baffle_udfs.bafflectr2_in(text)'::regprocedure, 'baffle_udfs.bafflectr2_out(bytea)'::regprocedure, -1); end $anonfn0$;

CREATE TABLE IF NOT EXISTS text_table ( clear_test TEXT, enc_test baffle_udfs.bafflectr2);

COPY text_table FROM 'text.csv' DELIMITER ',' CSV HEADER;

The above command will produce the error: ERROR: row is too big: size 499048, maximum size 8160

The underlying type for bafflectr2 is bytea datatype. Since the bytea is of varlen, do still you recommend using any other datatype?

thiru-baffle commented 8 months ago

[Uploading text.csv…]()

lyupan commented 7 months ago

I drafted a PR(https://github.com/aws/pg_tle/pull/266) to add TOAST support so that the size limitation issue can be avoided, feel free to take a look if you are interested.

anth0nyleung commented 7 months ago

PR#266 is merged. Closing this issue.

thiru-baffle commented 6 months ago

I am still getting the same error after updating the extensions.

java.lang.AssertionError: unexpected exception: org.postgresql.util.PSQLException: ERROR: row is too big: size 1088976, maximum size 8160 Where: COPY text_table, line 1

template1=# \dx List of installed extensions Name | Version | Schema | Description
----------+---------+------------+------------------------------------------------------------ citext | 1.6 | public | data type for case-insensitive character strings pg_tle | 1.4.0 | pgtle | Trusted Language Extensions for PostgreSQL pgcrypto | 1.3 | public | cryptographic functions plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language plrust | 1.1 | plrust | plrust: A Trusted Rust procedural language for PostgreSQL (5 rows)

Are there any changes required from the client side?

adamguo0 commented 6 months ago

We introduced two optional parameters to the pgtle.create_base_type function: alignment and storage. Please try specifying storage => 'extended' in the create_base_type function call.

Doc reference: https://github.com/aws/pg_tle/blob/main/docs/09_datatypes.md#pgtlecreate_base_typetypenamespace-regnamespace-typename-name-infunc-regprocedure-outfunc-regprocedure-internallength-int4-alignment-text-default-int4-storage-text-default-plain