aws / pg_tle

Framework for building trusted language extensions for PostgreSQL
Apache License 2.0
337 stars 31 forks source link

size limitation on custom type. #263

Closed thiru-baffle closed 9 months ago

thiru-baffle commented 10 months ago

Description

we have created the custom type. And attempting to update/insert data to the table of custom column via COPY command.
PSQLException: ERROR: row is too big: size 1088976, maximum size 8160

Steps to reproduce

  1. created the custom type.
  2. create a table using custom type as a column.
  3. COPY TO export the file to the table of the custom column.
  4. The data should be of size ~1 MB.

Expected outcome

data should be inserted properly.

Actual outcome

PSQLException: ERROR: row is too big: size 1088976, maximum size 8160

Analysis

same data can be converted from text -> bytea using the 'infunc' of the custom type.

If applicable, please provide logs that demonstrate the issue. Please remove any sensitive information from the logs.

adamguo0 commented 10 months ago

Hi @thiru-baffle, can you post some sample code to reproduce the issue?

adamguo0 commented 10 months ago

I believe we would need custom datatypes to support TOAST for this to work: https://www.postgresql.org/docs/current/storage-toast.html

Only certain data types support TOAST — there is no need to impose the overhead on data types that cannot produce large field values. To support TOAST, a data type must have a variable-length (varlena) representation, in which, ordinarily, the first four-byte word of any stored value contains the total length of the value in bytes (including itself). TOAST does not constrain the rest of the data type's representation.
thiru-baffle commented 10 months ago

[Uploading text.csv…]() create extension if not exists pgcrypto; create extension if not exists citext; create schema if not exists baffle_udfs; create extension if not exists pg_tle;

drop function if exists baffle_udfs.bafflectr2_in(input text) cascade; create or replace function baffle_udfs.bafflectr2_in(input text) returns bytea immutable strict language 'plpgsql' as $$ declare hex_text text; declare hex_bytes bytea; begin if starts_with(input, '\x') then hex_text := substr(input, 3); else hex_text := input; end if; hex_bytes := decode(hex_text, 'hex'); return hex_bytes; end; $$;

drop function if exists baffle_udfs.bafflectr2_out(input bytea) cascade; create or replace function baffle_udfs.bafflectr2_out(input bytea) returns text immutable strict language 'plpgsql' as $$ begin return '\x' || encode(input, 'hex'); end; $$;

do $anonfn0$ begin perform pgtle.create_shell_type('baffle_udfs', 'bafflectr2'); perform pgtle.create_base_type('baffle_udfs', 'bafflectr2', 'baffle_udfs.bafflectr2_in(text)'::regprocedure, 'baffle_udfs.bafflectr2_out(bytea)'::regprocedure, -1); end $anonfn0$;

CREATE TABLE IF NOT EXISTS text_table ( clear_test TEXT, enc_test baffle_udfs.bafflectr2);

COPY text_table FROM 'text.csv' DELIMITER ',' CSV HEADER;

The above command will produce the error: ERROR: row is too big: size 499048, maximum size 8160

The underlying type for bafflectr2 is bytea datatype. Since the bytea is of varlen, do still you recommend using any other datatype?

thiru-baffle commented 10 months ago

[Uploading text.csv…]()

lyupan commented 9 months ago

I drafted a PR(https://github.com/aws/pg_tle/pull/266) to add TOAST support so that the size limitation issue can be avoided, feel free to take a look if you are interested.

anth0nyleung commented 9 months ago

PR#266 is merged. Closing this issue.

thiru-baffle commented 8 months ago

I am still getting the same error after updating the extensions.

java.lang.AssertionError: unexpected exception: org.postgresql.util.PSQLException: ERROR: row is too big: size 1088976, maximum size 8160 Where: COPY text_table, line 1

template1=# \dx List of installed extensions Name | Version | Schema | Description
----------+---------+------------+------------------------------------------------------------ citext | 1.6 | public | data type for case-insensitive character strings pg_tle | 1.4.0 | pgtle | Trusted Language Extensions for PostgreSQL pgcrypto | 1.3 | public | cryptographic functions plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language plrust | 1.1 | plrust | plrust: A Trusted Rust procedural language for PostgreSQL (5 rows)

Are there any changes required from the client side?

adamguo0 commented 8 months ago

We introduced two optional parameters to the pgtle.create_base_type function: alignment and storage. Please try specifying storage => 'extended' in the create_base_type function call.

Doc reference: https://github.com/aws/pg_tle/blob/main/docs/09_datatypes.md#pgtlecreate_base_typetypenamespace-regnamespace-typename-name-infunc-regprocedure-outfunc-regprocedure-internallength-int4-alignment-text-default-int4-storage-text-default-plain