Closed loganlinn closed 11 years ago
What about something like this:
CREATE OR REPLACE FUNCTION mget(_values poky[]) RETURNS SETOF poky AS
$$
DECLARE
BEGIN
RETURN QUERY
SELECT poky.*
FROM poky
JOIN (
SELECT * FROM UNNEST(_values)
) AS in_poky ON (
poky.bucket = in_poky.bucket AND
poky.key = in_poky.key AND
date_trunc('seconds', poky.modified_at) = date_trunc('seconds', in_poky.modified_at)
);
END;
$$ LANGUAGE PLPGSQL;
It can then be queried using:
SELECT * FROM mget(ARRAY[('bucket', 'key100', NULL, NULL, '2013-05-18 16:43:23.361331+00')::poky]);
It has the odd NULL syntax on input but the body of the function is declarative SQL. We may have to iterate on the structure of the query to make sure the planner is doing what we expect but I think doing so will be very straight forward.
Yeah, I was looking into doing something of this direction w/ the ROW
constructor, but didn't get too far. Do you know if this has an perf advantage over using EXECUTE? I can see how it would...
SELECT * FROM mget(ARRAY[('bucket', 'key100', NULL, NULL, '2013-05-18 16:43:23.361331+00')::poky]);
Not crazy that this apparoach requires knowledge of the column order in schema
Not crazy that this apparoach requires knowledge of the column order in schema
We could probably get around that by constructing an array of poky
records given an array of (bucket, key, modified_at)
tuples.
Something like this would work:
CREATE TYPE mget_row AS ( bucket text, key text, modified_at timestamptz );
CREATE OR REPLACE FUNCTION mget(_values mget_row[]) RETURNS SETOF poky AS
...
;
SELECT * FROM mget(ARRAY[('bucket', 'key100', '2013-05-18 16:43:23.361331+00')::mget_row]);
Are you of the mindset that knowledge modified_at
should be required in order to use this interface? Although it fits our current use-cases, I think it significantly hinders its utility.
Are you of the mindset that knowledge modified_at should be required in order to use this interface? Although it fits our current use-cases, I think it significantly hinders its utility.
We could accept a NULL for the modified_at value. In which case any modified_at will do. What do you think about that approach?
Sounds good. Would you want to implement, or should I?
Hrm. I did some testing and for some reason, the stored procedure is ~20x slower than the query. Investigating.
Ok, writing the function as SQL is a lot more performant. Here is the version I currently have:
CREATE OR REPLACE FUNCTION mget(_bucket text, _values mget_param_row[]) RETURNS TABLE
(
bucket varchar(256),
key varchar(1024),
data text,
created_at timestamptz,
modified_at timestamptz
) AS
$$
WITH mget_poky AS (
SELECT * FROM UNNEST(_values)
),
poky AS (
SELECT *
FROM poky
WHERE bucket = _bucket AND
key IN (SELECT key FROM mget_poky)
)
SELECT poky.bucket, poky.key, poky.data, poky.created_at, poky.modified_at
FROM poky
JOIN mget_poky ON (
mget_poky.key = poky.key AND
(( mget_poky.modified_at IS NULL) OR date_trunc('seconds', poky.modified_at) = date_trunc('seconds', mget_poky.modified_at) )
);
$$ LANGUAGE SQL STABLE CALLED ON NULL INPUT;
It's about 20x faster than the plpgsql version (over an iteration of 1000).
I can take a stab at adding it. I also want to try and productize the partitioning.
Thanks for the SQL sproc... I like that much better
Instead of generating the query JVM-side, use a stored procedure for the mget query.
Primary motivation for these changes is to use a IN/ANY super-select for the query for performance reasons. There were concerns about complexity of generating queries, so this generates a stored procedure call instead (an arguable trade-off). A benefit of having this as a sproc is it can be used from different clients as well as better access for load testing.
Timestamped queries aren't working quite right yet (where parameter is Timestamp object). I am assuming this is JDBC type inference issue.