Closed adamprescott closed 1 year ago
Hello, interesting feature, but it may require a bit of refactoring first. I never used Redshift, could you please show me what is your plan so I can have an idea (what queries are needed, what needs to be modified, ect...) ? Thanks
Sure here are a few things I've identified:
pg_database_info
) - though there are mitigations around this via other tooling.select version()
and seeing if .includes('Redshift')
pg_stat_activity
looks slightly different in Redshift, with columns datid, datname, procpid, usesysid, usename, current_query, query_start
current_query and query_start aren't usually populated unless configured on. We're also missing application_name
pg_terminate_backend
is a valid means of killing idle sessionsExample query for identifying idle connections, the main issue with this query is it can take 7 secs~:
with sessions as (SELECT starttime, process, u.usesysid, user_name
FROM stv_sessions s, pg_user u
WHERE s.user_name = u.usename
AND u.usesysid>1
AND process NOT IN (
SELECT pid FROM stv_inflight WHERE userid>1
UNION SELECT pid FROM stv_recents WHERE status != 'Done' AND userid>1
)), queries as (
SELECT userid, pid, max(endtime) AS last_end FROM svl_statementtext
WHERE userid>1 AND sequence=0
GROUP BY 1,2)
SELECT
sessions.process AS pid,
trim(sessions.user_name) AS user_name,
sessions.starttime AS session_start_time,
datediff(s, sessions.starttime, sysdate) AS session_duration_is_seconds,
queries.last_end AS last_query_end_time,
datediff(s, CASE WHEN queries.last_end IS NOT NULL
THEN queries.last_end
ELSE sessions.starttime END
, sysdate) AS idle_duration_in_seconds
FROM sessions
LEFT OUTER JOIN queries
ON sessions.usesysid = queries.userid AND sessions.process = queries.pid
WHERE (queries.last_end > sessions.starttime OR queries.last_end IS NULL)
ORDER BY idle_duration_in_seconds DESC;
I think there are two viable options:
serverless-redshift
project and make the changes needed to the queriesOk, I think we could proceed with the option 2. I will try to assess the viability of abstracting the queries, if it is too complex or tedious we can proceed with the first option.
Can infer if we're connecting to a Redshift DB by running select version() and seeing if .includes('Redshift')
I guess for this we could just add another option on the config, for example compatibility: 'Redshift'
.
@adamprescott I have made a draft PR, please take a look at it: https://github.com/MatteoGioioso/serverless-pg/pull/81
This is the proposed interface:
interface Plugin {
getIdleProcessesListByMinimumTimeout(self: ServerlessClient): Promise<NodePgClientResponse<ProcessList>>;
getIdleProcessesListOrderByDate(self: ServerlessClient): Promise<NodePgClientResponse<ProcessList>>;
processCount(self: ServerlessClient): Promise<NodePgClientResponse<Count>>;
killProcesses(self: ServerlessClient, pids: string[]): Promise<NodePgClientResponse<any>>;
showMaxConnections(self: ServerlessClient): Promise<NodePgClientResponse<MaxConnections>>;
}
Which you could use it like this:
const client = new ServerlessClient({
user: "redshift",
host: "localhost",
database: "redshift",
password: "redshift",
port: 5439
plugin: new RedshiftServerlessPlugin(someObject);
});
this way you have complete freedom in what parameters you would like to give to your query.
You can also tap into any configuration values using the self
parameter.
Example function:
async getIdleProcessesListByMinimumTimeout(serverlessPgSelf) {
const query = `
WITH processes AS (SELECT EXTRACT(EPOCH FROM (Now() - state_change)) AS idle_time,
pid
FROM pg_stat_activity
WHERE usename = $1
AND datname = $2
AND state = 'idle'
AND application_name = $5)
SELECT pid
FROM processes
WHERE idle_time > $3
LIMIT $4;`;
const values = [
serverlessPgSelf._client.user,
serverlessPgSelf._client.database,
serverlessPgSelf._strategy.minConnIdleTimeSec,
serverlessPgSelf._strategy.maxIdleConnectionsToKill,
serverlessPgSelf._application_name
];
return serverlessPgSelf._client.query(query, values);
}
Let me know what you think, Thanks!
EDITED
Hi, Currently looking at using this library to also connect to Redshift from Lambda, Redshift is wire compatible with Postgres though a few of the system tables have slightly different and missing columns compared to what this library expects, AWS also add other tables which can be used to infer idle sessions.
If no one else is considering this, or working on it, I'm happy to add a PR for this feature/enhancement.
Mainly pg_stat_activity doesn't have all the columns this library expects and there are different ways to identify idle sessions in Redshift.