Closed smurthys closed 6 years ago
Our practice is for people to self-assign issues, but I have intentionally assigned this issue to new contributors so they have the opportunity to start their work on a small but concrete issue.
I believe to fix this redundancy in addLogMgmt.sql
line 106 to 127 (as highlighted) can use a derived table instead of two select statements. A possible untested solution is changing lines 106 to 127 to:
UPDATE classdb.student
--Get the total # of connections made in the imported log
--Ignore connections from an earlier date than the lastConnections
--These should already be counted
SET connectionCount = connectionCount + (
SELECT user_name_count
FROM (
SELECT COUNT(user_name) AS user_name_count, MAX(log_time AT TIME ZONE 'utc') AS LastLoginTime
FROM classdb.postgresLog pg
WHERE pg.user_name = userName
AND (pg.log_time AT TIME ZONE 'utc') > COALESCE(lastConnection, to_timestamp(0))
AND message LIKE 'connection%' --Filter out extraneous log lines
AND database_name = current_database() --Limit to log lines for current db only
) AS connectionInfo
),
--Find the latest connection date in the logs
lastConnection = COALESCE(
(
SELECT LastLoginTime
FROM connectionInfo
), lastConnection);
This possible solution creates a derived table named connectionInfo with both values needed for the SET statement. The derived table is created from the first section of the SET statement and uses the derived table in the second section to SET lastConnection.
Good start @KevinKelly25.
I propose placing a sub-query in the FROM
clause, which makes the sub-query result a derived table. (Technically, the use of sub-query as shown is not a "derived table". See Fall 2017 CS205 Slide Set 8.)
I believe using a sub-query to form a derived table exposes a slightly better strategy for the overall update query, including a slightly different approach in the sub-query itself.
I like Kevin's solution a lot however I was hoping to experiment with the WITH statement since I hadn't seen it before, so here's my stab at it.
UPDATE classdb.student
--Get the total # of connections made in the imported log
--Ignore connections from an earlier date than the lastConnections
--These should already be counted
WITH userConnection (numberConnection, lastUserConnection)
AS
(
SELECT count(user_name) AS numberConnection, MAX(log_time AT TIME ZONE 'utc') AS lastUserConnection
FROM classdb.postgresLog pg
WHERE pg.user_name = userName
AND (pg.log_time AT TIME ZONE 'utc') > COALESCE(lastConnection, to_timestamp(0))
AND message LIKE 'connection%' --Filter out extraneous log lines
AND database_name = current_database() --Limit to log lines for current db only
)
SET connectionCount = connectionCount + (SELECT numberConnection
FROM userConnection
),
lastConnection = COALESCE(SELECT lastUserConnection
FROM userConnection), lastConnection);
@baconbm The syntax of update queries permits the WITH
clause only at the beginning of the query (prior to the UPDATE
clause).
I think it will be helpful to write just the SELECT
query independently: that query will be the same with or without the WITH
clause.
Also, it will be very helpful to note the expected cardinality of the SELECT
query.
I implemented the changes we talked about in the meeting today. Let me know if you see anything that I may have overlooked
UPDATE classdb.student
--Get the total # of connections made in the imported log
--Ignore connections from an earlier date than the lastConnections
--These should already be counted
SET connectionCount = connectionCount + (
SELECT user_name_count
FROM (
SELECT user_name, COUNT(user_name) AS user_name_count, MAX(log_time AT TIME ZONE 'utc') AS LastLoginTime
FROM classdb.postgresLog pg
WHERE pg.user_name = userName
AND (pg.log_time AT TIME ZONE 'utc') > COALESCE(lastConnection, to_timestamp(0))
AND message LIKE 'connection%' --Filter out extraneous log lines
AND database_name = current_database() --Limit to log lines for current db only
GROUP BY user_name
) AS connectionInfo JOIN classdb.student ON user_name = userName
),
--Find the latest connection date in the logs
lastConnection = COALESCE(
(
SELECT LastLoginTime
FROM connectionInfo
), lastConnection);
@baconbm: I recommend creating a private gist with your solution and pasting a link to the gist in a comment here. @afig's approach is a possible model.
All: When the link to the gist is posted, please comment on the gist at the gist.
A few things to consider as you revise the query:
salesmen
is.UPDATE accounts
SET contact_first_name = first_name,
contact_last_name = last_name
FROM salesmen WHERE salesmen.id = accounts.sales_id;
I've updated a gist with an example code that should be able to address this issue
The issue I'm having is related to the timestamp part where it seems to be giving me a bogus value and I can't figure out where the issue lies. I have to head to work now so I won't be able to address fixes until tomorrow. Thanks guys.
@baconbm Just a gentle reminder that comments to gists do not cause notifications. So, be sure to manually check if there are comments to the gist.
I propose this be closed when #150 is merged, as the code reference has been squashed by other changes.
The script
addLogMgmt.sql
unnecessarily contains essentially the sameSELECT
query twice.It seems the two
SELECT
queries can be replaced with a derived table or aWITH
query.