Closed wildtayne closed 6 years ago
Currently, ClassDB.ConnectionActivity only records the date and user for the corresponding connection. If disconnections are placed in this table, all derived functions/views will display them as connections as well. I see three options:
1. Treat disconnections as connections
2. Add a DisconnectionActivity table and update all corresponding views
3. Add an ActivityType column or such in ConnectionActivity that specifies if the row is a connection or disconnection. Then update all corresponding views

I believe option 1 requires relatively little effort, while options 2 and 3 require more effort and are about equal to one another.
I think Option 3 in @srrollo's list is better and recommend altering table ConnectionActivity as follows:
- Add column: ActivityType CHAR(1) DEFAULT 'C' CHECK (ActivityType IN ('C', 'D'));
- Rename column AcceptedAtUTC to ActivityAtUTC
I also recommend changing the script to alter the table if it already exists, and create if it does not exist. This approach retains existing data.
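The alteration recommended above could look roughly like the following. This is a minimal sketch, assuming PostgreSQL 9.6 or later (for ADD COLUMN IF NOT EXISTS) and a table that already exists with an AcceptedAtUTC column. The create-if-missing branch is omitted because the full column list is not given in this thread.

```sql
-- Add the activity type; existing rows are filled with 'C' (connection)
ALTER TABLE IF EXISTS ClassDB.ConnectionActivity
   ADD COLUMN IF NOT EXISTS ActivityType CHAR(1) DEFAULT 'C'
      CHECK (ActivityType IN ('C', 'D'));

-- Rename AcceptedAtUTC only if the old name is still present,
-- so the script can be re-run safely
DO $$
BEGIN
   IF EXISTS (SELECT 1 FROM information_schema.columns
              WHERE table_schema = 'classdb'
                AND table_name = 'connectionactivity'
                AND column_name = 'acceptedatutc')
   THEN
      ALTER TABLE ClassDB.ConnectionActivity
         RENAME COLUMN AcceptedAtUTC TO ActivityAtUTC;
   END IF;
END
$$;
```

The lowercase names in the catalog lookup reflect how PostgreSQL folds unquoted identifiers.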
Actually, I am curious about Option 1 in @srrollo's list. What does "Treat disconnections as connections" mean? I think it will help to see a log entry/record of a disconnection.
Also, we might need to add session_id to the ConnectionActivity table so we can match disconnections with connections, unless there is another method to match.
Option 1 is essentially the simplest possible retrofit: all tables and views are left as they are now. importConnectionLog inserts a new row into ConnectionActivity when it reads connect or disconnect from a log line. The result of option 1 is not very polished, but it could serve as a stopgap if we want to address this issue in two parts: first we enable disconnection logging and implement importing, then we update the data design and views.
I agree that option 3 produces the best final result, and the changes proposed by smurthys are good. I will look into using session_id. I also suspect that will be the best way to match connections and disconnections.
I also think it would be helpful to define what we want to do with logged disconnections. Some possible uses:
I also side with implementing option 3, including using the suggested implementation.
If I understand it correctly, option 1 effectively does not distinguish between connections and disconnections and considers them both as "connection activity".
It would be interesting to see what kind of summary data could be created. However, I am not sure if using logged connections/disconnections is the best way to determine who is presently connected to the DBMS, since one discrepancy in logging (e.g. a power outage) will lead to incorrect information. There also exist more direct ways of accessing this information (see the implementation of ClassDB.listUserConnections()).
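For comparison, a direct query against the pg_stat_activity system view shows who is connected right now without relying on the log. This is only an illustration and is not the actual implementation of ClassDB.listUserConnections(), which is not shown in this thread.

```sql
-- List current client sessions straight from the server's own bookkeeping
SELECT usename, pid, application_name, backend_start
FROM pg_stat_activity
WHERE usename IS NOT NULL   -- skip background processes with no login role
ORDER BY backend_start;
```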
As mentioned in #214, I believe it is not possible for Postgres to log application_name in the connection log rows. It appears that after the connection is accepted, SET application_name TO <name> (or something analogous) is performed internally. The upshot of this is that disconnections do log application_name, so my proposal is this:

ClassDB.ConnectionActivity:
- sessionID is added to ClassDB.ConnectionActivity. I need to double check, but I believe the session ids are alphanumeric and fixed length, so VARCHAR(X) should be OK.
- applicationName is added to ClassDB.ConnectionActivity. I think there is a documented max length for this.

During import:
- sessionID for all rows added to ClassDB.ConnectionActivity
- applicationName, which should always be NULL for connections
- activityType based on the log message (connect or disconnect)
- SELECTs every connection row with a sessionID matching a newly imported disconnection, and updates the connection row's applicationName.

I believe this will reliably record all three attributes for every activity row. The one downside is that new connections will not have an associated applicationName until they disconnect. However, as @afig mentioned, it is more appropriate to use pg_stat_activity and our derived views to get info about currently connected users.
I would really appreciate feedback on this approach.
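The last step of the proposal, copying applicationName from a disconnection row back to its matching connection row, might be sketched as below. It assumes the SessionID, ApplicationName, and ActivityType columns discussed in this thread already exist, with 'C' and 'D' marking connections and disconnections. It is not the project's actual import code.

```sql
-- Fill in ApplicationName on connection rows from the disconnection row
-- that shares the same SessionID
UPDATE ClassDB.ConnectionActivity AS c
SET ApplicationName = d.ApplicationName
FROM ClassDB.ConnectionActivity AS d
WHERE c.SessionID = d.SessionID
  AND c.ActivityType = 'C'
  AND d.ActivityType = 'D'
  AND c.ApplicationName IS NULL;   -- leave already-patched rows alone
```

Connections that have not yet disconnected have no matching 'D' row, so their ApplicationName stays NULL.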
- VARCHAR(9) would work for sessionID: "The %c escape prints a quasi-unique session identifier, consisting of two 4-byte hexadecimal numbers (without leading zeros) separated by a dot. The numbers are the process start time and the process ID..." (19.8. Error Reporting and Logging)
- The max length for applicationName is NAMEDATALEN, which is usually 64 characters (like how we deal with SQL identifiers, we can assume it is set to its default, with corresponding comments about our assumption).
- Is it necessary to assert NULL for connections? I don't see much benefit in doing so.

Thanks for the comments @afig. Having the exact values for those columns is helpful.
- ... applicationName is NULL, however I believe for the purposes of ClassDB we can make this assumption. I have found use cases where application_name is changed at runtime to make it easier to monitor various portions of complex scripts, but I don't think this is much of a concern for ClassDB. We may want to look into disallowing students from using SET application_name, however.
- ... RETURNING). So any open connections will just be left with a NULL applicationName. We may want to document that information on open connections is 'incomplete', however.

We would also need to keep this in mind for the test scripts. For example, testConnectionActivityLogging.psql currently runs import as a test instructor, and counts that connection as one expected to be found. If we use @smurthys' suggestion of a custom applicationName to filter new connection records, then this connection would be missed due to applicationName being NULL until disconnect.
I agree there is no reason to assert app name. For both connection and disconnection messages, simply import whatever is in the app name column.
To reduce development/testing effort, at this point, I don't think it is necessary to patch app name in connection rows from disconnection rows. It is OK to do so, though as @afig says, the patching is not effective for a connection that doesn't (yet) have a corresponding disconnection.
BTW, I am curious to know if there are instances of connection and/or disconnection log entries with more than one line, and if perhaps app name is set in the 2nd line or later for connection entries. Just wondering.
There are usually 2-3 lines per connection, none of which have application_name set. There is only 1 disconnection line, which does have application_name set.
Edit: Also, as can be seen in the sample logs I posted, the connection authorized line is the one which signifies the connection has opened successfully.
Actually, based on the docs @afig linked, SessionID should probably be VARCHAR(17), since it consists of two 4-byte hexadecimal numbers without leading zeroes. We're generally seeing the PID part of SessionID be 3 digits because most systems max out PID at 32768 (and don't have a ton of processes running), but it is possible to configure a system with a greater range.
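The length reasoning can be checked directly in psql. The second query follows the expression given in the PostgreSQL documentation for the %c escape and rebuilds the session identifier for the current connection. It is offered only as a quick experiment.

```sql
-- Longest possible value: 8 hex digits, a dot, 8 hex digits
SELECT length('ffffffff.ffffffff');  -- 17

-- Rebuild the %c value for the current session:
-- process start time (epoch seconds) and process ID, both in hex
SELECT to_hex(trunc(extract(epoch FROM backend_start))::integer)
       || '.' || to_hex(pid) AS session_id
FROM pg_stat_activity
WHERE pid = pg_backend_pid();
```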
I just encountered a bit of an issue with the upgrade ALTER TABLE statements for ClassDB.ConnectionActivity. I was thinking SessionID needs to be NOT NULL. However, when you add this to a table with existing rows, you get an error because SessionID is NULL in the existing rows. I was thinking maybe we could add a query that sets SessionID to something like 00000000.00000000 where SessionID is NULL. Rather, we could add a temporary default of 00000000.00000000 and then DROP it immediately. This value should never be encountered "in the wild", because the timestamp part is stored as epoch time.
Indeed, the DBMS would not permit a new NOT NULL column on a populated table without a non-NULL default value. Thus, I agree the only alternatives are to permit NULL or insert a safe default value. However, I am not sure I understand the bit about "DROP it immediately" in @srrollo's comment.
BTW, I don't readily see any problem permitting NULL in the SessionID column because the table can be populated only by the ClassDB role. Am I missing something?
I will leave here a starter-set of queries for those wanting to experiment.
CREATE TABLE t(a INTEGER);
INSERT INTO t VALUES (5), (7), (8);
ALTER TABLE t ADD COLUMN b VARCHAR NOT NULL; --ERROR: column "b" contains null values
ALTER TABLE t ADD COLUMN b VARCHAR NOT NULL DEFAULT 'xyz'; --works
The solution I thought of would be the following. Add the column with a default initially, which populates all the existing rows with the default value. Then, drop the default constraint.
--Set a temporary default to add a value to existing rows (because of NOT NULL)
ALTER TABLE IF EXISTS ClassDB.ConnectionActivity
ADD COLUMN IF NOT EXISTS SessionID VARCHAR(17) NOT NULL DEFAULT '00000000.00000000';
--Drop the temporary default
ALTER TABLE IF EXISTS ClassDB.ConnectionActivity
ALTER COLUMN SessionID DROP DEFAULT;
I suppose SessionID could be NULL, however that is inconsistent with the other attributes. All are NOT NULL except for ApplicationName (because it is expected to be NULL sometimes). Additionally, if SessionID is ever NULL, something is definitely wrong with either the import or log format.
Thanks @srrollo for clarifying your thought on the drop action at end. I now understand your idea is to fill a default value only for connection entries already in the table. 👍
BTW, I assume these alterations are done only if the column did not already exist. Perhaps obvious, but just stating.
Also, I wonder if a wiki page in the repo, or a wiki tab in the team, outlining the solution is a better idea than incrementally designing the solution here.
Connection logging should use log_disconnections to log disconnects. Setting log_disconnections to on does not require a server restart, but it only affects sessions started after it is set.
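One way to turn the setting on without a restart is sketched below, assuming a superuser session and PostgreSQL 9.4 or later (for ALTER SYSTEM). Editing postgresql.conf by hand and reloading would work equally well.

```sql
-- Persist the setting in postgresql.auto.conf (requires superuser)
ALTER SYSTEM SET log_disconnections TO 'on';

-- Ask the server to re-read its configuration; no restart needed
SELECT pg_reload_conf();

-- Run this in a session started after the reload to confirm the value;
-- sessions that were already open keep the old setting
SHOW log_disconnections;
```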