DASSL / ClassDB

An open-source system to let students experiment with relational data
https://dassl.github.io/ClassDB/

Connection Log Import Improvements #214

Closed: wildtayne closed this 6 years ago

wildtayne commented 6 years ago

This PR contains changes that address #165, #200, and #204.

General:

ClassDB.importConnectionLog()

I plan to update the unit tests. It may not be possible to automatically test isLoggingCollectorEnabled(), since it requires a server restart to check both cases. Here is a sample run of the new import:

classdb200=# SELECT * FROM ClassDB.importConnectionLog('2018-05-27');
WARNING:  Log file for 2018-05-27 not found, skipping.
WARNING:  Log file for 2018-05-28 not found, skipping.
WARNING:  Log file for 2018-05-29 not found, skipping.
WARNING:  Log file for 2018-05-30 not found, skipping.
WARNING:  Log file for 2018-05-31 not found, skipping.
  logdate   | connectionslogged |                                     errormessage
------------+-------------------+---------------------------------------------------------------------------------------
 2018-05-27 |                 0 | could not open file "log/postgresql-05.27.csv" for reading: No such file or directory
 2018-05-28 |                 0 | could not open file "log/postgresql-05.28.csv" for reading: No such file or directory
 2018-05-29 |                 0 | could not open file "log/postgresql-05.29.csv" for reading: No such file or directory
 2018-05-30 |                 0 | could not open file "log/postgresql-05.30.csv" for reading: No such file or directory
 2018-05-31 |                 0 | could not open file "log/postgresql-05.31.csv" for reading: No such file or directory
 2018-06-01 |                 0 |
 2018-06-02 |                 0 |
 2018-06-03 |                 0 |
 2018-06-04 |                 7 |
 2018-06-05 |                 0 |
 2018-06-06 |                 0 |
 2018-06-07 |                 5 |
(12 rows)
smurthys commented 6 years ago

Nicely done @srrollo. The sample results you show look really helpful.

A few thoughts:

wildtayne commented 6 years ago

Thanks for the feedback @smurthys; I've pushed some commits that address your comments. I'm not sure the HINT in the new warning messages is the most appropriate one. It could be more direct (e.g., "Try setting log_connections TO 'on'"); however, "Managing Log Files" now contains good information about each log setting and why it needs to be enabled.
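
For reference, a minimal sketch of what acting on such a HINT could look like, assuming superuser access; these are standard PostgreSQL settings, not ClassDB-specific commands:

SHOW log_connections;
ALTER SYSTEM SET log_connections TO 'on';
ALTER SYSTEM SET log_disconnections TO 'on';
SELECT pg_reload_conf(); -- enough for these two settings; logging_collector requires a server restart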

smurthys commented 6 years ago

The revisions look good @srrollo.

addConnectionActivityLoggingReco.sql:

testAddConnectionActivityLogging.psql:

A recommended testing strategy:

smurthys commented 6 years ago

In function importConnectionLog I suggest using the following format to inform users of potential issues in logging. This approach results in only one output message regardless of the number of parameters turned off:

Here is a draft of the query to get a list of parameters turned off.

SELECT string_agg(name, ',')
FROM pg_settings
WHERE name IN ('logging_collector', 'log_connections', 'log_disconnections')
      AND setting = 'off';
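
For illustration, a sketch of how that aggregated list might drive a single warning; the variable name and message wording below are placeholders, not the function's actual text:

DO $$
DECLARE
   disabledSettings TEXT;
BEGIN
   SELECT string_agg(name, ', ') INTO disabledSettings
   FROM pg_settings
   WHERE name IN ('logging_collector', 'log_connections', 'log_disconnections')
         AND setting = 'off';

   IF disabledSettings IS NOT NULL THEN
      RAISE WARNING 'connection logging may be incomplete; the following settings are off: %',
                    disabledSettings;
   END IF;
END $$;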
wildtayne commented 6 years ago

I've pushed some commits improving the error messages and indentation in addConnectionActivityLogging.sql.

The RETURN QUERY is performed prior to the DROP TABLE statements because it needs to select from ImportResult, which is one of the tables dropped. Additionally, RETURN QUERY doesn't actually cause the function to exit; it only appends rows to the function's result set (see here). I've added an explicit RETURN statement at the end and some comments to clarify this usage, since it is pretty counter-intuitive.

I would prefer to use the ON COMMIT DROP property of temporary tables instead of explicitly dropping them; however, that does not seem to actually drop the tables when importConnectionLog() is called multiple times inside a transaction or function (presumably they persist until the end of the outer transaction/function).
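
As an aside, here is a minimal standalone sketch of the RETURN QUERY pattern described above (demo names only, not the code of importConnectionLog()):

CREATE OR REPLACE FUNCTION pg_temp.demoReturnQuery()
RETURNS TABLE(num INT) AS
$$
BEGIN
   CREATE TEMPORARY TABLE DemoResult(val INT);
   INSERT INTO DemoResult VALUES (1), (2);

   RETURN QUERY SELECT val FROM DemoResult; -- appends rows; does not exit the function

   DROP TABLE DemoResult; -- safe: the returned rows are already materialized

   RETURN; -- explicit exit, for clarity
END
$$ LANGUAGE plpgsql;

SELECT * FROM pg_temp.demoReturnQuery();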

smurthys commented 6 years ago

Thanks @srrollo for clarifying the need to drop tables after RETURN QUERY. The comments and the URL are quite helpful.

@srrollo: It would be great if you wrote a gist about this topic, with emphasis on the use of a temp table. Such a gist would be a good public service. I (and I am sure others) will be happy to review the gist before publication. Not urgent, but important.

Some observations on the revised addConnectionActivityLoggingReco.sql:

smurthys commented 6 years ago

I recommend merging this PR before making the changes needed to import disconnections and other related changes. I have a feeling we haven't yet fully analyzed the information needs related to disconnections, and I don't want to bog down this PR unnecessarily. A new branch can address importing disconnection data.

wildtayne commented 6 years ago

I agree. I'm expecting the full scope of the disconnection changes to be fairly large.

afig commented 6 years ago

Overall the work on this branch looks great. I also agree that disconnection-logging development should be completed separately. In addition to the comments already made, I have the following suggestions (I have also pointed out some minor spelling errors that haven't been addressed yet, but may have already been noticed):

addConnectionActivityLoggingReco.sql:

testConnectionActivityLogging.psql:

wildtayne commented 6 years ago

Thanks for all the reviews. I believe I have addressed all of the comments in addConnectionActivityLogging.sql. @afig I have kept the warning message for missing log files as-is, however. I think this makes sense, because the warning gives an easier-to-understand error, while the report has the details.

wildtayne commented 6 years ago

Huge thanks for the :USER tip @smurthys; it allows for a big improvement to the test script.

I just pushed several commits that greatly improve the test script. Changes to the procedure are as follows:

I still have some reservations about it that I would like feedback on:

  1. The test makes changes to ClassDB.ConnectionActivity that cannot be rolled back. This is because the test must compare the state of this table before and after the connections happen. The 'after' changes are currently rolled back, but if one runs the test again, these changes will be imported during the 'before' stage.

  2. The test still checks for exactly four connections in the return table from ClassDB.importConnectionLog(). I understand the issues raised with this implementation, but I believe it is currently the best option. The problem with looking directly at ClassDB.ConnectionActivity with a filter for the test users is that connections from previous test runs could still be imported. Thus, we would still have to filter the connections by time as well, and I don't know of a good way to save the start time of the test (but am open to suggestions). Plus, the current method exercises the return table from ClassDB.importConnectionLog().

  3. There are some things the script does not test, notably importing a missing log. The issue here is picking a log file that is guaranteed to be missing (and I don't think this is currently possible).

I don't think any of these issues are critical, but they should be kept in mind.

wildtayne commented 6 years ago

I just pushed some more commits updating the tests. I've added a dynamic function that checks ClassDB.ConnectionActivity for correct entries after the import. The dynamic creation allows the timestamp at test start to be hard-coded in the function, which lets the start time persist across the connection changes. I also added a similar stub that can be used to check disconnection entries when they are implemented.

A few notes:

smurthys commented 6 years ago

Awesome changes @srrollo. I super like the use of \setenv PGPASSWORD. It is amazing how much we (can) learn every day.

The dynamic functions are neat. I wonder if the testing user can instead just create a table with one column containing the test start time, and use a join query later to verify.

Not that we need to address it in code, but L92 has an interesting requirement: the entire test must be completed on the same calendar day. 😱 I suppose we can just remember to start this test at least 5 minutes before midnight. 😄

I am not quite sure it is necessary to switch the connection back to the testing user at the end, because an instructor should be able to drop all students, and also the other objects, given appropriate grants earlier.

Lastly, I think it is OK to assume all tests are single user, and quite likely run in a dedicated DB: the installation scripts make it really easy to do that.

smurthys commented 6 years ago

I have been thinking about the following alternative for a couple of hours now and think it can work to simplify the verification step. The idea hinges on using "application name" in each test connection as I just described in a comment elsewhere.

The idea is simple: specify a "unique" string as the application name in each test connection and then verify that ConnectionActivity table has one and only one entry for each test connection tried.

The proposed approach eliminates the need to communicate/retain a "nexus" table or dynamic functions across connections.

BTW: I don't know if application name shows up in disconnection entries, but that should not be a problem if session_id is also captured (which will be the subject of a different PR).
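
For concreteness, a test connection could pass such a marker through the libpq connection string; the database and marker names below are placeholders:

psql "dbname=classdb_test user=constu01 application_name=ClassDBTest_constu01" -c "SELECT 1"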

wildtayne commented 6 years ago

I think the application name idea is perfect; unfortunately, after rather thorough testing it appears that application_name is not set until after the connection is made, so it cannot be logged in the connection log row (I will elaborate in #206). I have an alternate idea that builds on application_name, which I will also elaborate on in #206. Since that solution will rely on implementing both session ID and disconnection logging, I propose removing the two tests implemented using dynamic queries and then merging this PR. I will then implement session ID/disconnection logging within a new branch.

I agree the CURRENT_DATE requirement is a little funky; however, I think it makes the test more predictable. Otherwise, running the test could import a large number of logs. Additionally, running the test while ClassDB.ConnectionActivity is empty is effectively the same as limiting the import to CURRENT_DATE.

I ended up having the original user drop all the roles because the instructor can't drop itself. I figured it was easier to just drop everything in one shot.

smurthys commented 6 years ago

Too bad application name does not show in connection entries. With this knowledge, I restate my earlier proposal of replacing the dynamic functions with a table the testing user creates.

wildtayne commented 6 years ago

I've pushed a commit that uses a table for test start time storage instead of dynamic functions.
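
Roughly, the approach looks like this; the table name is illustrative, the timestamp filter is left as a placeholder (as in the starter query further below), and it assumes the activity timestamps are stored in UTC:

-- run as the testing user before making the test connections
CREATE TABLE TestStartTime(StartTime TIMESTAMP);
INSERT INTO TestStartTime VALUES (CURRENT_TIMESTAMP AT TIME ZONE 'UTC');

-- later, after ClassDB.importConnectionLog() has run
SELECT C.UserName
FROM ClassDB.ConnectionActivity C, TestStartTime T
WHERE TRUE; --change to: C.<timestamp column> >= T.StartTime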

afig commented 6 years ago

Progress so far looks really good. I did notice one small issue when running testConnectionActivityLogging.psql, specifically the second time I ran it.

I was getting FAIL: Code 2 whenever the ClassDB.ConnectionActivity table had recent entries, including when the test script was run more than once. This seems to be caused by an unnecessary timezone conversion on L127. Specifically, when CURRENT_TIMESTAMP (whose data type is TIMESTAMP WITH TIME ZONE) is inserted into the TestConnectionActivityLogging table, whose column is of type TIMESTAMP (without time zone), the value is stored in local time rather than UTC, making the conversion on L127 incorrect.
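
A small standalone illustration of the pitfall (a throwaway demo table, not the test's code):

CREATE TEMPORARY TABLE tzDemo(ts TIMESTAMP); -- TIMESTAMP WITHOUT TIME ZONE
INSERT INTO tzDemo VALUES (CURRENT_TIMESTAMP); -- the TIMESTAMP WITH TIME ZONE value is stored as local time
-- interpreting the stored local-time value as UTC shifts it by the session's UTC offset:
SELECT ts, ts AT TIME ZONE 'UTC' FROM tzDemo;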

I also noticed a tiny issue in a comment in addConnectionActivityLoggingReco.sql:

wildtayne commented 6 years ago

Really good catch @afig. I just pushed a commit fixing this issue and cleaning up some comments.

smurthys commented 6 years ago

A couple of quick observations on the revised testConnectionActivityLogging.psql:

wildtayne commented 6 years ago

Updated the tests based on @smurthys' suggestions.

smurthys commented 6 years ago

Quickly: L139 should just say SELECT *. Selecting COUNT(*) will always return a row.

wildtayne commented 6 years ago

Just pushed a commit fixing this; I had to SELECT UserName because of the GROUP BY.

smurthys commented 6 years ago

Thanks @srrollo for the revisions. I think the comment preceding the group-by query needs to be more specific about the reason for the query.

wildtayne commented 6 years ago

Thanks for your continued feedback @smurthys. I just updated the comment on that query.

afig commented 6 years ago

I like the changes to test 2. However, the comment seems to be incorrect (or perhaps I am misunderstanding the query). It seems like the query is only testing if there are too many rows, rather than also testing to see if there are too few (if there are 0 entries, then they still will not be SELECTed).

wildtayne commented 6 years ago

Thanks for the catch. The comment is incorrect ATM; I meant to change it in the last commit. However, I realized the real issue is that, since each user only has one connection, a user with too few connections imported won't be selected at all, regardless of the HAVING filter. So I will probably fix this by making multiple connections for each user and changing the filter to HAVING <> #connections.

smurthys commented 6 years ago

I think multiple connections are not necessary. Instead, we can use another IF statement to check if connection records are missing for any user and simply raise fail code 3.

The grouping query finds users who have more connection entries than expected: we expect one and only one entry for each user.
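
For reference, a sketch of such a grouping check (the timestamp filter is again a placeholder):

SELECT UserName
FROM ClassDB.ConnectionActivity
WHERE TRUE --change to filter on timestamp
GROUP BY UserName
HAVING COUNT(*) > 1;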

wildtayne commented 6 years ago

I just made a few comment improvements and added two new tests:

  1. Checks that each ClassDB user has new associated activity rows added
  2. Checks that the non-ClassDB user has no new associated rows added
smurthys commented 6 years ago

Maybe it is just formatting, but I am a bit unsure about the query in L162-190.

Also, a simpler outer join can find users with missing connection entries. The following starter query works.

Lastly, I just realized the test is missing a check that no connection entry exists for the non-ClassDB user.

WITH ClassDBUser AS 
( SELECT * FROM regexp_split_to_table('constu01 constu02 conins01 condbm01', E'\\s+') AS UserName)
SELECT U.UserName 
FROM ClassDBUser U LEFT OUTER JOIN ClassDB.ConnectionActivity C ON U.UserName = C.UserName
WHERE C.UserName IS NULL
      AND TRUE --change to filter on timestamp
wildtayne commented 6 years ago

Thanks, that is a much simpler query. I was trying to think of a way to get the usernames into a table; I forgot about regexp_split. EDIT: I've pushed a commit implementing the query.

Also, I believe L193-206 checks if the non-ClassDB user has any connection entries, unless I am misunderstanding.

smurthys commented 6 years ago

Yes, it is rather easy to forget functions such as regexp_split. In any case, I feel the simplicity comes from the outer join, because the CTE can easily be defined as a simple UNION query:

SELECT 'a' UNION SELECT 'b' UNION SELECT 'c' UNION SELECT 'd'

BTW, the tests in L50-L67 could be simplified using the ClassDB functions isUser and isServerRoleDefined.

EDIT: Functions isStudent, isInstructor, etc. are also available.
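
For example, the simplification might look something like this, assuming (as their names suggest) these functions take a role name:

DO $$
BEGIN
   IF NOT ClassDB.isServerRoleDefined('constu01') OR NOT ClassDB.isStudent('constu01') THEN
      RAISE EXCEPTION 'test setup failed: constu01 is missing or is not a student';
   END IF;
END $$;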

smurthys commented 6 years ago

@srrollo: Sorry I didn't acknowledge this earlier. Yes, the test for the non-ClassDB role already existed; I had missed it.

wildtayne commented 6 years ago

Thanks for reminding me about the ClassDB user functions @smurthys. I've integrated these into the tests. I've also improved a few comments.

I ended up removing the CURRENT_DATE parameter from importConnectionLog() in the tests because, after thinking about it, I don't think it is really necessary. This leaves the midnight edge case only when running the test with an empty ClassDB.ConnectionActivity.

wildtayne commented 6 years ago

I'll leave this open a bit longer in case anyone else wants to review. In the interest of time, however, I will probably merge at ~3:30 when I get home.

wildtayne commented 6 years ago

Thanks for the reviews everyone. :+1: