How can we programmatically terminate a session?

atc0005 commented 4 years ago

Blocking a user account can be done by adding the account to a specific flat-file referenced by EZproxy, but historically we've had to log in to the admin panel to terminate a session. Is there an API we can use? If we can't kill existing sessions, perhaps we can match against an IP Address and block that instead?

References:

atc0005 commented 4 years ago

The following comments were pulled from an internal issue I opened elsewhere to log some thoughts. Transferring here as they're relevant and (AFAICT) not particularly sensitive in nature.

EDIT: As of this writing, I have yet to hear back from OCLC Support. I have to assume that the outcome will be that we're on our own to come up with a solution.

atc0005 commented 4 years ago

Does blocking an IP require restarting EZproxy?

Perhaps we can write to a config file to block further sessions, but match existing sessions to their IPs and block those IPs?

atc0005 commented 4 years ago

https://help.oclc.org/Library_Management/EZproxy/EZproxy_configuration/Set_limits_for_your_institution

Determines how long in minutes an EZproxy session should remain valid after the last time it is accessed. The default of 120 determines that a session remains valid until 2 hours after the last time the user accesses a database through EZproxy. MaxLifetime is the only setting that is position dependent in config.txt. In normal use, it should appear before the first TITLE line.

If nothing else, the MaxLifetime value could be tweaked further to increase the likelihood that the session times out and is subject to the same block as other abusers.

atc0005 commented 4 years ago

https://help.oclc.org/Library_Management/EZproxy/Configure_resources/RejectIP

Perhaps have the main EZproxy config file pull in an include file of rejected IP Address entries?

The logic could perhaps check to see when a user was blocked and if it was X minutes past, add a RejectIP entry in order to force the user session to terminate. After X minutes, the IP could be removed and the blocked/disabled user account entry would serve to prevent repeat abuse for the existing account. The temporary IP block would limit potential Denial of Service to legitimate users of the system.
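A minimal sketch of that temporary-block idea, assuming the include file holds one `RejectIP` line per address and that the time each block was added is tracked elsewhere. The function name, the data layout, and the ten-minute window are hypothetical; the `RejectIP <address>` line format is an assumption based on the linked documentation and should be checked there.

```python
import ipaddress
from datetime import datetime, timedelta


def render_reject_lines(blocks, now, block_for):
    """Return RejectIP lines for blocks that have not yet expired.

    blocks: dict mapping IP address string -> datetime the block was added.
    now: current time (passed in so the logic is easy to test).
    block_for: timedelta after which a temporary block is dropped.
    """
    lines = []
    for ip, added in sorted(blocks.items()):
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        if now - added < block_for:
            # Assumed directive syntax: "RejectIP <address>"
            lines.append(f"RejectIP {ip}")
    return lines


now = datetime(2020, 5, 4, 3, 0)
blocks = {
    "192.0.2.10": now - timedelta(minutes=5),     # still inside the window
    "198.51.100.7": now - timedelta(minutes=45),  # window has passed
}
active = render_reject_lines(blocks, now, timedelta(minutes=10))
print("\n".join(active))
```

Regenerating the whole include file from the tracked blocks, rather than editing it in place, keeps expired entries from lingering if a removal step is ever missed.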

atc0005 commented 4 years ago

Workflow (scratch notes, some logic debugging needed):

EZproxy logs activity
Splunk ingests updates to general log messages file
Splunk ingests updates to disabled user file
Splunk thresholds for general usage tripped, alert submitted via JSON payload
brick web app processes request, writes out user account to disabled users file
brick web app records the event metadata, likely to a local database
EZproxy sees update to file, blocks new logins for specified user account
- existing sessions are unaffected at this point
Splunk ingests updates to general log messages file
Splunk ingests updates to disabled user file
Splunk thresholds for disabled user activity tripped, alert submitted via JSON payload
- this would be for activity associated with user accounts in the disabled file X minutes past the ingest time
- potentially submitted to a different endpoint on brick web app since the goal/logic would be different than for the initial block alert payloads
brick web app writes associated IP Address of offending user to temporary.blocked.ips.txt (or whatever name)
- this event could occur multiple times for each offending IP Address up to Limit (EZproxy setting for maximum concurrent sessions per account) times, resulting in Limit number of temporary IP blocks
brick records the block somewhere (likely a local database)
brick web app restarts EZproxy (if necessary)
EZproxy sees new IP rejection config settings, blocks all connections from that IP
- TODO: Does this also trigger intruder logic? If so, how would we clear intruder blocks once we remove an explicit rejection?
Splunk ingests blocked IP Address include file update
- this could be useful for Network Security team, or us if we need to check against our other systems
EZproxy: All existing user sessions for the blocked IP time out and are closed/expired
brick (on an internal timer) re-evaluates blocked IP entries and disables them after a set time
- brick records this event also

At this point legitimate users (other than the blocked user account) are mostly unaffected as we no longer have temporary IP blocks for the specific user account in place.

atc0005 commented 4 years ago

@auadamw How does the workflow in https://github.com/atc0005/brick/issues/13#issuecomment-623075903 sound?

We would potentially need to ingest one more file in order to handle this, and would probably want to add an ingest for one more file past that (so +2 overall).

We might also drop the MaxLifetime value further than we already have it (not mentioning that specific detail here) to help with this as well, though if we get the logic working as specified in https://github.com/atc0005/brick/issues/13#issuecomment-623075903 it wouldn't matter as much if we use a temporary IP block to force session expiration.

The assumption there is that the bulk of the unauthorized behavior would occur during early morning or off-hours and any IP block triggered would minimally impact legitimate users.

atc0005 commented 4 years ago

I gave this some more thought and I think we can use an existing tool to lighten the development time/costs for the initial implementation: fail2ban.

Modified workflow below.

EZproxy is running and logging usage activity
Splunk agent is running and monitoring for updates
- looking for and ingesting any new updates to general log messages file
- looking for and ingesting any new updates to disabled user file
brick web app is running with several log files open
- log file for user account block requests
- log file for IP Address block requests
fail2ban is running and monitoring brick web app log file for IP Address block requests
Splunk server thresholds for general usage tripped (based on general log messages file), alert submitted via JSON payload
brick web app processes request, writes out user account to disabled users file
brick web app records the event to a local log file
EZproxy sees update to file, blocks new logins for specified user account
- existing sessions are unaffected at this point
Splunk agent ingests updates to general log messages file
Splunk agent ingests updates to disabled user file
Splunk server thresholds for disabled user activity tripped, alert submitted via JSON payload
- this would be for activity associated with user accounts in the disabled file X minutes past the ingest time
- potentially submitted to a different endpoint on brick web app since the goal/logic would be different than for the initial block alert payloads
brick web app writes log message to IP Address block requests log file
- this event could occur multiple times for each offending IP Address up to Limit (EZproxy setting for maximum concurrent sessions per account) times, resulting in Limit number of IP Address block requests from Splunk server
fail2ban sees each new entry in IP Address block requests log file
fail2ban blocks each offending IP Address for a configured amount of time slightly greater than the MaxLifetime value in EZproxy
Splunk agent ingests brick web app IP Address block requests log file (optional)
- this could be useful for Network Security team; the fact that we opted to block the IP is an event they're probably interested in tracking
- this could also prove useful if we need to check against our other systems for related activity
EZproxy: All existing user sessions for the blocked IP time out and are closed/expired
fail2ban expires the temporary IP block
At this point legitimate users (other than the blocked user account) are mostly unaffected as we no longer have temporary IP blocks for the specific user account in place.

This allows the brick web app to focus exclusively on:

parsing Splunk "server" requests
writing disabled user file entries
writing IP Address block request log entries

Direct/measurable upsides:

brick web app
- reduces the complexity of the brick web application (quite a bit)
- should help with debugging problems
EZproxy
- does not require modifying "main" config settings in EZproxy
- does not require restarting EZproxy to block IPs
  - an automated restart based on and combined with config-level changes is risky
fail2ban
- not reinventing existing/available functionality
- well known (e.g., support from community)
- time-tested
- local familiarity (I've worked with it off/on for several years now)

atc0005 commented 4 years ago

Further refinement:

Splunk ingests only the traffic logs like it is doing now via local agent
Splunk sends alerts via JSON payloads to a single endpoint like we've discussed previously
- no need for a second endpoint, second alert or second ingest (though this might still be useful for a later point in time)
brick logs both the block request and the IP Address (two separate files) each time a payload is delivered by Splunk
fail2ban blocks the logged IP Address for MaxLifetime + some small additional amount of time

This workflow should allow for the live sessions to time out, the user account to be blocked and new sessions to be blocked.
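As a rough illustration of the "MaxLifetime + some small additional amount of time" arithmetic: EZproxy expresses MaxLifetime in minutes, while fail2ban's `bantime` is commonly given in seconds. The function name and the five-minute buffer below are hypothetical choices, not values taken from this project.

```python
def fail2ban_bantime_seconds(max_lifetime_minutes, buffer_seconds=300):
    """Ban long enough for an idle session to exceed MaxLifetime."""
    if max_lifetime_minutes <= 0 or buffer_seconds < 0:
        raise ValueError("expected a positive MaxLifetime and a non-negative buffer")
    return max_lifetime_minutes * 60 + buffer_seconds


# EZproxy's documented default MaxLifetime is 120 minutes.
bantime = fail2ban_bantime_seconds(120)
print(bantime)
```

A lower MaxLifetime shortens the ban proportionally, which in turn shortens the window in which legitimate users sharing that IP Address are affected.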

atc0005 commented 4 years ago

I missed a response from OCLC Support on April 28th (just found it). Snippet of that response (leaving out the support tech's information):

Thank you for contacting OCLC Product Support.

Unfortunately, EZproxy does not currently have the mechanism to terminate sessions with given attributes. There is only limitations on session duration and user permissions based on your authentication method.

To secure your EZproxy server from suspicious login activity, you may set IntruderIPAttempts. This will set any restrictions based on the directive definitions when logging in. For more details, see the following documentation: https://help.oclc.org/Library_Management/EZproxy/Configure_resources/IntruderIPAttempts

Let me know if I can be of any further assistance on this request.

My response back:

Hi,

Sorry for my late response (I just found this email).

We're already using the IntruderIPAttempts directive to block user accounts based on thresholds that seem to work fairly well for the most egregious abuse (bots, "rage" logins when users forget passwords, etc).

Our use case in this discussion is tied to an external indicator that an account has been abused, so we're looking to shut down a specific user account based on external tooling. After some research, it is beginning to look like our solution will be to (automatically) block the associated IP at the host firewall level long enough for the EZproxy MaxLifetime value to be reached and force the session to time out. That host firewall entry could then be (automatically) unblocked a short time later. Combined with adding the associated user account to a flat-file (or AD group or ...), this should prevent new sessions for the associated account.

Any major flaws in that plan?

atc0005 commented 4 years ago

Leaving out some emails in the thread, but received one back today confirming that the recommended RejectIP directive does require restarting EZproxy in order to take effect. That requirement makes the option less desirable (e.g., if we accidentally introduce a config error we could bring down the service).

atc0005 commented 4 years ago

brick web app is running with several log files open

log file for user account block requests

log file for IP Address block requests

Note to self: Unfortunately I don't recall why I suggested two log files; fail2ban is perfectly capable of (and is intended for) monitoring log files based on patterns. We log to one log file and fail2ban parses the entries looking for a pattern that it has been configured to take action on. When it finds the pattern, it extracts the IP Address and blocks it (temporarily, or "permanently") as indicated previously.

At this point I'm trying to figure out exactly how the log messages will be recorded. I've got the templates set up and I've figured out when logging occurs, but now I'm trying to figure out when fail2ban will be triggered.

Some misc notes below as I think out loud.

Every payload received ...

is intended as a "block this user" request from Splunk
is logged with an indicator that the user was reported
is logged a second time indicating whether an action was taken to disable the user account or ignore it or the associated IP Address (based on the presence in an ignore file)

It is tempting to have fail2ban look at the reported log entries, but that removes the ability to ignore specific usernames or IP Addresses and still log that the payload was received.
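The two-entry logging described above could be sketched as follows. This is only an illustration under assumptions: the function name, the message wording, and the in-memory ignore sets are hypothetical and are not brick's actual format.

```python
def handle_report(username, ip, ignored_users, ignored_ips):
    """Return the log messages for one Splunk payload.

    Every payload yields a "reported" entry, then exactly one
    "disabled" or "ignored" entry describing the action taken.
    """
    messages = [f"reported user={username} ip={ip}"]
    if username in ignored_users:
        messages.append(f"ignored user={username} ip={ip} reason=user-ignored")
    elif ip in ignored_ips:
        messages.append(f"ignored user={username} ip={ip} reason=ip-ignored")
    else:
        messages.append(f"disabled user={username} ip={ip}")
    return messages


ignored_users = {"svc-monitor"}
ignored_ips = {"192.0.2.1"}
blocked = handle_report("jdoe", "203.0.113.9", ignored_users, ignored_ips)
skipped = handle_report("svc-monitor", "203.0.113.9", ignored_users, ignored_ips)
for line in blocked + skipped:
    print(line)
```

With this split, fail2ban would be pointed at the "disabled" entries only, so an ignored username or IP Address still leaves a "reported" record without triggering a block.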

atc0005 commented 4 years ago

Reminder to self:

fail2ban requires a timestamp in the source file to determine when the event occurred. This is needed (if I recall correctly) so that it can tell where it last processed. I don't recall if this is also so it can tell when an IP was last blocked (I believe that this state is tracked elsewhere in case the origin log file rotates, etc).

Regardless, a list of bare IPs for fail2ban to process is probably not advisable for numerous reasons, though it could be useful to sysadmins who wish to quickly remove a blocked IP. I think for that we'll need to use Ansible or become comfortable using "recipes" to unblock fail2ban-blocked IPs as needed (which is doable to begin with).
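To make the timestamp point concrete, here is a small sketch of a log entry that carries a leading timestamp, plus a pattern that pulls the IP Address back out. The line layout and names are hypothetical, and whether fail2ban recognizes this particular timestamp format without extra configuration is not verified here.

```python
import re
from datetime import datetime

# Hypothetical pattern; in a fail2ban filter the address group would
# typically be written with the <HOST> placeholder instead.
DISABLED_RE = re.compile(
    r"^(?P<ts>\S+) disabled user=(?P<user>\S+) ip=(?P<ip>\S+)$"
)


def format_disabled_entry(when, username, ip):
    """Build one log line with a leading timestamp."""
    return f"{when.isoformat(timespec='seconds')} disabled user={username} ip={ip}"


line = format_disabled_entry(datetime(2020, 5, 4, 3, 15, 0), "jdoe", "203.0.113.9")
match = DISABLED_RE.match(line)
print(line)
print(match.group("ip"))
```

A bare IP such as `203.0.113.9` on its own line does not match the pattern, which mirrors the concern above about feeding fail2ban a plain list of addresses.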

atc0005 commented 4 years ago

Finished giving a demo earlier to our team where we were given the "go ahead" to install this application on our test EZproxy server for real world testing. Afterwards, I happened to search GitHub for "ezproxy" and found this project:

https://github.com/calvinm/ezproxy-abuse-checker

which has a Perl script named block_user.pl with this block close to the end of the file:

if ($block_session) {
    system("/opt/ezproxy/ezproxy kill $block_session");
}

Strange, but that looks like they're able to terminate a user session using built-in EZproxy functionality. This is the support that the OCLC Support rep told me wasn't available. I suspect the tech I spoke with honestly didn't know about the feature, and it's possible that it's not even documented well.

Will dig further.

atc0005 commented 4 years ago

OCLC Support misunderstood what I wrote back, so I took our test EZproxy instance and did some testing. The net result is that I was able to retrieve my own login session ID from two different locations:

EZPROXY_INSTALL_PATH/ezproxy.hst
- S SESSION_ID OTHER_STUFF
EZPROXY_INSTALL_PATH/audit/YYYYMMDD.txt
- Login.Success
- Login.Success.Relogin

Further testing would be needed to determine if both files can provide the session ID reliably.

I then went through the steps to confirm that I could terminate my session using the retrieved ID:

$ sudo ./ezproxy kill
Session must be specified

$ sudo grep -E '^S ' ezproxy.hst
S SESSION_ID_HERE REDACTED

$ sudo ./ezproxy kill SESSION_ID_HERE
Session SESSION_ID_HERE terminated

SESSION_ID_HERE is a placeholder for the real session ID, which I've omitted from the example output.
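The manual steps above could be scripted roughly like this. It is a sketch under assumptions: only the `S SESSION_ID OTHER_STUFF` line shape and the `ezproxy kill SESSION_ID` command come from the testing above; the function names, the default binary path, and the sample file contents are hypothetical, and matching a session ID to a specific user account is not covered.

```python
import subprocess


def session_ids_from_hst(text):
    """Extract session IDs from lines shaped like 'S SESSION_ID OTHER_STUFF'."""
    ids = []
    for line in text.splitlines():
        if line.startswith("S "):  # same filter as: grep -E '^S '
            fields = line.split()
            if len(fields) >= 2:
                ids.append(fields[1])
    return ids


def kill_session(session_id, ezproxy_bin="./ezproxy"):
    """Run 'ezproxy kill SESSION_ID' and return the command's output."""
    result = subprocess.run(
        [ezproxy_bin, "kill", session_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Made-up sample contents; only the "S SESSION_ID ..." shape is from the notes.
sample = "X unrelated-record\nS abc123 REDACTED\nS def456 REDACTED\n"
ids = session_ids_from_hst(sample)
print(ids)
```

Passing the arguments as a list avoids handing the session ID to a shell, which matters if the value is ever read from a file that other processes can write to. `kill_session` is defined but not called here, since it needs a real EZproxy install and sufficient privileges.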

atc0005 commented 4 years ago

Wrapping up initial release for v0.1.0 today/tomorrow. Going to leave this issue open for further research/testing with the goal of deciding on a "final" (as much as anything can be final) direction for the next release.

atc0005 commented 4 years ago

Spun off GH-31 for that research instead of dragging this existing issue across milestones. Leaving this at the v0.1.0 milestone since the bulk of the work/notes reflects the fail2ban implementation direction.

atc0005 / brick

How can we programmatically terminate a session? #13