OpenLiberty / open-liberty

Open Liberty is a highly composable, fast to start, dynamic application server runtime environment
https://openliberty.io
Eclipse Public License 2.0

Support LTPA rotation without requiring planned outage #18499

Closed NottyCode closed 10 months ago

NottyCode commented 3 years ago

Description of the high level feature, including any external spec links:

Idea: bring Liberty to feature parity with traditional WAS by allowing LTPA token key rotation without restarting the Liberty profile. Rotating the LTPA SSO keys in Liberty currently requires a planned outage, whereas traditional WAS can rotate them on the fly. The automatic key regeneration settings in the traditional WAS administrative console served high-security sites by making a session attack prohibitively costly to mount: a new key was created without administrator intervention and without restarting any WAS profile.

Liberty has no way to regenerate the LTPA keys on demand without a profile restart, so rotation cannot be automated out of band of Liberty, and as far as I can tell even the restart-based procedure is not shown in the product documentation. Working from third-party sources rather than the product documentation, the only way I can find to implement a high-security configuration with Liberty profiles without interrupting the application applies only to applications that support session persistence, such as IBM Workload Scheduler. We would have to architect a load-balancer-fronted, multiple-master configuration that fails over to a standby and transparently carries sessions with it, then drop the target master from the load balancer rotation, shut down the profile, force the password change of the LTPA key, start the profile (which generates a new LTPA key), and return the node to the load balancer rotation. That is a large increase in operational complexity just to match the no-restart key rotation of traditional WAS.


When complete & mandatory, add links to the UFO (Upcoming Feature Overview) document, FTS (Feature Test Summary), blog post issue(s), and Aha (externally raised RFEs):

Instructions:

Design

Before Development Starts or 8 weeks before Onboarding

Before proceeding to any items below (active development), this feature MUST be prioritized on the backlog, and have been socialized (e.g., UFO Review). Follow the Feature and UFO Approval Process.

Development

When active development has begun

Beta

In order to facilitate early feedback from users, all new features and functionality should first be released as part of a beta release.

Beta Code

Beta Blog (Complete 1.5 weeks before beta eGA)

Legal

3 weeks before Onboarding

Translation

3 weeks before Onboarding

Feature Complete

2 weeks before Onboarding

Focal Point Approvals

2 to 1 week before Onboarding

You MUST have the Design Approved or No Design Approved label before requesting focal point approvals.

All features (both "Design Approved" and "No Design Approved")

"Design Approved" features

Ready for GA

1 week before Onboarding

1 week before GA

Other deliverables

utle commented 1 year ago

Meeting notes:

p1: Fixed typo

p33: Added the following for container considerations:
Liberty operator will provide a better user experience
https://github.com/WASdev/websphere-liberty-operator/issues/380

p7: Added JWT to bullet two.

p12: Fixed typo.

p11: Added the following bullet:
Pre-req - Config update Trigger with mbean

p33: Added the following for container considerations:
The Liberty operator will provide a better user experience by automating the steps defined on p11
https://github.com/WASdev/websphere-liberty-operator/issues/380

This needs further discussion to be taken offline with updates to the cloud considerations page.

p19: Added the monitorDirectory attribute so that users can place LTPA keys files in the monitored directory.

p19: 
- Used element name validationKeys.  
- Changed expirationDate to notUseAfterDate. It's optional.

p20, p21 and p22: Added description for all examples

p24: Changed attribute useContextRootAsCookiePath to useContextRootAsSSOCookiePath

p33: Updated the slide with the container considerations issue.

p35:
- Changed the error message to information message.
- Added a new error message for validationKeys that are past the specified notUseAfterDate.

p29: Added beta tag.

p22:
- Changed useContextRootAsCookiePath to useContextRootAsSSOCookiePath
- Other comments are already addressed.

NottyCode commented 1 year ago

@utle I'm thinking about the steps a user would have to go through to get an application deployed to multiple servers updated with new LTPA keys without taking a login outage (i.e. LTPA tokens are issued that a server cannot access).

At a high level I think the steps are fairly simple, but I'm not sure of the detail and the UFO doesn't include it. I'm worried that the steps are overly burdensome. I think they would be:

  1. Call createLTPAKeys
  2. Copy LTPA keys to all servers
  3. Configure all servers to use the new LTPA key as a validation source
  4. Move primary LTPA keys to be a validation source (as well as primary)
  5. Update all servers' primary LTPA key for issuing

I worry going from 4 to 5 would require significant updates to the server.xml since it would involve updating the primary and deleting an entry from the validationKeys.

Do you agree that these are the steps? If so, we should document the process in the UFO. However, I'm concerned that the flow is quite complicated for users to script around, and perhaps we need to simplify the externals here.
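To make the step 4 to step 5 concern concrete, here is a minimal Python sketch of the server.xml edit that a user's script would have to perform. It is an illustration under assumptions, not the shipped configuration: the `validationKeys` element and its `fileName` and `password` attributes follow the names used in the meeting notes above, while the file names `old.keys` and `new.keys`, the placeholder password, and the helper `promote` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed state after step 4: old.keys is still the primary (issuing) key and
# both key files are listed as validation sources. File names and the
# password value are hypothetical placeholders.
STEP4 = """
<server>
    <ltpa keysFileName="old.keys">
        <validationKeys fileName="new.keys" password="PLACEHOLDER"/>
        <validationKeys fileName="old.keys" password="PLACEHOLDER"/>
    </ltpa>
</server>
"""


def promote(server_xml: str, new_primary: str) -> str:
    """Step 5: make new_primary the issuing key and drop its validation entry."""
    root = ET.fromstring(server_xml)
    ltpa = root.find("ltpa")
    if ltpa is None:
        raise ValueError("no <ltpa> element found")
    # Update the primary key used for issuing tokens.
    ltpa.set("keysFileName", new_primary)
    # Delete the now-redundant validationKeys entry for the promoted file.
    # findall returns a list, so removing while looping is safe.
    for entry in ltpa.findall("validationKeys"):
        if entry.get("fileName") == new_primary:
            ltpa.remove(entry)
    return ET.tostring(root, encoding="unicode")


step5 = promote(STEP4, "new.keys")
print(step5)
```

Even in this small form the script has to touch two places in the same element on every server, which is the kind of scripting burden raised above.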

utle commented 1 year ago

@NottyCode We have two options.

Option 1: Do not enable monitorDirectory. The steps are just like what you have above.

Option 2: Enable monitorDirectory; Primary and validation keys must be in the same directory.

  1. Call createLTPAKeys to create a new ltpa.keys file.
  2. Enable monitorDirectory for all servers.
  3. Rename the existing ltpa.keys file to validation.keys on all servers.
  4. Replace the ltpa.keys file on all servers with the new ltpa.keys file.

Added p22 and p23 to the UFO with these two options.
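The file handling in steps 3 and 4 of Option 2 could be scripted roughly as follows. This is a sketch under assumptions, not a documented procedure: the directory layout `<server>/resources/security`, the server names, the helper `rotate_keys`, and the staging file name are hypothetical, and step 1 (createLTPAKeys) is represented by a stand-in file. The demo runs against a temporary directory rather than real servers.

```python
import shutil
import tempfile
from pathlib import Path


def rotate_keys(security_dir: Path, new_keys_file: Path) -> None:
    """Apply steps 3 and 4 of Option 2 to one server's key directory."""
    primary = security_dir / "ltpa.keys"
    validation = security_dir / "validation.keys"
    # Step 3: keep the old primary as a validation key.
    # Note: replace() overwrites any earlier validation.keys.
    primary.replace(validation)
    # Step 4: copy the new key under a staging name (hypothetical), then
    # rename it so that ltpa.keys is never visible half-written.
    staged = security_dir / "ltpa.keys.tmp"
    shutil.copyfile(new_keys_file, staged)
    staged.replace(primary)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Stand-in for the file produced by createLTPAKeys in step 1.
    new_keys = base / "new-ltpa.keys"
    new_keys.write_text("NEW")

    # Two simulated servers, each starting with an old primary key.
    server_dirs = []
    for name in ("server1", "server2"):
        d = base / name / "resources" / "security"
        d.mkdir(parents=True)
        (d / "ltpa.keys").write_text("OLD")
        server_dirs.append(d)

    for d in server_dirs:
        rotate_keys(d, new_keys)

    results = {
        d.parent.parent.name: (
            (d / "ltpa.keys").read_text(),
            (d / "validation.keys").read_text(),
        )
        for d in server_dirs
    }

print(results)
```

Compared with Option 1, nothing in server.xml changes per rotation once monitorDirectory is enabled; the script only moves files.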

utle commented 1 year ago

Use Application context root for JWT/LTPA SSO cookie path: https://github.com/OpenLiberty/open-liberty/issues/25431 https://github.com/OpenLiberty/open-liberty/issues/25432

Zech-Hein commented 1 year ago

@OpenLiberty/demo-approvers Demo scheduled for EOI 23.16

utle commented 1 year ago

ID: https://github.com/OpenLiberty/docs/issues/6821

Zech-Hein commented 1 year ago

Product Code delivered: https://github.com/OpenLiberty/open-liberty/pull/25826

Zech-Hein commented 1 year ago

FAT test code delivered: https://github.com/OpenLiberty/open-liberty/pull/26017

nstewart0206 commented 1 year ago

SVT: Issue 30215

Zech-Hein commented 1 year ago

@OpenLiberty/externals-approvers - There are no API changes from this feature. Please let me know if approval can be granted or if anything else is needed.

Zech-Hein commented 1 year ago

PR to remove beta guards: https://github.com/OpenLiberty/open-liberty/pull/26115

Zech-Hein commented 1 year ago

BETA Blog: https://github.com/OpenLiberty/open-liberty/issues/26138

Zech-Hein commented 1 year ago

I have validated that the translation messages were returned and the resulting PR has been merged into a driver: https://github.com/OpenLiberty/open-liberty/pull/26079

Zech-Hein commented 1 year ago

@OpenLiberty/serviceability-approvers - See the completed questions below. Please let me know if serviceability approval can be granted or if anything else is needed.

  1. UFO -- does the UFO identify the most likely problems customers will see and identify how the feature will enable them to diagnose and solve those problems without resorting to raising a PMR? Have these issues been addressed in the implementation?
  2. Test and Demo -- As part of the serviceability process we're asking feature teams to test and analyze common problem paths for serviceability and demo those problem paths to someone not involved in the development of the feature (eg. L2, test team, or another development team).

    a) What problem paths were tested and demonstrated?

    • validation keys fileName that doesn't exist
    • validation keys password that is incorrect
    • validation keys notUseAfterDate set in the past
    • validation keys notUseAfterDate set with an invalid value

    b) Who did you demo to?

    • Demo'd to the core-security-squad test team (Malhar Shah)

    c) Do the people you demo'd to agree that the serviceability of the demonstrated problem scenarios is sufficient to avoid PMRs for any problems customers are likely to encounter, or that L2 should be able to quickly address those problems without need to engage L3?

    • Yes

  3. SVT -- SVT team is often the first team to try new features and often encounters problems setting up and using them. Note that we're not expecting SVT to do full serviceability testing -- just to sign-off on the serviceability of the problem paths they encountered.

    a) Who conducted SVT tests for this feature?

    • Nichole Stewart

    b) Do they agree that the serviceability of the problems they encountered is sufficient to avoid PMRs, or that L2 should be able to quickly address those problems without need to engage L3?

    • Yes

  4. Which L2 / L3 queues will handle PMRs for this feature? Ensure they are present in the contact reference file and in the queue contact summary, and that the respective L2/L3 teams know they are supporting it. Ask Don Bourne if you need links or more info.

    • WAS L2: SEC
    • WAS L3: Core Security

  5. Does this feature add any new metrics or emit any new JSON events? If yes, have you updated the JMX metrics reference list / Metrics reference list / JSON log events reference list in the Open Liberty docs?

    • No

Zech-Hein commented 1 year ago

@OpenLiberty/ste-approvers the STE slides have been uploaded to the STE Box Folder

tngiang73 commented 1 year ago

@Zech-Hein : WASSEC L2 reviewed the STE slides and they look good. Thanks.

chirp1 commented 1 year ago

ID has received initial content for this feature at https://github.com/OpenLiberty/docs/issues/6821 and written a draft. Approving.

Zech-Hein commented 1 year ago

UpdateTrigger being added per Alasdair's request: https://github.com/OpenLiberty/open-liberty/pull/26731

Zech-Hein commented 11 months ago

Beta guards are removed: https://github.com/OpenLiberty/open-liberty/pull/27009