department-of-veterans-affairs / va.gov-cms

Editor-centered management for Veteran-centered content.
https://prod.cms.va.gov
GNU General Public License v2.0

SAML Module Upgrade Failure Investigation and Path Forward #14494

Closed olivereri closed 10 months ago

olivereri commented 1 year ago

Description

July 20th's SAML module upgrade ended in failure. This issue will investigate the failure and develop a path forward to an eventual successful deployment. The previous issue is referenced in "Related issues".

Acceptance Criteria

Related Issues

https://github.com/department-of-veterans-affairs/va.gov-cms/issues/14246

Implementation Details

The PIV logon failures for prod.cms.va.gov are captured in the error log below: image

This error indicates that the public certificate installed to validate signed responses from the Identity Provider (IdP) is invalid. On the night of the upgrade, in coordination with the Identity and Access Management (IAM) team, the public certificate was confirmed as correct. The current suspicion is that the response was signed with a private key that does not match the installed public certificate.
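One way to probe that suspicion, sketched here under assumptions: if the IdP embeds its signing certificate in the response (inside `ds:Signature`/`ds:KeyInfo`, which is common but not guaranteed), its fingerprint can be compared with the fingerprint of the certificate installed on our side. The function names below are hypothetical helpers, not part of samlauth or SimpleSAMLphp, and the input is assumed to be the already base64-decoded response XML captured from a failed login.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

# XML Signature namespace used by <ds:X509Certificate> elements.
DS_NS = "http://www.w3.org/2000/09/xmldsig#"


def der_fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint (hex) of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def pem_to_der(pem_text: str) -> bytes:
    """Strip the PEM armor lines and decode the base64 body."""
    lines = [ln.strip() for ln in pem_text.strip().splitlines()]
    body = "".join(ln for ln in lines if ln and not ln.startswith("-----"))
    return base64.b64decode(body)


def embedded_cert_fingerprints(saml_xml: str) -> list[str]:
    """Fingerprints of every certificate embedded in the response XML."""
    root = ET.fromstring(saml_xml)
    prints = []
    for node in root.iter(f"{{{DS_NS}}}X509Certificate"):
        # The element text is base64 DER, often wrapped across lines.
        der = base64.b64decode("".join((node.text or "").split()))
        prints.append(der_fingerprint(der))
    return prints


def installed_cert_matches(saml_xml: str, installed_pem: str) -> bool:
    """True if the installed certificate is one of the embedded ones."""
    installed = der_fingerprint(pem_to_der(installed_pem))
    return installed in embedded_cert_fingerprints(saml_xml)
```

A mismatch would support the different-key theory. A match only shows that the IdP advertises the same certificate; it does not prove the signature itself verifies, so this is a diagnostic aid rather than a replacement for the module's own validation.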

On July 24th a test on CMS-Test Staging was conducted by installing a valid but incorrect public certificate for signature validation: image

The error response, while having a different severity, is the same as observed on July 20th.

Team

Please check the team(s) that will do this work.

olivereri commented 1 year ago

I've reached out to the IAM team. They understandably have reservations about integrating test.staging.cms.va.gov. They suggested creating a new integration, for which we could create a new server/environment. I mentioned that we already have test.prod.cms.va.gov, so we may go that way.

I did ask whether the response was signed by a different private key, and they said no, stating that: "There are a total of 103 federation partners in Partnership model (which we are using for CMS) that are currently active in PROD (including current CMS integration) without any issues." I get the sense they think this problem sits chiefly on our side.

edmund-dunn commented 1 year ago

Talked to the maintainers. They concur it was most likely the cert. CleanShot2023-07-26at07.46.14.jpg

mchelen-gov commented 1 year ago

To clarify, the upgrade here refers to moving from https://www.drupal.org/project/simplesamlphp_auth to https://www.drupal.org/project/samlauth right?

olivereri commented 1 year ago

> To clarify, the upgrade here refers to moving from https://www.drupal.org/project/simplesamlphp_auth to https://www.drupal.org/project/samlauth right?

@mchelen-gov That is correct.

olivereri commented 1 year ago

Through a series of e-mails, the Identity and Access Management team has agreed to integrate test.prod.cms.va.gov with their Production IdP. They are waiting for us to confirm that we are ready and to provide the metadata for the application.

olivereri commented 1 year ago

@olivereri @mchelen-gov @EWashb @teeshe @edmund-dunn @BerniXiongA6 met to discuss SAML troubleshooting efforts and possible paths forward. We concluded that SAMLAuth testing should be done, either to reproduce the issue or to learn from a working standard. We chose this over progressing with miniorange's module using SimpleSAMLphp because that path is blocked by funding lead times and a trial period shorter than we need.

Notes from the discussion: https://vfs.atlassian.net/wiki/spaces/PCMS/pages/2722988150/Blocker+Drupal+10+Upgrade+with+SAML+and+SSOi

productmike commented 1 year ago

Shifting back to READY FOR SPRINT since this is not committed work for S91. Can be pulled in if timelines are shorter than expected.

EWashb commented 11 months ago

Can we get an update on these tasks? I know we went with one SAML module over another so I'm unsure if all of these tasks are still appropriate for our path forward @teeshe @edmund-dunn cc: @BerniXiongA6

edmund-dunn commented 10 months ago

Closing this ticket. The SimpleSAMLphp update was tested successfully on test.prod.cms.va.gov.