SAP / cloud-pysec

Cloud Container Security API for python
Apache License 2.0
10 stars 9 forks source link

OAuth sometimes failed between different microservices #36

Closed dennygee closed 2 years ago

dennygee commented 4 years ago

Hi Team,

We have a client application that encountered OAuth failure for one particular API service intermittently when we run a schedule job, let me describe.

  1. We have two individual API Service (or applications) running on CF, both binded to the same XSUAA service instance
  2. Both are using version pysec 2.0.10
  3. One of the API service has a configured SAP_JWT_TRUST_ACL environment variable for identityzone and clientid.
  4. The client application is scheduled to do this: a. Fetch a NEW OAuth Token b. Send http request to API Service 1 with the oauth token c. Send http request to API Service 2 with the (same) oauth token above d. This action is done every 30 seconds

What we observed:

  1. The scheduled jobs typically run successfully without any error
  2. However, almost each day, there will be one time API Service 2 fails with Pysec library error messages (shown below)
  3. After the failure, the next scheduled job from the client application continues to pass without errors, that's actually super strange to us.
  4. We ran this over the weekend, from Thursday to Monday morning.

These logs appeared during the failure of API Service 2:

lib name log
sap.xssec.security_context Warning: Could not validate key: Error in offline validation of access token: JWT is expired, result code 5 Will retry with configured key.
sap.xssec.security_context SSO library path: None, CCL library path: None
sap.xssec.security_context SSO library path: None, CCL library path: None
sap.xssec.key_cache Using cached verification key
sap.xssec.key_cache Loading verification key for 'jku'=https://test-client-s4-cloud.authentication.sap.hana.ondemand.com/token_keys and kid=default-jwt-key--1261874333

Questions:

  1. First of all, any idea why this might happen?
  2. The only difference between the two applications are really just one of them setting SAP_JWT_TRUST_ACL. Each endpoint does nothing but the library authentication method, and returning "OK" response. Is there some "cached" setting internally that is triggered by this variable?
  3. Are we able to disable cache? If so, what is the drawback? Or are there are env settings we can configure for the cache? Like time to live, or other settings?

We do not think this is a problem with the XSUAA service because, if it was related to the service, then we expect that both API calls would have failed. Here, we really seek your help to troubleshoot this issue. We have customers who faced this intermittent issue as well, and initially we also classified as corner case, but as we onboard more customers, we are starting to see more frequencies of the reported problem, which is why we are running a troubleshooting session now and we have to report back to the customers.

Thank you.

mhaas commented 3 years ago

@dennygee Which of the two services is setting SAP_JWT_TRUST_ACL?

Additionally, it would be meaningful to understand a bit better how the first message Error in offline validation of access token: JWT is expired, result code 5 is generated. To this end, it would be useful if your client application could log the JWT. You can then decode the JWT and look at the exp claim, which encodes the expiration date.

From the error message, it seems that your application is sending an expired JWT. It would be useful to understand if this is indeed the case.

dennygee commented 3 years ago

Hi Michael,

Thanks for the comment.

  1. That will be API Service 2 with SAP_JWT_TRUST_ACL.
  2. Not very sure can we log JWT in production environment, would that be a security...breach ( I guess)? If not, we can do so.
Eshwin commented 3 years ago

Hi Michael,

Below are the logs where we get the "token expired error"

Oct 13, 2020 @ 12:00:03.162 | ca-api-core-integration | ea740b64-747d-448d-52db-f51fe7d0f784 | DEBUG | sap.xssec.security_context | Application received a token with exp: 1602608469

Oct 13, 2020 @ 12:00:03.286 | ca-api-core-integration | 1c56a5a3-1082-4848-7440-cbdc5821a05e | WARNING | sap.xssec.security_context | Warning: Could not validate key: Error in offline validation of access token: JWT is expired, result code 5 Will retry with configured key.

so if you observe the line "Application received a token with exp: 1602608469"

here when i convert the exp time using epoch, i can still see that the token is still valid. but is says "JWT expired" , can you give you points here.

mhaas commented 3 years ago

Very interesting.

Can you enable debug logging or the sapjwt library? As per https://github.com/SAP-samples/cloud-sapjwt#error-situations you can use the following environment variables:

SAP_EXT_TRC=stdout
SAP_EXT_TRL=3
dennygee commented 3 years ago

Hi Michael,

Let me drop you an email for the logs we retrieved (as it might contains sensitive data to be display here).