firebase / firebase-admin-go

Firebase Admin Go SDK
Apache License 2.0

VerifyIDTokenAndCheckRevoked returning error: `Could not find expiry time from HTTP headers` #621

Closed timelery closed 1 month ago

timelery commented 1 month ago

Environment
Firebase SDK version: v4.14.0
Firebase Product: auth

Describe the problem
I am receiving an error that states `Could not find expiry time from HTTP headers` when executing the VerifyIDTokenAndCheckRevoked function.

This error first started occurring this morning during token validation. Looking at the module's source code, the error seems to originate in the findMaxAge function in token_verifier.go. I have verified that the token I am passing to this function is valid.

othonrm commented 1 month ago

Same happening here.

mstanleyjr commented 1 month ago

Seeing the same for VerifyIDToken.

Edit: Same version, v4.14.0. We rolled back and saw it on v4.13.0 as well.

timelery commented 1 month ago

I also have new client errors coming in during phone auth against the iOS Firebase verifyPhoneNumber method. Error: "INVALID_APP_CREDENTIAL". These errors started occurring this morning as well.

jschaf commented 1 month ago

We're still figuring out the details, but we had a multi-hour outage with this error message. We're not quite sure how things broke.

The band-aid fix was https://github.com/arryved/firebase-admin-go/pull/1 and using the replace directive in go.mod to point to our fork. That code hasn't changed in five years, so I'm not sure what suddenly caused the breakage.

// go.mod
replace firebase.google.com/go/v4 => github.com/arryved/firebase-admin-go/v4 v4.0.0-20240517153600-191d3ba33c12

timelery commented 1 month ago

It looks like the issue resides in the backend Firebase API servers. The HTTP call that the Go SDK makes for token validation is no longer handled as expected. I tried multiple SDK versions with the same outcome.

armando1793 commented 1 month ago

We noticed that this has been happening only in our Cloud Run instances hosted in the us-west-2 region

We first noticed symptoms of this issue around May 17, 4am GMT+8, when we were trying to refactor some of our usage of the Firebase Auth Go SDK. We attributed the symptoms to developer error, because when we routed traffic on our staging Cloud Run instances to an older version the issue would disappear.

We noticed the issue again on May 17, 11am GMT+8, in a feature unrelated to the one mentioned above. By around May 17, 3pm GMT+8, we noticed that the issue was happening on our Cloud Run instances in production despite no new revisions having been deployed for the last several days. This was when we started looking into the problem as a Firebase issue.

As of May 17, between 6 and 7pm GMT+8, we were able to use the SDK to validate client tokens from our local machines in the Philippines. But when we tried the same function inside our Cloud Run instances, the code would fail: client tokens would not validate inside the Cloud Run instance but were perfectly fine on our local machines.

Our findings:

When the SDK calls the URL:

https://www.googleapis.com/robot/v1/metadata/x509/securetoken@system.gserviceaccount.com

in the Philippines the headers are

"Cache-Control": [ "public, max-age=24584, must-revalidate, no-transform" ]

But inside our Cloud Run instance it is

"Cache-Control": [ "private" ]

As of writing, May 18, 1:27AM GMT+8, this is still the case. Tokens can still be validated from our local machines. But not in our cloud run instances.

jschaf commented 1 month ago

> We noticed that this has been happening only in our Cloud Run instances hosted in the us-west-2 region

We observed this on GKE nodes in us-west-4 with the GCP load balancer in front. Both happening in us-west is suspicious.

timelery commented 1 month ago

I submitted a formal Firebase bug report. If any of you are aware of another way to notify the Firebase team, please let me know. I suspect this issue is affecting many others.

myxomatos commented 1 month ago

I narrowed down the problem and made a fix by forking and modifying the Firebase client code. I found that the Go Firebase client code, including the very latest version (v4.14.0), relies on the "cache-control" response header returned by the HTTP call that fetches the public certificates. The client makes this call in order to verify ID tokens. Specifically, it uses the "max-age" directive of the header to calculate the certificate expiration time. On May 16 at 5:45pm, the header value changed to "private", breaking the Firebase client code written in Go. (I'm not sure about client code written in other languages.)

More details: Firebase Auth client code fetches certs from https://www.googleapis.com/robot/v1/metadata/x509/securetoken@system.gserviceaccount.com

This command can be used to get the value of the "cache-control" header:

curl -v "https://www.googleapis.com/robot/v1/metadata/x509/securetoken@system.gserviceaccount.com" 2>&1 | grep "cache-control"

My fix is to return a default expiration value instead of an error: https://github.com/dutchpet/firebase-admin-go/commit/5d4d7d0fa7c0b6c9302ce11bc53b2302a94b1432

georgi0u commented 1 month ago

@myxomatos what's the status of the fix?

Are you suggesting affected clients hot-patch the existing package? Or is the main-branch fix going out soon?

Also, I imagine this header change, seeing as only some clients are experiencing it, is an ongoing incremental rollout by whoever is in charge of that cert URI. Has there been any luck coordinating with them so that users of this client don't break?

janaaronlee commented 1 month ago

@georgi0u what @armando1793 and I did to work around the issue was pretty much exactly what @myxomatos did. It is definitely a bug in the Go Firebase Admin SDK: a reasonable fallback should have been in place instead of returning nil and an error when max-age is missing. The current implementation:

func findMaxAge(resp *http.Response) (*time.Duration, error) {
    cc := resp.Header.Get("cache-control")
    for _, value := range strings.Split(cc, ",") {
        value = strings.TrimSpace(value)
        if strings.HasPrefix(value, "max-age=") {
            sep := strings.Index(value, "=")
            seconds, err := strconv.ParseInt(value[sep+1:], 10, 64)
            if err != nil {
                return nil, err
            }
            duration := time.Duration(seconds) * time.Second
            return &duration, nil
        }
    }
    return nil, errors.New("Could not find expiry time from HTTP headers")
}

For reference: https://github.com/firebase/firebase-admin-go/blob/87b867c2ac93c0c7ebad1f2eced98c37fcf76307/auth/token_verifier.go#L493

georgi0u commented 1 month ago

Yup, appreciate the direction. I've patched a fork as well, and rebuilt/redeployed using that.

Now I'm curious what the plan for the official package is, and whether there's a plan to coordinate within Google so that other unpatched clients don't break.

josephjoeljo commented 1 month ago

> We're still figuring out the details, but we had a multi-hour outage with this error message. We're not quite sure how things broke.
>
> The band-aid fix was arryved#1 and using the replace directive in go.mod to point to our fork. That code hasn't changed in five years, so I'm not sure what suddenly caused the breakage.
>
> // go.mod
> replace firebase.google.com/go/v4 => github.com/arryved/firebase-admin-go/v4 v4.0.0-20240517153600-191d3ba33c12

Using that fork temporarily as well. Thank you.

JairoPanduro commented 1 month ago

+1 here

ribrdb commented 1 month ago

Google Cloud Support says the production issue is fixed, although it seems like a fix here would still be good to prevent this from reoccurring.

jschaf commented 1 month ago

> Google Cloud Support says the production issue is fixed, although it seems like a fix here would still be good to prevent this from reoccurring.

Does anyone know how to verify? I'd rather not tempt another outage.

ribrdb commented 1 month ago

I think you could start a GCE micro instance in whatever region your app is running in and run the curl command from above:

curl -v "https://www.googleapis.com/robot/v1/metadata/x509/securetoken@system.gserviceaccount.com" 2>&1 | grep "cache-control"

You want to see "public" and "max-age", not just "private".
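The same check can be sketched in Go for anyone who would rather run it from inside the service itself. The names fetchCacheControl and newStubServer are hypothetical, and the stub servers (built with net/http/httptest) stand in for the real endpoint only so that the example runs offline.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// fetchCacheControl issues a GET and returns the Cache-Control header,
// plus whether it contains both "public" and a max-age directive.
func fetchCacheControl(client *http.Client, url string) (string, bool, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", false, err
	}
	defer resp.Body.Close()

	cc := resp.Header.Get("Cache-Control")
	hasPublic, hasMaxAge := false, false
	for _, d := range strings.Split(cc, ",") {
		d = strings.ToLower(strings.TrimSpace(d))
		if d == "public" {
			hasPublic = true
		}
		if strings.HasPrefix(d, "max-age=") {
			hasMaxAge = true
		}
	}
	return cc, hasPublic && hasMaxAge, nil
}

// newStubServer returns a local test server that always replies with the
// given Cache-Control value. It exists only to make the sketch runnable.
func newStubServer(cacheControl string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", cacheControl)
		fmt.Fprint(w, "{}")
	}))
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	values := []string{
		"public, max-age=24584, must-revalidate, no-transform",
		"private",
	}
	for _, v := range values {
		srv := newStubServer(v)
		cc, healthy, err := fetchCacheControl(client, srv.URL)
		srv.Close()
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		fmt.Printf("cache-control=%q healthy=%v\n", cc, healthy)
	}
}
```

To check a real region, pass the certificate URL from the curl command to fetchCacheControl instead of a stub server's URL.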

armando1793 commented 1 month ago

> > Google Cloud Support says the production issue is fixed, although it seems like a fix here would still be good to prevent this from reoccurring.
>
> Does anyone know how to verify? I'd rather not tempt another outage.

I just did what @ribrdb suggested on the staging version of our Cloud Run instance that was affected by the outage. I can confirm that the expected headers are back. I will attempt to repoint our package to the official SDK in a few hours and see if the issue is resolved. Will update here when I do.

otakakot commented 1 month ago

Why does it generate an error if max-age cannot be obtained? I think this value will be used later to determine if it should be refreshed or not. So why not return 0 if the value cannot be retrieved so that it is not cached?

Also, I checked the implementation of the node library (firebase-admin-node), and it seems that if the max-age value could not be obtained, an error is not generated, but the default value of 0 is set (i.e., not cached).

lahirumaramba commented 1 month ago

Hey folks, the backend issue should be addressed now. I agree with the comments above, we should update the SDK to handle this case gracefully without throwing and continue the token verification. We will submit a fix soon and this issue will track the progress. Thanks!

lahirumaramba commented 1 month ago

Addressed in #623

jschaf commented 1 month ago

Thank you. Are you able to cut a new release so we can upgrade without using the dev branch?

lahirumaramba commented 1 month ago

> Thank you. Are you able to cut a new release so we can upgrade without using the dev branch?

Hey @jschaf, we will cut a new release this week. Thanks!