cloudfoundry / cf-acceptance-tests

CF Acceptance tests
Apache License 2.0
69 stars 173 forks source link

Fresh CATS fails due to: "curl: (92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1) [HTTP2]" #1276

Open dimivel opened 6 days ago

dimivel commented 6 days ago

Issue

Currently the fresh-cats job is failing because of the Routing Transparency test that should appropriately handles certain reserved/unsafe characters but its not the case.

This is the only foundation that is enabled with http2 as default backend protocol using the GCP classical Application Load Balancer.

Context

This bbl env is enabled with http2 as backend protocol. As of today, the last 5 consecutive build resulted in red jobs because of the unit test above mentioned is failing. We see the following error in the builds:

  [FAILED] Expected
      <int>: 92
  to match exit code:
      <int>: 0

If manually, tested against this foundation using the golang app, we get the following error for curl for two reserved/unsafe characters: the '\'(backward slash) and '"'(double quotes) :

curl: (92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)

Expected result

The CATS should pass and jobs should be green.

Current result

Fail to run the CATS against a fresh foundation where http2 backend protocol is enabled for a a GCP classical Application Load balancer.

Possible Fix

  1. Remove the two problematic characters (Now temporary work-around)
  2. (Not confirmed) Confirm with routing team, if there gorouter is properly handling the reserved/unsafe characters.
  3. (Partially confirmed) Issue with GCP??, open ticket on them, keep the same unit tests.
ctlong commented 6 days ago

I forwarded this to ARP WG Slack channel for their input: https://cloudfoundry.slack.com/archives/C02HNDJB31R/p1728925555563009?thread_ts=1728925551.204759&cid=C02HNDJB31R.

ctlong commented 6 days ago

What's the result of bypassing the LB as recommended by Matthew Kocher and Amelia Downs?

If the issue is at the LB level then maybe we merge #1277 as a temporary fix and raise an issue with GCP. If not, then we should engage with the routing folks to fix the issue in Gorouter.

dimivel commented 6 days ago

hey Carson,

If we force the curl to use the http1.1 it works, output

curl 'https://cat6-go-test.cf.luna.env.wg-ard.ci.cloudfoundry.org/requesturi/!~^\"()$?!()$#!' --http1.1 -v
*   Trying 34.111.232.161:443...
* Connected to cat6-go-test.cf.luna.env.wg-ard.ci.cloudfoundry.org (34.111.232.161) port 443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: C=US; O=Cloud Foundry; CN=*.cf.luna.env.wg-ard.ci.cloudfoundry.org
*  start date: Oct  1 00:00:07 2024 GMT
*  expire date: Oct  1 00:00:07 2025 GMT
*  subjectAltName: host "cat6-go-test.cf.luna.env.wg-ard.ci.cloudfoundry.org" matched cert's "*.cf.luna.env.wg-ard.ci.cloudfoundry.org"
*  issuer: C=USA; O=Cloud Foundry; CN=relint-ca
*  SSL certificate verify ok.
> GET /requesturi/!~^\"()$?!()$ HTTP/1.1
> Host: cat6-go-test.cf.luna.env.wg-ard.ci.cloudfoundry.org
> User-Agent: curl/7.74.0
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-type: text/plain; charset=utf-8
< date: Tue, 15 Oct 2024 07:42:31 GMT
< x-vcap-request-id: 7a51131f-bfd1-4c9c-5ad5-0d29f127dbe3
< Content-Length: 66
< Via: 1.1 google
< Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
<
Request URI is [/requesturi/!~^\"()$?!()$]
Query String is [!()$]
* Connection #0 to host cat6-go-test.cf.luna.env.wg-ard.ci.cloudfoundry.org left intact

It also works if we omit the protocol for the URL, default to use http see port 80:

curl 'cat6-go-test.cf.luna.env.wg-ard.ci.cloudfoundry.org/requesturi/!~^\"()$?!()$#!' -v
*   Trying 34.111.232.161:80...
* Connected to cat6-go-test.cf.luna.env.wg-ard.ci.cloudfoundry.org (34.111.232.161) port 80 (#0)
> GET /requesturi/!~^\"()$?!()$ HTTP/1.1
> Host: cat6-go-test.cf.luna.env.wg-ard.ci.cloudfoundry.org
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-type: text/plain; charset=utf-8
< date: Tue, 15 Oct 2024 07:44:59 GMT
< x-vcap-request-id: 7bd527d6-a8e3-43ee-55be-443c85caece0
< Content-Length: 66
< Via: 1.1 google
<
Request URI is [/requesturi/!~^\"()$?!()$]
Query String is [!()$]
* Connection #0 to host cat6-go-test.cf.luna.env.wg-ard.ci.cloudfoundry.org left intact

Bypassing the LB works, because it uses http:

curl 10.0.1.16 -H 'Host: cat6-go-test.cf.luna.env.wg-ard.ci.cloudfoundry.org:' -v
*   Trying 10.0.1.16:80...
* Connected to 10.0.1.16 (10.0.1.16) port 80 (#0)
> GET / HTTP/1.1
> Host: cat6-go-test.cf.luna.env.wg-ard.ci.cloudfoundry.org:
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 10
< Content-Type: text/plain; charset=utf-8
< Date: Tue, 15 Oct 2024 07:52:43 GMT
< X-Vcap-Request-Id: 507f9706-c8e4-463f-4769-65c96c07c673
<
go, world
* Connection #0 to host 10.0.1.16 left intact
jochenehret commented 5 days ago

Involving the GCP support can be very tedious. The first level support will ask for several details, when the issue occurred for the first time, log files etc. It may take some time until the issue is handed over to the development team.

Here's a page from Google Maps that explains that common characters (like double quotes) need to be encoded: https://developers.google.com/maps/url-encoding

ctlong commented 5 days ago

I'm convinced, this seems like a GCP LB problem. Let's ignore this case in CATs then. Do you want to proceed with #1277? One other way might be to conditionally remove the two offending reserved characters if HTTP/2 is enabled 🤔

ctlong commented 5 days ago

Also, if you specify the protocol as https, Gorouter will accept HTTP/2 with the reserved characters in the Host header:

$ curl -vk https://10.0.1.16 -H 'Host: cat6-go-test.cf.luna.env.wg-ard.ci.cloudfoundry.org/requesturi/!~^()$"?!()$#!'
*   Trying 10.0.1.16:443...
* Connected to 10.0.1.16 (10.0.1.16) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Request CERT (13):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=routerSSL
*  start date: Oct 11 17:22:34 2024 GMT
*  expire date: Oct 11 17:22:34 2025 GMT
*  issuer: CN=routerCA
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* Using Stream ID: 1 (easy handle 0x55fa0bb99eb0)
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
> GET / HTTP/2
> Host: cat6-go-test.cf.luna.env.wg-ard.ci.cloudfoundry.org/requesturi/!~^()$"?!()$#!
> user-agent: curl/7.81.0
> accept: */*
>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
< HTTP/2 200
< content-type: text/plain; charset=utf-8
< date: Wed, 16 Oct 2024 00:49:30 GMT
< x-vcap-request-id: 93d86cfc-7303-47e9-4865-b9e9e76c71ce
< content-length: 10
<
* TLSv1.2 (IN), TLS header, Supplemental data (23):
go, world
* Connection #0 to host 10.0.1.16 left intact
ctlong commented 5 days ago

❓ Is there an established pattern for us to record a known issue with CF Deployments on GCP such that operators and developers can discover that GCP LBs do not allow certain (unencoded) reserved characters in request URLs when HTTP/2 is enabled?

cc @davewalter @ameowlia

jochenehret commented 4 days ago

As @ctlong suggested above, we could check if include_http2_routing is true (currently only for luna/fresh) and if yes, skip the double quote test in routing_transparency.go.

Nevertheless, as explained on https://developers.google.com/maps/url-encoding double-quotes are unsafe and must be percentage-encoded. I would expect that is what the GCP support would tell us if we file a support case.

jochenehret commented 3 days ago

Maybe we should also update the percent-encoded test: https://github.com/cloudfoundry/cf-acceptance-tests/blob/55b0b5c05b74d1bdd9f9bd3b7f8d4262ffb6a2b7/apps/routing_transparency.go#L38C65-L38C89

Currently it test the following list of percent-encoded characters (sorted and duplicates removed):

%20 %21 %24 %27 %28 %29 %5E %7E

Comparing this to the list from https://developers.google.com/maps/url-encoding, the following characters are missing and should be added:

%22 %23 %25 %3C %3E %7C