DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Service responses are not compressed #6360

Open achave11-ucsc opened 2 weeks ago

achave11-ucsc commented 2 weeks ago

We implemented compression as part of #1910, but it appears to have been disabled again:

Screenshot 2024-06-25 at 1 38 08 PM

Note the accept-encoding header in the request and the absence of the content-encoding header in the response.
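The check described above can be written as a small predicate over the two sets of headers. This is only a sketch and not part of Azul: the name `compression_missing` and the plain-dict representation of headers are assumptions made for illustration, and q-values in `Accept-Encoding` are ignored for brevity.

```python
def _header(headers, name):
    """Look up a header case-insensitively; return '' if it is absent."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return ''


def compression_missing(request_headers, response_headers):
    """
    Return True if the request offered at least one content coding but the
    response carries no Content-Encoding header (or only `identity`).
    """
    offered = [
        coding.split(';')[0].strip().lower()
        for coding in _header(request_headers, 'Accept-Encoding').split(',')
    ]
    # Drop empty entries and `identity`, which means "no encoding"
    offered = [coding for coding in offered if coding and coding != 'identity']
    applied = _header(response_headers, 'Content-Encoding').strip().lower()
    return bool(offered) and applied in ('', 'identity')
```

Fed with the headers from the reproductions in this ticket, the predicate should return `True` for a response that lacks `content-encoding`.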

prod reproduction using cURL:

❯ curl -v -X 'GET' 'https://service.azul.data.humancellatlas.org/index/bundles?catalog=dcp39&filters=%7B%0A%0A%7D&size=1' -H 'Accept: application/json' -H 'Accept-Encoding: gzip,deflate'
Note: Unnecessary use of -X or --request, GET is already inferred.
* Host service.azul.data.humancellatlas.org:443 was resolved.
* IPv6: (none)
* IPv4: 13.226.210.31, 13.226.210.81, 13.226.210.54, 13.226.210.21
*   Trying 13.226.210.31:443...
* Connected to service.azul.data.humancellatlas.org (13.226.210.31) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES128-GCM-SHA256 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=service.azul.data.humancellatlas.org
*  start date: Aug 18 00:00:00 2023 GMT
*  expire date: Sep 13 23:59:59 2024 GMT
*  subjectAltName: host "service.azul.data.humancellatlas.org" matched cert's "service.azul.data.humancellatlas.org"
*  issuer: C=US; O=Amazon; CN=Amazon RSA 2048 M01
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://service.azul.data.humancellatlas.org/index/bundles?catalog=dcp39&filters=%7B%0A%0A%7D&size=1
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: service.azul.data.humancellatlas.org]
* [HTTP/2] [1] [:path: /index/bundles?catalog=dcp39&filters=%7B%0A%0A%7D&size=1]
* [HTTP/2] [1] [user-agent: curl/8.6.0]
* [HTTP/2] [1] [accept: application/json]
* [HTTP/2] [1] [accept-encoding: gzip,deflate]
> GET /index/bundles?catalog=dcp39&filters=%7B%0A%0A%7D&size=1 HTTP/2
> Host: service.azul.data.humancellatlas.org
> User-Agent: curl/8.6.0
> Accept: application/json
> Accept-Encoding: gzip,deflate
> 
< HTTP/2 200 
< content-type: application/json
< content-length: 545808
< date: Wed, 26 Jun 2024 21:46:10 GMT
< x-amzn-requestid: 548cd4e7-f47f-485d-989d-138f1bfaae9f
< access-control-allow-origin: *
< strict-transport-security: max-age=31536000; includeSubDomains
< access-control-allow-headers: Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key
< x-frame-options: DENY
< x-amz-apigw-id: Z_rU3EPvIAMEXFw=
< cache-control: no-store
< x-content-type-options: nosniff
< x-amzn-trace-id: Root=1-667c8c1e-59e91b1c7684151721f3a857;Parent=0c49ca8a48b3f10e;Sampled=0;lineage=2808a99a:0
< x-cache: Miss from cloudfront
< via: 1.1 f7a747899149deb363c7a3968c0ed56a.cloudfront.net (CloudFront)
< x-amz-cf-pop: LAX50-C1
< x-amz-cf-id: 8yv-IAuOrbbbMrpMoJd50AQhbKWMLPom9D3wkfMy3xSP86f8U8y18Q==
< 
{"pagination" … [RESPONSE OMITTED FOR THE SAKE OF COMPACTNESS]

anvilprod reproduction using cURL:

❯ curl -v -X 'GET' 'https://service.explore.anvilproject.org/index/bundles?catalog=anvil6&filters=%7B%0A%0A%7D&size=1' -H 'Accept: application/json' -H 'Accept-Encoding: gzip,deflate'
Note: Unnecessary use of -X or --request, GET is already inferred.
* Host service.explore.anvilproject.org:443 was resolved.
* IPv6: (none)
* IPv4: 99.84.203.78, 99.84.203.102, 99.84.203.59, 99.84.203.42
*   Trying 99.84.203.78:443...
* Connected to service.explore.anvilproject.org (99.84.203.78) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES128-GCM-SHA256 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=service.explore.anvilproject.org
*  start date: Feb  1 00:00:00 2024 GMT
*  expire date: Mar  2 23:59:59 2025 GMT
*  subjectAltName: host "service.explore.anvilproject.org" matched cert's "service.explore.anvilproject.org"
*  issuer: C=US; O=Amazon; CN=Amazon RSA 2048 M02
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://service.explore.anvilproject.org/index/bundles?catalog=anvil6&filters=%7B%0A%0A%7D&size=1
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: service.explore.anvilproject.org]
* [HTTP/2] [1] [:path: /index/bundles?catalog=anvil6&filters=%7B%0A%0A%7D&size=1]
* [HTTP/2] [1] [user-agent: curl/8.6.0]
* [HTTP/2] [1] [accept: application/json]
* [HTTP/2] [1] [accept-encoding: gzip,deflate]
> GET /index/bundles?catalog=anvil6&filters=%7B%0A%0A%7D&size=1 HTTP/2
> Host: service.explore.anvilproject.org
> User-Agent: curl/8.6.0
> Accept: application/json
> Accept-Encoding: gzip,deflate
> 
< HTTP/2 200 
< content-type: application/json
< content-length: 5359
< date: Wed, 26 Jun 2024 21:50:52 GMT
< x-amzn-requestid: bb3dcccf-14c5-42c7-93ce-3c744db98755
< access-control-allow-origin: *
< strict-transport-security: max-age=31536000; includeSubDomains
< access-control-allow-headers: Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key
< x-frame-options: DENY
< x-amz-apigw-id: Z_sBKES-oAMEbdw=
< cache-control: no-store
< x-content-type-options: nosniff
< x-amzn-trace-id: Root=1-667c8d3a-783ea0c273a86cb45d2afd5b;Parent=08ed5a627d5f9b6c;Sampled=0;lineage=45061563:0
< x-cache: Miss from cloudfront
< via: 1.1 96abbf138436a1c4a82006a53fa43b20.cloudfront.net (CloudFront)
< x-amz-cf-pop: LAX3-C3
< x-amz-cf-id: ey9ZV6nXDL96Aunbo8_2ZHWJlIo-LpJqZBkHzpczfZWLQ2ssC6lCbQ==
< 
{"hits": … [RESPONSE OMITTED FOR THE SAKE OF COMPACTNESS]

dsotirho-ucsc commented 2 weeks ago

Spike to add reproduction involving curl for both anvilprod and prod. Remember to include gzip in the accept-encoding request header (if it is not already included).

achave11-ucsc commented 2 weeks ago

Ticket description has been updated with the reproductions for prod and anvilprod using cURL.

dsotirho-ucsc commented 2 weeks ago

Assignee to consider next steps.

hannes-ucsc commented 1 week ago

I checked: minimum_compression_size is 0, so all responses should be compressed. Assignee to file a support ticket with AWS asking why responses are not compressed. Provide @achave11-ucsc's curl repro and a screenshot of the settings of the API used in that repro, showing that minimum_compression_size is indeed 0.
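To spell out the expectation: as I understand the API Gateway documentation, a null minimum compression size disables compression, while 0 enables it for payloads of any size. A minimal sketch of that rule follows. The function name `compression_expected` is hypothetical, and `rest_api` is assumed to be a dict shaped like a GetRestApi response, where the setting appears as `minimumCompressionSize`.

```python
def compression_expected(rest_api, payload_size):
    """
    Return True if a payload of the given size (in bytes) should be
    compressed under the API's minimum compression size setting.
    """
    threshold = rest_api.get('minimumCompressionSize')
    if threshold is None:
        # Setting absent or null: compression is disabled for the API
        return False
    # Payloads smaller than the threshold are left uncompressed
    return payload_size >= threshold
```

With a threshold of 0, this returns `True` for both reproductions (545808 and 5359 bytes), which is why the uncompressed responses were unexpected.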

dsotirho-ucsc commented 1 week ago

Created AWS Support ticket 172005082100601

dsotirho-ucsc commented 6 days ago

As suggested by AWS Support, a Deploy API action was performed, after which a compressed response can now be produced.
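For reference, the programmatic counterpart of the Deploy API action is, as far as I know, the CreateDeployment operation. The sketch below assumes a client that behaves like `boto3.client('apigateway')`. The function name `redeploy` and the stage name passed to it are hypothetical; this ticket does not state which stage was redeployed or how the action was carried out.

```python
def redeploy(client, rest_api_id, stage_name):
    """
    Create a new deployment of the given REST API to the given stage and
    return the ID of that deployment.

    `client` is assumed to behave like boto3.client('apigateway').
    """
    response = client.create_deployment(
        restApiId=rest_api_id,
        stageName=stage_name,
        description='Redeploy so the stage picks up the current API settings',
    )
    return response['id']
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.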

# azul-trail-prod

fields @timestamp,
  eventName,
  coalesce(responseElements.authorizerById.restApiId, responseElements.stageFlushAuthorizerCache.restApiId, requestParameters.restApiId) as restApiId,
  coalesce(responseElements.deploymentId, responseElements.deploymentStages.deploymentId) as deployment_id,
  userIdentity.arn
| filter eventSource = 'apigateway.amazonaws.com'
| filter restApiId = 'ccp122p730'
| filter eventName like /CreateStage|UpdateStage|UpdateRestApi|CreateDeployment/
| sort @timestamp asc
| limit 200
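
For anyone who wants to run the same selection over exported CloudTrail records instead of Logs Insights, here is a rough Python equivalent of the query above. It is a sketch under assumptions: the records are plain dicts, the record's `eventTime` field stands in for `@timestamp`, and the helper names are made up for illustration.

```python
import re

# Same alternation as the `like` filter in the query (unanchored match)
EVENT_NAMES = re.compile(r'CreateStage|UpdateStage|UpdateRestApi|CreateDeployment')


def _get(event, path):
    """Follow a dotted path through nested dicts; None if a step is missing."""
    value = event
    for key in path.split('.'):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def _coalesce(event, *paths):
    """Return the first non-None value found at the given dotted paths."""
    for path in paths:
        value = _get(event, path)
        if value is not None:
            return value
    return None


def deployment_events(events, rest_api_id, limit=200):
    """Select API Gateway stage and deployment events for one REST API."""
    rows = []
    for event in events:
        api_id = _coalesce(event,
                           'responseElements.authorizerById.restApiId',
                           'responseElements.stageFlushAuthorizerCache.restApiId',
                           'requestParameters.restApiId')
        if (event.get('eventSource') == 'apigateway.amazonaws.com'
                and api_id == rest_api_id
                and EVENT_NAMES.search(event.get('eventName', ''))):
            rows.append({
                'timestamp': event.get('eventTime', ''),
                'eventName': event['eventName'],
                'restApiId': api_id,
                'deployment_id': _coalesce(event,
                                           'responseElements.deploymentId',
                                           'responseElements.deploymentStages.deploymentId'),
                'arn': _get(event, 'userIdentity.arn'),
            })
    # ISO 8601 UTC timestamps sort chronologically as strings
    rows.sort(key=lambda row: row['timestamp'])
    return rows[:limit]
```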

Screenshot 2024-07-05 at 5 04 20 PM

❯ curl -v -X 'GET' 'https://service.azul.data.humancellatlas.org/index/bundles?catalog=dcp39&filters=%7B%0A%0A%7D&size=1' -H 'Accept: application/json' -H 'Accept-Encoding: gzip,deflate'
Note: Unnecessary use of -X or --request, GET is already inferred.
* Host service.azul.data.humancellatlas.org:443 was resolved.
* IPv6: (none)
* IPv4: 18.155.202.127, 18.155.202.76, 18.155.202.98, 18.155.202.89
*   Trying 18.155.202.127:443...
* Connected to service.azul.data.humancellatlas.org (18.155.202.127) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES128-GCM-SHA256 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=service.azul.data.humancellatlas.org
*  start date: Aug 18 00:00:00 2023 GMT
*  expire date: Sep 13 23:59:59 2024 GMT
*  subjectAltName: host "service.azul.data.humancellatlas.org" matched cert's "service.azul.data.humancellatlas.org"
*  issuer: C=US; O=Amazon; CN=Amazon RSA 2048 M01
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://service.azul.data.humancellatlas.org/index/bundles?catalog=dcp39&filters=%7B%0A%0A%7D&size=1
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: service.azul.data.humancellatlas.org]
* [HTTP/2] [1] [:path: /index/bundles?catalog=dcp39&filters=%7B%0A%0A%7D&size=1]
* [HTTP/2] [1] [user-agent: curl/8.6.0]
* [HTTP/2] [1] [accept: application/json]
* [HTTP/2] [1] [accept-encoding: gzip,deflate]
> GET /index/bundles?catalog=dcp39&filters=%7B%0A%0A%7D&size=1 HTTP/2
> Host: service.azul.data.humancellatlas.org
> User-Agent: curl/8.6.0
> Accept: application/json
> Accept-Encoding: gzip,deflate
>
< HTTP/2 200
< content-type: application/json
< content-length: 122296
< date: Mon, 08 Jul 2024 18:15:00 GMT
< x-amzn-requestid: f020f844-bb2b-4697-90a9-9127f6d7065f
< access-control-allow-origin: *
< content-encoding: gzip
< strict-transport-security: max-age=31536000; includeSubDomains
< access-control-allow-headers: Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key
< x-frame-options: DENY
< x-amz-apigw-id: amvpUFxDIAMEPEw=
< cache-control: no-store
< x-content-type-options: nosniff
< x-amzn-trace-id: Root=1-668c2ca1-4a3c204b4a064e470bb1a99a;Parent=3ac16882945f25d4;Sampled=0;lineage=2808a99a:0
< x-cache: Miss from cloudfront
< via: 1.1 b9123be426d0e732cf10eff602d871c8.cloudfront.net (CloudFront)
< x-amz-cf-pop: SFO53-P2
< x-amz-cf-id: CkSVrdY-_uShFvWB1OoDILUFmjfu4hK_rYRqcFwYs6EIFQHXI4oqZw==
<
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
* Failure writing output to destination
* Connection #0 to host service.azul.data.humancellatlas.org left intact
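
As a rough indication of the effect, the content-length of the prod response dropped from 545808 bytes in the original reproduction to 122296 bytes here. The two requests were made twelve days apart, so the underlying payloads may not be identical; treat the figure below as an estimate only. The function name is made up for this sketch.

```python
def compression_savings(uncompressed_length, compressed_length):
    """Return the fraction of bytes saved by compression."""
    return 1 - compressed_length / uncompressed_length


# content-length values taken from the two prod responses in this ticket
savings = compression_savings(545808, 122296)
print(f'{savings:.1%}')  # prints 77.6%
```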

dsotirho-ucsc commented 6 days ago

Assignee to resolve issue with AWS Support.

dsotirho-ucsc commented 6 days ago

AWS Support case 172005082100601 resolved with positive feedback.

hannes-ucsc commented 6 days ago

The following is a visualization of the relationships between the TF resources involved. Unlabeled arrows represent dependencies expressed via ${…}. The diagram anticipates the upcoming changes for #6284. The green part is what I hope will fix this issue.

API Gateway TF dependencies

Assignee to verify that, on develop, a change to the minimum_compression_size property of the rest_api resource does not yield a plan that replaces the deployment resource. Assignee to implement the fix and verify that, with the fix in place, such a change would replace the deployment resource.
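
One way to automate that verification is to inspect the machine-readable plan. The sketch below assumes the plan was exported with `terraform plan -out=plan.bin` followed by `terraform show -json plan.bin > plan.json`, and that the deployment resource is of type `aws_api_gateway_deployment`. The function names and file names are hypothetical. A replacement shows up in the JSON plan as a change whose actions contain both `delete` and `create`.

```python
import json


def load_plan(path):
    """Load a plan previously exported with `terraform show -json`."""
    with open(path) as f:
        return json.load(f)


def replaced_resources(plan, resource_type):
    """
    Return the addresses of all resources of the given type that the plan
    would replace, i.e. both delete and create, in either order.
    """
    return [
        change['address']
        for change in plan.get('resource_changes', [])
        if change['type'] == resource_type
        and set(change['change']['actions']) == {'create', 'delete'}
    ]
```

On develop without the fix, `replaced_resources(load_plan('plan.json'), 'aws_api_gateway_deployment')` would be expected to return an empty list after changing minimum_compression_size; with the fix in place it should list the deployment resource.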