Sage-Bionetworks / sage-monorepo

Where OpenChallenges, Schematic, and other Sage open source apps are built
https://sage-bionetworks.github.io/sage-monorepo/
Apache License 2.0

[Story] Proxy requests to Thumbor to the Apex #1641

Closed tschaffter closed 1 year ago

tschaffter commented 1 year ago

What projects is this story for?

OpenChallenges

As a user, I want

NA

Description

PR #1627 adds an apex reverse proxy that exposes the app, the API gateway, and other services (e.g. Zipkin) under the same root endpoint. Once this apex reverse proxy is available, requests sent to Thumbor for images could be proxied by it instead of by the API gateway. This would offload the API gateway and make it more responsive. Getting an image may also be faster, depending on whether Nginx can proxy requests to Thumbor more efficiently than Spring API Gateway.

Note that this network architecture is acceptable for the private preview, where the stack is deployed on a single instance. In the production deployment, the web client will send requests directly to Thumbor, which will run in its own "instance" (a Fargate task).

The goal of this ticket is to proxy requests to Thumbor via the apex reverse proxy to offload the API gateway.
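The routing the apex would perform can be sketched as a simple prefix table. This is a simplified model, not the actual Nginx configuration from PR #1627: the `/api`, `/zipkin`, and `/img` prefixes and the `api-gateway` and `zipkin` upstreams appear in this thread, while the `thumbor` and `openchallenges-app` upstream names and the `/` route to the web app are assumptions. `pick_upstream` is a hypothetical helper that mimics how Nginx picks the longest matching prefix `location`.

```python
# Hypothetical routing table for the apex reverse proxy: path prefix -> upstream.
# "/" -> "openchallenges-app" and "/img" -> "thumbor" are assumed names.
ROUTES = {
    "/": "openchallenges-app",
    "/api": "api-gateway",
    "/img": "thumbor",
    "/zipkin": "zipkin",
}


def pick_upstream(path: str) -> str:
    """Return the upstream whose prefix is the longest match for `path`.

    Mirrors how Nginx chooses among plain prefix `location` blocks;
    regex locations and modifiers such as `=` or `^~` are ignored.
    """
    # "/" always matches, so `matches` is never empty.
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    return ROUTES[max(matches, key=len)]
```

With this table, a request for `/img/v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png` would be sent to Thumbor rather than to the API gateway, which is the change this ticket proposes.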

Acceptance criteria

No response

Tasks

No response

Anything else?

No response

Have you linked this story to a GitHub Project?

tschaffter commented 1 year ago

Questions

Setup

Install ApacheBench

sudo apt update && sudo apt install apache2-utils

The measurements were taken from inside the devcontainer where the OC stack runs with Docker Compose. Fetching the images from a user's computer would take longer.

Benchmark: Downloading images from Thumbor via the API Gateway

Note: The value of Server Software is Thumbor when requests are proxied through the API gateway, while it is nginx when they are proxied through the Apex. I double-checked this and it is correct.

vscode@0f6993f91f7e:/workspaces/sage-monorepo$ ab -n 100 http://localhost:8082/img/v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:        Thumbor/7.4.7
Server Hostname:        localhost
Server Port:            8082

Document Path:          /img/v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
Document Length:        9600 bytes

Concurrency Level:      1
Time taken for tests:   26.132 seconds
Complete requests:      100
Failed requests:        1
   (Connect: 0, Receive: 0, Length: 1, Exceptions: 0)
Non-2xx responses:      1
Total transferred:      998594 bytes
HTML transferred:       950582 bytes
Requests per second:    3.83 [#/sec] (mean)
Time per request:       261.319 [ms] (mean)
Time per request:       261.319 [ms] (mean, across all concurrent requests)
Transfer rate:          37.32 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   197  261  54.0    243     531
Waiting:      197  260  54.1    243     530
Total:        197  261  54.0    243     531

Percentage of the requests served within a certain time (ms)
  50%    243
  66%    265
  75%    278
  80%    285
  90%    323
  95%    365
  98%    466
  99%    531
 100%    531 (longest request)

Benchmark: Downloading images from Thumbor via the Apex reverse proxy

vscode@0f6993f91f7e:/workspaces/sage-monorepo$ ab -n 100 http://localhost:8000/img/v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:        nginx/1.25.0
Server Hostname:        localhost
Server Port:            8000

Document Path:          /img/v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
Document Length:        9600 bytes

Concurrency Level:      1
Time taken for tests:   22.384 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      987100 bytes
HTML transferred:       960000 bytes
Requests per second:    4.47 [#/sec] (mean)
Time per request:       223.840 [ms] (mean)
Time per request:       223.840 [ms] (mean, across all concurrent requests)
Transfer rate:          43.06 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       0
Processing:   160  224  49.8    208     493
Waiting:      160  224  49.8    208     493
Total:        160  224  49.8    208     493

Percentage of the requests served within a certain time (ms)
  50%    208
  66%    219
  75%    235
  80%    247
  90%    291
  95%    335
  98%    417
  99%    493
 100%    493 (longest request)

Benchmark: Downloading images directly from Thumbor

vscode@0f6993f91f7e:/workspaces/sage-monorepo$ ab -n 100 http://localhost:8889/v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:        Thumbor/7.4.7
Server Hostname:        localhost
Server Port:            8889

Document Path:          /v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
Document Length:        9600 bytes

Concurrency Level:      1
Time taken for tests:   36.368 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      985300 bytes
HTML transferred:       960000 bytes
Requests per second:    2.75 [#/sec] (mean)
Time per request:       363.676 [ms] (mean)
Time per request:       363.676 [ms] (mean, across all concurrent requests)
Transfer rate:          26.46 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:   186  364 211.3    247     866
Waiting:      186  363 211.2    247     865
Total:        186  364 211.3    247     867

Percentage of the requests served within a certain time (ms)
  50%    247
  66%    308
  75%    591
  80%    632
  90%    722
  95%    802
  98%    865
  99%    867
 100%    867 (longest request)

Results

Which of the following methods returns the images faster?

  • Requesting the images directly from Thumbor
  • Proxying the requests to the API gateway
  • Proxying the requests to the Apex

Proxying the image requests to the Apex is slightly faster than proxying them to the API gateway (224 ms vs. 261 ms mean time per request). Getting the images through either the Apex or the API gateway is faster than getting them directly from Thumbor (364 ms mean). The likely reason is that both Nginx and the API gateway cache the responses.
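The relative differences behind these conclusions can be computed from the ab reports above. A minimal sketch: the numbers are copied from the three runs in this comment, and `percent_change` is a hypothetical helper.

```python
# Mean and median "Time per request" values (ms) copied from the three ab runs above.
runs = {
    "api_gateway": {"mean": 261.319, "median": 243},
    "apex": {"mean": 223.840, "median": 208},
    "direct": {"mean": 363.676, "median": 247},
}


def percent_change(baseline: float, value: float) -> float:
    """Relative change of `value` versus `baseline`, in percent (negative = faster)."""
    return (value - baseline) / baseline * 100


# Compare each route against the API gateway for both statistics.
for route in ("apex", "direct"):
    for stat in ("mean", "median"):
        change = percent_change(runs["api_gateway"][stat], runs[route][stat])
        print(f"{route} vs api_gateway ({stat}): {change:+.1f}%")
```

Run as-is, this reports the Apex at about 14% faster than the API gateway for both the mean and the median, while direct requests to Thumbor are about 39% slower on the mean but only about 2% slower on the median. The gap comes from the tail: in the direct run, the 75th percentile is 591 ms versus 278 ms through the API gateway.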

Does sending the requests directly to Thumbor eliminate/reduce the image timeout errors observed when proxying the images to the API gateway?

Yes, to some extent. I saw one image fail to load, but reloading the org search page many times with the browser cache disabled successfully loaded all the org logos. Images still load relatively slowly, with blocked time reaching up to about 3 seconds.

Does proxying the request to the Apex eliminate/reduce the image timeout errors observed when proxying the images to the API gateway?

No. In the example below, the first request for an org logo fails (timeout), but fetching the same image manually later succeeds.

172.22.0.1 - - [15/Jun/2023:04:34:48 +0000] "GET /img/DHytoRKUsemMy6jxAU1RsPaMpng=/140x140/logo/alzheimers-research-uk.jpg HTTP/1.1" 500 0 "http://localhost:8000/org" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0"
172.22.0.1 - - [15/Jun/2023:04:35:41 +0000] "GET /img/DHytoRKUsemMy6jxAU1RsPaMpng=/140x140/logo/alzheimers-research-uk.jpg HTTP/1.1" 200 6186 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0"
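To check how often the first image request fails, the apex access log can be scanned for 5xx responses on image paths. A minimal sketch, assuming the apex logs in Nginx's default `combined` format, which the two lines above appear to follow; `failed_image_requests` and the `/img/` prefix filter are illustrative choices, not part of the existing setup.

```python
import re

# Nginx's predefined "combined" access-log format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent
# "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<addr>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def failed_image_requests(lines, prefix="/img/"):
    """Hypothetical helper: yield (time, path, status) for image requests with a 5xx status."""
    for line in lines:
        m = COMBINED.match(line)
        if m and m["path"].startswith(prefix) and m["status"].startswith("5"):
            yield m["time"], m["path"], int(m["status"])
```

For example, `list(failed_image_requests(open("access.log")))` would list the failed image requests with their timestamps; the actual log file path depends on how the apex container is configured.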
tschaffter commented 1 year ago

Nginx proxy_pass does not work as I expected

My config was working for the API gateway and Zipkin by luck.

The following works because proxy_pass is given without a URI, so Nginx passes the full request URI to the upstream unchanged, and the resulting URLs happen to be valid (i.e. http://{api-gateway}/api and http://{zipkin}/zipkin).

    location /api {
      ...
      proxy_pass http://api-gateway;
    }

    location /zipkin {
      ...
      proxy_pass http://zipkin;
    }

The following does not work because http://{zipkin}/hello does not exist.

    location /hello {
      ...
      proxy_pass http://zipkin;
    }
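The rule behind this behavior comes from the Nginx `proxy_pass` documentation: without a URI part, the request URI is forwarded unchanged; with a URI part (even a lone `/`), the portion of the request URI that matched the `location` prefix is replaced by that URI. The sketch below models that rule for plain prefix locations only; `upstream_url` is a hypothetical helper, and `thumbor` is an assumed upstream name.

```python
def upstream_url(location: str, proxy_pass: str, request_uri: str) -> str:
    """Hypothetical model of how Nginx builds the upstream URL for a prefix location.

    Simplified: ignores regex locations, rewrites, and variables in proxy_pass.
    """
    if not request_uri.startswith(location):
        raise ValueError(f"{request_uri!r} does not match location {location!r}")
    # Everything after "scheme://" up to the first "/" is host[:port].
    host_start = proxy_pass.index("://") + len("://")
    if proxy_pass.find("/", host_start) == -1:
        # No URI part in proxy_pass: the request URI is forwarded unchanged.
        return proxy_pass + request_uri
    # URI part present: the matched location prefix is replaced by that URI.
    return proxy_pass + request_uri[len(location):]
```

Under this rule, a block such as `location /img/ { proxy_pass http://thumbor/; }` would strip the `/img` prefix, so Thumbor receives the same path that is used when querying it directly on port 8889. Mixing a location without a trailing slash and a proxy_pass with one (`location /img` with `proxy_pass http://thumbor/;`) would instead forward a path starting with a double slash.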

[Screenshot]

This issue looks similar to #1642. The resources shown above are from the OC web app and shouldn't be loaded when accessing Zipkin.

tschaffter commented 1 year ago

Disabling Thumbor result storage greatly reduces the time to get images

The following times were measured with result storage disabled, i.e. when results are not cached.

vscode@0f6993f91f7e:/workspaces/sage-monorepo$ ab -n 100 http://localhost:8889/v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:        Thumbor/7.4.7
Server Hostname:        localhost
Server Port:            8889

Document Path:          /v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
Document Length:        9600 bytes

Concurrency Level:      1
Time taken for tests:   1.763 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      985300 bytes
HTML transferred:       960000 bytes
Requests per second:    56.72 [#/sec] (mean)
Time per request:       17.630 [ms] (mean)
Time per request:       17.630 [ms] (mean, across all concurrent requests)
Transfer rate:          545.79 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    11   17   5.8     16      40
Waiting:       11   17   5.8     15      39
Total:         12   18   5.8     16      40

Percentage of the requests served within a certain time (ms)
  50%     16
  66%     19
  75%     20
  80%     21
  90%     27
  95%     31
  98%     36
  99%     40
 100%     40 (longest request)
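To put "greatly" in numbers, the mean latency of this run can be compared with the earlier direct-to-Thumbor run. A minimal sketch, assuming the earlier run (363.676 ms mean) had result storage enabled, as this comment implies; the variable and function names are illustrative.

```python
# Mean "Time per request" (ms) for direct requests to Thumbor on port 8889,
# copied from the two ab runs in this thread (assumed: storage on, then off).
with_result_storage = 363.676
without_result_storage = 17.630

speedup = with_result_storage / without_result_storage


def requests_per_second(mean_ms: float) -> float:
    """At a concurrency level of 1, throughput is 1000 ms divided by the mean latency."""
    return 1000 / mean_ms


print(f"speedup: {speedup:.1f}x")
print(
    f"throughput: {requests_per_second(with_result_storage):.2f} -> "
    f"{requests_per_second(without_result_storage):.2f} req/s"
)
```

This gives a speedup of about 20.6x. Because ab ran with a concurrency level of 1, the computed throughputs (2.75 and 56.72 requests per second) match the Requests per second values that ab reported.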