[Story] Proxy requests to Thumbor to the Apex

tschaffter commented 1 year ago

What projects is this story for?

OpenChallenges

As a user, I want

NA

Description

PR #1627 adds an apex reverse proxy that exposes the app, the API gateway and other services (e.g. Zipkin) under the same root endpoint. Once this apex reverse proxy is available, the requests sent to Thumbor to get images could be proxied by it instead of by the API gateway. This would offload the API gateway and make it more responsive. Getting an image should also be faster, depending on whether Nginx can proxy requests to Thumbor more efficiently than Spring API Gateway.

How images are currently obtained: Request => API Gateway => Thumbor
How images will be obtained with the apex reverse proxy (when available): Request => Apex => Thumbor

Note that this network architecture is OK for the private preview, where the stack is deployed on a single instance. In the production deployment, the web client will send requests directly to Thumbor, which will run in its own "instance" (Fargate Task).

The goal of this ticket is to proxy requests to Thumbor via the apex reverse proxy to offload the API gateway.

Acceptance criteria

No response

Tasks

No response

Anything else?

No response

Have you linked this story to a GitHub Project?

[X] I have linked this story to a GitHub Project and set its metadata.

tschaffter commented 1 year ago

Questions

✅ Which of the following methods returns the images fastest?
- Requesting the images directly from Thumbor
- Proxying the requests to the API gateway
- Proxying the request to the Apex
✅ Does sending the requests directly to Thumbor eliminate/reduce the image timeout errors observed when proxying the images to the API gateway?
✅ Does proxying the request to the Apex eliminate/reduce the image timeout errors observed when proxying the images to the API gateway?
How much time does it take Thumbor to process an image (user needs to wait before getting the image)?
Images display much faster on the org search page the second time the page is loaded. Is it because of the browser cache or the API gateway/Apex cache?

Setup

Install ApacheBench

sudo apt update && sudo apt install -y apache2-utils

The measurements are taken from the devcontainer where the OC stack is running with Docker Compose. Fetching the images from a user's computer would take more time.

Benchmark: Downloading images from Thumbor via the API Gateway

Note: The value of Server Software is Thumbor when proxying to the API gateway, while it's Nginx when proxying to the Apex. I double-checked this and it's correct.

vscode@0f6993f91f7e:/workspaces/sage-monorepo$ ab -n 100 http://localhost:8082/img/v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:        Thumbor/7.4.7
Server Hostname:        localhost
Server Port:            8082

Document Path:          /img/v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
Document Length:        9600 bytes

Concurrency Level:      1
Time taken for tests:   26.132 seconds
Complete requests:      100
Failed requests:        1
   (Connect: 0, Receive: 0, Length: 1, Exceptions: 0)
Non-2xx responses:      1
Total transferred:      998594 bytes
HTML transferred:       950582 bytes
Requests per second:    3.83 [#/sec] (mean)
Time per request:       261.319 [ms] (mean)
Time per request:       261.319 [ms] (mean, across all concurrent requests)
Transfer rate:          37.32 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   197  261  54.0    243     531
Waiting:      197  260  54.1    243     530
Total:        197  261  54.0    243     531

Percentage of the requests served within a certain time (ms)
  50%    243
  66%    265
  75%    278
  80%    285
  90%    323
  95%    365
  98%    466
  99%    531
 100%    531 (longest request)

Benchmark: Downloading images from Thumbor via the Apex reverse proxy

vscode@0f6993f91f7e:/workspaces/sage-monorepo$ ab -n 100 http://localhost:8000/img/v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:        nginx/1.25.0
Server Hostname:        localhost
Server Port:            8000

Document Path:          /img/v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
Document Length:        9600 bytes

Concurrency Level:      1
Time taken for tests:   22.384 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      987100 bytes
HTML transferred:       960000 bytes
Requests per second:    4.47 [#/sec] (mean)
Time per request:       223.840 [ms] (mean)
Time per request:       223.840 [ms] (mean, across all concurrent requests)
Transfer rate:          43.06 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       0
Processing:   160  224  49.8    208     493
Waiting:      160  224  49.8    208     493
Total:        160  224  49.8    208     493

Percentage of the requests served within a certain time (ms)
  50%    208
  66%    219
  75%    235
  80%    247
  90%    291
  95%    335
  98%    417
  99%    493
 100%    493 (longest request)

Benchmark: Downloading images directly from Thumbor

vscode@0f6993f91f7e:/workspaces/sage-monorepo$ ab -n 100 http://localhost:8889/v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:        Thumbor/7.4.7
Server Hostname:        localhost
Server Port:            8889

Document Path:          /v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
Document Length:        9600 bytes

Concurrency Level:      1
Time taken for tests:   36.368 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      985300 bytes
HTML transferred:       960000 bytes
Requests per second:    2.75 [#/sec] (mean)
Time per request:       363.676 [ms] (mean)
Time per request:       363.676 [ms] (mean, across all concurrent requests)
Transfer rate:          26.46 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:   186  364 211.3    247     866
Waiting:      186  363 211.2    247     865
Total:        186  364 211.3    247     867

Percentage of the requests served within a certain time (ms)
  50%    247
  66%    308
  75%    591
  80%    632
  90%    722
  95%    802
  98%    865
  99%    867
 100%    867 (longest request)

Results

Which of the following methods returns the images fastest?

Requesting the images directly from Thumbor

Proxying the requests to the API gateway

Proxying the request to the Apex

Proxying the image requests through the Apex is slightly faster than through the API gateway (mean of 224 ms vs. 261 ms per request). Getting the images via the Apex or the API gateway is also faster than getting them directly from Thumbor (mean of 364 ms). The likely reason is that both Nginx and the API gateway cache the responses.
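
To make the comparison easier to read, here is a small sketch that expresses each run relative to the API gateway run. The numbers are copied from the three ApacheBench outputs above; the dictionary keys and function names are made up for this illustration.

```python
# Values copied from the three ApacheBench runs above (milliseconds).
# The keys ("api_gateway", "apex", "thumbor_direct") are illustrative labels.
RUNS = {
    "api_gateway": {"mean": 261.319, "median": 243, "p95": 365},
    "apex": {"mean": 223.840, "median": 208, "p95": 335},
    "thumbor_direct": {"mean": 363.676, "median": 247, "p95": 802},
}


def percent_change(new: float, baseline: float) -> float:
    """Relative change of `new` compared to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def compare(runs: dict, baseline: str = "api_gateway") -> dict:
    """Express every metric of every run relative to the baseline run."""
    base = runs[baseline]
    return {
        name: {metric: percent_change(value, base[metric]) for metric, value in stats.items()}
        for name, stats in runs.items()
    }


if __name__ == "__main__":
    for name, changes in compare(RUNS).items():
        formatted = ", ".join(f"{metric}: {change:+.1f}%" for metric, change in changes.items())
        print(f"{name:15s} {formatted}")
```

With these numbers, the Apex mean is roughly 14% lower than the API gateway mean. The median for direct Thumbor access (247 ms) is close to the API gateway median (243 ms), so most of the difference in the mean comes from the slow tail of the direct run. Each run is only 100 sequential requests, so these figures are indicative at best.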

Does sending the requests directly to Thumbor eliminate/reduce the image timeout errors observed when proxying the images to the API gateway?

Yes, to some extent. I have seen one image fail to load, but reloading the org search page many times with the browser cache disabled successfully loaded all the org logos. Images still load relatively slowly, with blocked time reaching up to about 3 seconds.

Does proxying the request to the Apex eliminate/reduce the image timeout errors observed when proxying the images to the API gateway?

No. In the example below, the first request for an org logo fails (timeout), but getting the image manually later works.

172.22.0.1 - - [15/Jun/2023:04:34:48 +0000] "GET /img/DHytoRKUsemMy6jxAU1RsPaMpng=/140x140/logo/alzheimers-research-uk.jpg HTTP/1.1" 500 0 "http://localhost:8000/org" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0"
172.22.0.1 - - [15/Jun/2023:04:35:41 +0000] "GET /img/DHytoRKUsemMy6jxAU1RsPaMpng=/140x140/logo/alzheimers-research-uk.jpg HTTP/1.1" 200 6186 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0"
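
One way to keep track of these failures is to scan the Nginx access log for image requests that return an error status. The sketch below assumes the log uses the default "combined" format, as the two lines above appear to; the function name and the `/img/` prefix filter are choices made for this illustration.

```python
import re

# Matches the start of an Nginx "combined" log line, up to the response size.
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# The two log lines from the example above, copied verbatim.
SAMPLE_LINES = [
    '172.22.0.1 - - [15/Jun/2023:04:34:48 +0000] "GET /img/DHytoRKUsemMy6jxAU1RsPaMpng=/140x140/logo/alzheimers-research-uk.jpg HTTP/1.1" 500 0 "http://localhost:8000/org" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0"',
    '172.22.0.1 - - [15/Jun/2023:04:35:41 +0000] "GET /img/DHytoRKUsemMy6jxAU1RsPaMpng=/140x140/logo/alzheimers-research-uk.jpg HTTP/1.1" 200 6186 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0"',
]


def failed_image_requests(lines, prefix="/img/"):
    """Return (time, status, path) for image requests with a 4xx/5xx status."""
    failures = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # Skip lines that are not in the expected format.
        status = int(match["status"])
        if match["path"].startswith(prefix) and status >= 400:
            failures.append((match["time"], status, match["path"]))
    return failures


if __name__ == "__main__":
    for time, status, path in failed_image_requests(SAMPLE_LINES):
        print(status, time, path)
```

Run against the two sample lines, this reports only the first request (status 500) and ignores the later successful one.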

tschaffter commented 1 year ago

Nginx proxy pass does not work as I expected

My config was working for the API gateway and Zipkin by luck.

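The following works because, when proxy_pass has no URI part, Nginx forwards the request path unchanged, and the value passed to proxy_pass followed by the path specified after location happens to be a valid upstream URL (i.e. http://{api-gateway}/api and http://{zipkin}/zipkin).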

    location /api {
      ...
      proxy_pass http://api-gateway;
    }

    location /zipkin {
      ...
      proxy_pass http://zipkin;
    }

The following does not work because http://{zipkin}/hello does not exist.

    location /hello {
      ...
      proxy_pass http://zipkin;
    }

This issue looks similar to #1642. The resources shown above are from the OC web app and shouldn't be loaded when accessing Zipkin.
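
The rule behind this behavior can be sketched in a few lines of Python. This is a simplified model of how Nginx builds the upstream URL for a prefix location, not the Nginx implementation: it ignores query strings, regex locations and rewrites. The function name, the `/api/v1/organizations` path, and the upstream `http://thumbor:8889/` (hostname and port inside the Docker network) are assumptions made for this illustration.

```python
from urllib.parse import urlsplit


def upstream_url(location: str, proxy_pass: str, request_uri: str) -> str:
    """Simplified model of Nginx proxy_pass URI handling for prefix locations."""
    if not request_uri.startswith(location):
        raise ValueError("request_uri does not match the location prefix")
    parts = urlsplit(proxy_pass)
    origin = f"{parts.scheme}://{parts.netloc}"
    if parts.path == "":
        # proxy_pass without a URI part: the request path is passed unchanged.
        return origin + request_uri
    # proxy_pass with a URI part: the matched location prefix is replaced by it.
    return origin + parts.path + request_uri[len(location):]


if __name__ == "__main__":
    # Works "by luck": the upstream also serves its content under /api.
    print(upstream_url("/api", "http://api-gateway", "/api/v1/organizations"))
    # Does not work: Zipkin has nothing under /hello.
    print(upstream_url("/hello", "http://zipkin", "/hello"))
    # Trailing slashes on both sides strip the /img/ prefix before forwarding.
    print(upstream_url(
        "/img/",
        "http://thumbor:8889/",
        "/img/v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png",
    ))
```

The last call models the mapping seen in the benchmarks, where the Apex path starts with /img/ while the direct Thumbor path does not. Whether the actual Apex config uses this exact form is not stated in this thread.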

tschaffter commented 1 year ago

Disabling Thumbor result storage greatly reduces the time to get images

This test uses the org search page, where images are downloaded directly from Thumbor (no proxy).
Thumbor result storage is disabled by setting RESULT_STORAGE=thumbor.result_storages.no_storage.
The first time Thumbor processes an image is slower than subsequent requests.
- This means that Thumbor still uses some caching.
- I would assume that caching results to the filesystem would be achieved with RESULT_STORAGE=thumbor.result_storages.file_storage.
- There is actually another form of caching: image storage. By default, original images are stored on the filesystem. This is likely what speeds up the subsequent requests (confirmed!).
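
For reference, the two storage layers discussed above can be sketched as a thumbor.conf fragment (Thumbor's config file uses Python syntax). This assumes the stack can be configured through thumbor.conf; the setup in this thread sets RESULT_STORAGE as an environment variable instead, and how that maps to the config file depends on the Docker image used.

```python
# thumbor.conf fragment (sketch, not the project's actual configuration).

# Result storage caches the processed (resized/cropped) images.
# "no_storage" disables it, which is the setting used for the timings here.
RESULT_STORAGE = "thumbor.result_storages.no_storage"

# Image storage caches the original source images after they are loaded.
# File storage is Thumbor's default and is what speeds up repeated requests.
STORAGE = "thumbor.storages.file_storage"
```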

The following timings were measured with result storage disabled.

vscode@0f6993f91f7e:/workspaces/sage-monorepo$ ab -n 100 http://localhost:8889/v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:        Thumbor/7.4.7
Server Hostname:        localhost
Server Port:            8889

Document Path:          /v7P2QbvYYxWnBLJeePXFv-1Y0UY=/140x140/logo/dfci.png
Document Length:        9600 bytes

Concurrency Level:      1
Time taken for tests:   1.763 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      985300 bytes
HTML transferred:       960000 bytes
Requests per second:    56.72 [#/sec] (mean)
Time per request:       17.630 [ms] (mean)
Time per request:       17.630 [ms] (mean, across all concurrent requests)
Transfer rate:          545.79 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    11   17   5.8     16      40
Waiting:       11   17   5.8     15      39
Total:         12   18   5.8     16      40

Percentage of the requests served within a certain time (ms)
  50%     16
  66%     19
  75%     20
  80%     21
  90%     27
  95%     31
  98%     36
  99%     40
 100%     40 (longest request)
