MartinKavik / addon_proxy


Specifications #1

Open MartinKavik opened 4 years ago

MartinKavik commented 4 years ago

Here's the basic idea: usually, when hosting a Stremio addon (which is normally implemented in NodeJS), we recommend putting NGINX on top since it reduces the load by caching most responses. The Stremio addon system is designed in such a way that all responses are cacheable. However, we need a few domain-specific features, and so we've decided we need a domain-specific HTTP reverse proxy.

Here's how it should work, v1:

HTTP reverse proxy based on tokio
Reads config from a yaml/json config file; this config specifies multiple entries of {server, origin, noValidation?}
    whether it's yaml or json is up to you
    a pair of ["my-domain.com", "http://localhost:8080"] would mean that an addon request like https://my-domain.com/manifest.json will be resolved by proxying http://localhost:8080
    the name "server" was chosen because NGINX uses it; if you can think of better names, go for it
    the server property should support hostname and path (e.g. "my-domain.com" or "sub.domain.com" or "sub.domain.com/my-addon"), while the origin property will be a full URI; e.g. if the config is {"sub.domain.com/my-addon", "http://localhost:9999"}, then "sub.domain.com/my-addon/manifest.json" will go to "http://localhost:9999/manifest.json"
    for simplicity, route to the first match; e.g. if you have both "sub.domain.com/my-addon" and "sub.domain.com", they must be listed in that order (more specific one first)
    the config should be reloaded upon a certain process signal (like NGINX) or periodically or upon change, depending on which one is easiest to implement
Request validation: incoming requests should be validated using ResourceRequest (by attempting to stringify them); if it's not a valid request, do not even try to proxy it (return bad request), unless noValidation is specified; remember to also allow /manifest.json
Caching: the proxy must cache responses by respecting the HTTP cache headers that come from the origin
    domain specific: if no cache headers are returned at all, assume 10 minutes cache validity
    no need to handle all HTTP cache header specs: just `cache-control` `max-age`
    a DB must be selected for this; ideally, like NGINX, it will use an embedded DB; most addon responses are under 4KB
    stale cache responses should not be cleared at all
    consider capping the max response size to 2MB
Domain specific: "Always on" functionality: if the addon origin is failing for some reason (returning non-200, timing out), return the last cached response, even if it's stale*
    we need configurable timeouts
    * - there should be a configurable "stale threshold": e.g., only do this if the cached response is not older than 48 hours
If the incoming request does not match any of the server entries in the config, return 200 + some html (include_bytes from landing.html)
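To make the first-match routing rule concrete, here is a minimal sketch using only the standard library. The `Route` struct and the `resolve` function are hypothetical names chosen for illustration, not part of any existing code; the segment-boundary check is an added assumption so that "sub.domain.com" does not accidentally match a longer hostname.

```rust
/// One config entry: requests for `server` (host plus optional path prefix)
/// are forwarded to `origin`. The field names follow the spec above; the
/// struct itself is a hypothetical shape, not an existing config type.
pub struct Route {
    pub server: String, // e.g. "sub.domain.com/my-addon"
    pub origin: String, // e.g. "http://localhost:9999"
}

/// Returns the origin URL for the first route whose `server` matches
/// `host` + `path`, or `None` when no entry matches.
pub fn resolve(routes: &[Route], host: &str, path: &str) -> Option<String> {
    // e.g. "sub.domain.com" + "/my-addon/manifest.json"
    let full = format!("{}{}", host, path);
    routes.iter().find_map(|route| {
        let server = route.server.trim_end_matches('/');
        let rest = full.strip_prefix(server)?;
        // Assumption: only match on a segment boundary, so that
        // "sub.domain.com" does not capture "sub.domain.community/...".
        if rest.is_empty() || rest.starts_with('/') {
            Some(format!("{}{}", route.origin.trim_end_matches('/'), rest))
        } else {
            None
        }
    })
}
```

Because `find_map` stops at the first hit, the more specific entry ("sub.domain.com/my-addon") has to come before the general one ("sub.domain.com"), exactly as the spec requires.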

and v2:

Response validation
Optional TLS (HTTPS)
smart "always on": if an addon returns no streams (error or simply an empty response), return the streams from the cached response, but only if certain conditions are met (all streams are p2p streams: they have `infoHash` or their `url` begins with `magnet:`)
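The "smart always on" condition could look roughly like the sketch below. `Stream` is a hypothetical, trimmed-down type that models only the two fields mentioned above (`infoHash` and `url`); `is_p2p` and `choose_streams` are illustrative names, and a real implementation would deserialize the addon's JSON response instead.

```rust
/// Hypothetical, trimmed-down view of a stream object; only the two
/// fields named in the spec are modelled here.
pub struct Stream {
    pub info_hash: Option<String>, // `infoHash` in the JSON response
    pub url: Option<String>,
}

/// A stream counts as p2p if it has an `infoHash` or a `magnet:` URL.
pub fn is_p2p(stream: &Stream) -> bool {
    stream.info_hash.is_some()
        || stream
            .url
            .as_deref()
            .map_or(false, |url| url.starts_with("magnet:"))
}

/// Picks which streams to return. The cached streams are used only when
/// the fresh response is empty AND every cached stream is p2p; otherwise
/// the fresh (possibly empty) response is passed through unchanged.
pub fn choose_streams<'a>(fresh: &'a [Stream], cached: &'a [Stream]) -> &'a [Stream] {
    if fresh.is_empty() && !cached.is_empty() && cached.iter().all(is_p2p) {
        cached
    } else {
        fresh
    }
}
```

An origin error would be treated the same way as an empty response by passing an empty slice as `fresh`.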

For now, we only need v1.
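As an illustration of the v1 caching and "always on" rules, here is a rough standard-library sketch. `max_age`, `classify` and `CacheState` are hypothetical names; the 10-minute default comes from the spec, while treating a header without a parsable `max-age` the same as a missing header is a simplifying assumption.

```rust
use std::time::Duration;

/// Default validity when the origin sends no cache headers (10 minutes).
const DEFAULT_TTL: Duration = Duration::from_secs(10 * 60);

/// Extracts `max-age` from a `cache-control` header value.
/// Assumption: a header without a parsable `max-age` also falls back to
/// the default, which is simpler than the full HTTP caching rules.
pub fn max_age(cache_control: Option<&str>) -> Duration {
    cache_control
        .and_then(|value| {
            value.split(',').find_map(|directive| {
                let (name, secs) = directive.trim().split_once('=')?;
                if name.trim().eq_ignore_ascii_case("max-age") {
                    secs.trim().parse::<u64>().ok()
                } else {
                    None
                }
            })
        })
        .map(Duration::from_secs)
        .unwrap_or(DEFAULT_TTL)
}

/// How a cached entry of a given age may be used.
#[derive(Debug, PartialEq)]
pub enum CacheState {
    /// Within its TTL: serve it without contacting the origin.
    Fresh,
    /// Past its TTL but within the stale threshold: ask the origin, and
    /// fall back to this entry if the origin fails ("always on").
    StaleUsable,
    /// Older than the stale threshold: never served as a fallback.
    TooOld,
}

pub fn classify(age: Duration, ttl: Duration, stale_threshold: Duration) -> CacheState {
    if age <= ttl {
        CacheState::Fresh
    } else if age <= stale_threshold {
        CacheState::StaleUsable
    } else {
        CacheState::TooOld
    }
}
```

With a 48-hour stale threshold, an entry that is one hour old and has a 10-minute TTL would be classified as `StaleUsable`: the proxy would try the origin first and return the cached copy only if the origin fails or times out.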

It's worth noting that a lot of these things (like the default cache, or always on) may be doable with NGINX but we still figured we'd do a domain-specific proxy, because of the validation and because we expect tokio's performance here to be better than NGINX.

Here are some repos that may help:

https://github.com/stremio/opensubtitles-proxy - tokio-based (0.1) proxy, with no caching
https://github.com/AdExNetwork/adex-supermarket - tokio 0.2
you can find hello world addons here: https://github.com/stremio/stremio-addon-sdk
MartinKavik commented 4 years ago

@Ivshti

Thanks!

Ivshti commented 4 years ago

re questions

  1. yes, please do; as for static files, perhaps we can allow everything in public and images
  2. unfortunately we don't have a collection or recommendations, but you can peek into the stremio-addon-sdk JS integration tests to see what we test for
  3. yes - /manifest.json is about 10-20% of requests; then the majority of traffic goes to: if it's a catalog addon, then the first catalog (e.g. /catalog/movie/top.json); if it's a stream addon, then it's all over various /stream/ endpoints
MartinKavik commented 4 years ago

Questions:

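If the incoming request does not match any of the server entries in the config, return 200 + some html (include_bytes from landing.html)

  • Why do we want to return 200 instead of 4xx or 5xx? It's easy to write wrong routes or URLs in requests, and then it's quite possible that your tests/benchmarks/clients/monitoring tools won't notice it.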

Changes:

I focused on benchmarks; you can see the results from my machine below. As a "side effect", you can now start and stop the proxy programmatically, and I've fixed some performance and resource issues along the way. An HTTP mock server is also ready for writing proxy tests, and you can lint and format the code with cargo make verify. README.md has been updated.

Next steps:

I didn't find any reasonable data to compare this proxy with NGINX, but according to the results below the performance should be sufficient (what do you think?). So I think I can move on to the final step before integration: writing tests and doing basic testing through Stremio apps and Heroku.


The results below were produced by the benchmarks in proxy_benchmark.rs (https://github.com/MartinKavik/addon_proxy_1/blob/0a9f01d074f8778198663206329d3b5136bc12ae/benches/proxy_benchmark.rs#L42-L71).

Benchmark results

```
$ cargo bench
   Compiling addon_proxy v0.1.0 (C:\work\repos\addon_proxy_1)
    Finished bench [optimized] target(s) in 11.24s
     Running target\release\deps\addon_proxy-b81646b3e0debb8f.exe

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target\release\deps\addon_proxy-13f2810ce826293d.exe

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target\release\deps\proxy_benchmark-9738f69bbfb586bd.exe
Gnuplot not found, using plotters backend
bench_data/proxy_db removed.
Listening on http://0.0.0.0:5000
status                  time:   [52.459 ms 52.915 ms 53.655 ms]
                        change: [-1.4005% +0.4297% +2.0818%] (p = 0.66 > 0.05)
                        No change in performance detected.
_______________________________________________________
Bench name ............................... status
Number of all requests per iteration...... 1,000
Number of users .......................... 1
Send request & read response avg time .... 52.923µs
Requests & readings per second ........... 18,864
Number of all requests ................... 141,000
Bench time ............................... 7.4746894s
Path ..................................... /status
_______________________________________________________

Benchmarking status_parallel: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.2s.
status_parallel         time:   [136.94 ms 137.34 ms 138.19 ms]
                        change: [-3.0949% -0.7467% +2.0147%] (p = 0.59 > 0.05)
                        No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
_______________________________________________________
Bench name ............................... status_parallel
Number of all requests per iteration...... 10,000
Number of users .......................... 100
Send request & read response avg time .... 1.294368ms
Requests & readings per second ........... 71,864
Number of all requests ................... 620,000
Bench time ............................... 8.6275936s
Path ..................................... /status
_______________________________________________________

manifest | no_cache     time:   [68.005 ms 68.166 ms 68.444 ms]
                        change: [+0.2862% +1.4445% +2.7892%] (p = 0.04 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe
_______________________________________________________
Bench name ............................... manifest | no_cache
Number of all requests per iteration...... 100
Number of users .......................... 1
Send request & read response avg time .... 684.257µs
Requests & readings per second ........... 1,461
Number of all requests ................... 12,500
Bench time ............................... 8.5545922s
Path ..................................... /origin/manifest.json
_______________________________________________________

Benchmarking manifest_parallel | no_cache: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 9.8s.
manifest_parallel | no_cache
                        time:   [175.33 ms 178.21 ms 180.53 ms]
                        change: [-4.4733% -0.7151% +2.7762%] (p = 0.74 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
_______________________________________________________
Bench name ............................... manifest_parallel | no_cache
Number of all requests per iteration...... 1,000
Number of users .......................... 100
Send request & read response avg time .... 17.17478ms
Requests & readings per second ........... 5,605
Number of all requests ................... 62,000
Bench time ............................... 11.0605004s
Path ..................................... /origin/manifest.json
_______________________________________________________

top | no_cache          time:   [67.936 ms 68.208 ms 68.579 ms]
                        change: [-1.3693% -0.3098% +0.7069%] (p = 0.60 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
_______________________________________________________
Bench name ............................... top | no_cache
Number of all requests per iteration...... 100
Number of users .......................... 1
Send request & read response avg time .... 683.708µs
Requests & readings per second ........... 1,462
Number of all requests ................... 12,500
Bench time ............................... 8.5477628s
Path ..................................... /origin/catalog/movie/top.json
_______________________________________________________

Benchmarking top_parallel | no_cache: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 10.3s.
top_parallel | no_cache time:   [182.41 ms 187.52 ms 191.90 ms]
                        change: [-0.0084% +3.7576% +7.3743%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
_______________________________________________________
Bench name ............................... top_parallel | no_cache
Number of all requests per iteration...... 1,000
Number of users .......................... 100
Send request & read response avg time .... 17.992426ms
Requests & readings per second ........... 5,350
Number of all requests ................... 62,000
Bench time ............................... 11.588671s
Path ..................................... /origin/catalog/movie/top.json
_______________________________________________________

Listening on http://0.0.0.0:5000
manifest                time:   [5.9461 ms 5.9931 ms 6.0410 ms]
                        change: [-5.4084% -4.6677% -3.8741%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
_______________________________________________________
Bench name ............................... manifest
Number of all requests per iteration...... 100
Number of users .......................... 1
Send request & read response avg time .... 59.819µs
Requests & readings per second ........... 16,690
Number of all requests ................... 113,500
Bench time ............................... 6.8002705s
Path ..................................... /origin/manifest.json
_______________________________________________________

manifest_parallel       time:   [16.363 ms 16.420 ms 16.534 ms]
                        change: [-1.9603% -0.2162% +1.4894%] (p = 0.83 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
_______________________________________________________
Bench name ............................... manifest_parallel
Number of all requests per iteration...... 1,000
Number of users .......................... 100
Send request & read response avg time .... 1.514527ms
Requests & readings per second ........... 60,360
Number of all requests ................... 393,000
Bench time ............................... 6.5112808s
Path ..................................... /origin/manifest.json
_______________________________________________________

top                     time:   [6.0674 ms 6.0833 ms 6.1016 ms]
                        change: [-7.0570% -5.8751% -4.8091%] (p = 0.00 < 0.05)
                        Performance has improved.
_______________________________________________________
Bench name ............................... top
Number of all requests per iteration...... 100
Number of users .......................... 1
Send request & read response avg time .... 61.002µs
Requests & readings per second ........... 16,367
Number of all requests ................... 108,000
Bench time ............................... 6.5984678s
Path ..................................... /origin/catalog/movie/top.json
_______________________________________________________

top_parallel            time:   [17.468 ms 17.597 ms 17.803 ms]
                        change: [-5.8413% -3.2756% -0.5978%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
_______________________________________________________
Bench name ............................... top_parallel
Number of all requests per iteration...... 1,000
Number of users .......................... 100
Send request & read response avg time .... 1.626625ms
Requests & readings per second ........... 56,366
Number of all requests ................... 338,000
Bench time ............................... 5.9970978s
Path ..................................... /origin/catalog/movie/top.json
_______________________________________________________

Benchmarking manifest_parallel_long: Warming up for 1.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 862.0s.
manifest_parallel_long  time:   [15.533 s 15.564 s 15.593 s]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
_______________________________________________________
Bench name ............................... manifest_parallel_long
Number of all requests per iteration...... 1,000,000
Number of users .......................... 1,000
Send request & read response avg time .... 8.626082ms
Requests & readings per second ........... 64,250
Number of all requests ................... 56,000,000
Bench time ............................... 871.6348006s
Path ..................................... /origin/manifest.json
_______________________________________________________
```
Ivshti commented 4 years ago

Hey Martin,

Performance seems excellent from this! Great job!

Re your question "Why we want to return 200 instead of 4xx or 5xx? It's easy to write wrong routes or url in requests and then it's very possible that your tests/benchmarks/clients/monitoring tools don't notice it." - good point. The root (/) should be landing.html and everything else should return 404, I'm not sure why I wrote that, so it was probably a mistake.

On Tue, 26 May 2020 at 20:19, Martin Kavík notifications@github.com wrote:

Questions:

If the incoming request does not match any of the server entries in the config, return 200 + some html (include_bytes from landing.html)

  • Why we want to return 200 instead of 4xx or 5xx? It's easy to write wrong routes or url in requests and then it's very possible that your tests/benchmarks/clients/monitoring tools don't notice it.


MartinKavik commented 4 years ago

Nice, thanks!

The root (/) should be landing.html and everything else should return 404

I'm not sure I understand. I think there will be conflicts between landing.html and / on the origin, because the proxy doesn't have a domain of its own. I suggest returning either the response from the origin or 404 (as you said); the only exceptions are the special paths configured in proxy_config.toml:

reload_config_url_path = "/reload-proxy-config"
clear_cache_url_path = "/clear-cache"
status_url_path = "/status"

They are already implemented because I needed them for testing. I can make them all optional. /status returns only 200 and a short text.

Ivshti commented 4 years ago

My point was that any root access on an unrecognized hostname (not present in the configuration) should return the landing page.
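Put as code, the rule agreed on here might look like the following sketch. `Outcome` and `decide` are hypothetical names, and the `host_is_configured` flag stands in for whatever lookup the proxy actually performs against its config.

```rust
/// What the proxy does with a request, given whether its hostname
/// appears in the configuration. Names are illustrative only.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Hostname is configured: forward to the origin (including "/").
    Proxy,
    /// Unrecognized hostname, root path: 200 + landing.html.
    Landing,
    /// Unrecognized hostname, any other path: 404.
    NotFound,
}

pub fn decide(host_is_configured: bool, path: &str) -> Outcome {
    if host_is_configured {
        Outcome::Proxy
    } else if path == "/" || path.is_empty() {
        Outcome::Landing
    } else {
        Outcome::NotFound
    }
}
```

Since the landing page is served only for hostnames that are not in the config, it cannot shadow the `/` route of a configured origin.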

On Thu, 28 May 2020 at 12:36, Martin Kavík notifications@github.com wrote:

Nice, thanks!

The root (/) should be landing.html and everything else should return 404

I'm not sure if I understand - I think there will be conflicts between landing.html and / on origin, because proxy hasn't got any special domain just for itself. I suggest to return either response from origin or 404 (as you said) - only exceptions are special paths configured in proxy_config.toml https://github.com/MartinKavik/addon_proxy_1/blob/master/proxy_config.toml#L4-L6 :

reload_config_url_path = "/reload-proxy-config"
clear_cache_url_path = "/clear-cache"
status_url_path = "/status"

They are already implemented because I needed them for testing. I can make them all optional. /status returns only 200 and a short text.


MartinKavik commented 4 years ago

Changes:

Next steps:

MartinKavik commented 4 years ago

Changes

Deploy & tests

Next steps: