Proxy real-time APIs? - Githubissues

brylie commented 8 years ago

We have had requests as to whether API Umbrella can proxy real-time, or streaming, APIs, e.g. using MQTT.

Are there any agencies using API Umbrella to proxy streaming APIs? What would be the overhead of such a scenario?

GUI commented 8 years ago

It depends on what exactly you mean by streaming APIs. We do support streaming HTTP responses, and we explicitly ensure nothing in API Umbrella blocks or buffers responses being sent back to the client. We do currently buffer the request bodies sent by clients, but this should be solvable. Previously, this was a restriction of nginx, but they added the proxy_request_buffering setting earlier this year; we just haven't had a driver for testing everything with this switch flipped.

However, something like MQTT is a TCP-level protocol (not HTTP), so we don't currently support it. I'm not super familiar with MQTT, and I'm not exactly sure what support for TCP protocols would really look like at an API management layer. It's certainly possible to proxy TCP traffic to underlying servers (although TCP proxying is a pretty new feature in nginx; I think they just added it this year), but all of the API management functionality would have to be rethought (if it's even applicable). And without standard things like HTTP headers or URLs, I think the specifics could also differ quite a bit depending on each TCP protocol.

That being said, WebSockets is probably one streaming TCP protocol that might better align with an API management layer, since the initial handshake is over HTTP. I haven't tested any WebSocket connections with API Umbrella, but theoretically the handshake could be treated like any other HTTP API request, although other things may still need to be rethought (for example, are rate limits relevant with a single persistent connection?).

So basically the short version is that we do support streaming HTTP responses, but nothing with layer 4 (TCP) protocols.

brylie commented 8 years ago

@tuukka will you please help clarify how we might use a proxy, such as API Umbrella, with the HSL MQTT APIs?

I.e. we want to clarify how API Umbrella might be used in the HSL system, which is using MQTT. What would be the benefits versus the system costs?

tuukka commented 8 years ago

Our primary use case happens to be MQTT over WebSockets (to end-user browsers). Something simple but interesting might be to use API Umbrella to publish a rate-limited API that just responds with an HTTP redirect (plus an authorisation cookie?) to the actual MQTT server. This way, we could use API Umbrella to collect statistics on and rate-limit the number of new connections.
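To make the redirect idea concrete, here is a minimal Node.js sketch of the logic such an endpoint would need: count new connection attempts per API key in a fixed time window, then answer with either a redirect or a 429. It is not API Umbrella code; `createConnectionLimiter`, `redirectResponse`, the limits, and the target URL are all hypothetical names and values chosen for illustration.

```javascript
// Fixed-window limiter for NEW connection attempts, keyed by API key.
// Hypothetical helper, not part of API Umbrella.
function createConnectionLimiter(limit, windowMs) {
  const windows = new Map(); // key -> { start, count }
  return function allow(key, now = Date.now()) {
    const w = windows.get(key);
    if (!w || now - w.start >= windowMs) {
      // First attempt in a fresh window.
      windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (w.count < limit) {
      w.count += 1;
      return true;
    }
    return false; // over the limit for this window
  };
}

// Build the HTTP response for an attempt: redirect to the real MQTT
// endpoint when allowed, otherwise reject with 429 Too Many Requests.
function redirectResponse(allowed, targetUrl) {
  return allowed
    ? { status: 302, headers: { Location: targetUrl } }
    : { status: 429, headers: {} };
}
```

For example, `redirectResponse(allow(apiKey), 'wss://mqtt.example.invalid/')` would be evaluated once per incoming request, so the statistics and limits apply to connection attempts only, not to the MQTT traffic that follows.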

A second step might be to proxy the actual MQTT (over WebSocket) connections to get more statistics such as connection durations and bit rates. Here, with some protocol-specific support in API Umbrella, we could additionally manage per connection the number of MQTT subscriptions and the message rate. In our primary use case (vehicles on a map), the rate of messages is huge anyway (I'd guess 10k-100k messages per second outbound), so proxying might be realistic only for managing the other use cases with lower rates but higher value (e.g. upcoming arrivals to a station).

An alternative direction to look into might be to build the management backend of these connections directly into an MQTT server such as mosquitto in place of nginx.

GUI commented 8 years ago

Interesting. It seems like your idea of an HTTP redirect would probably work with API Umbrella as it exists now (since everything on API Umbrella's end would be over HTTP).

With WebSockets in the mix, another approach falling somewhere between your first and second idea would be to have API Umbrella proxy all the traffic (including MQTT), but without any protocol-specific knowledge or support. In this case, we'd handle the initial HTTP handshake/upgrade request as a normal API request (so this initial connection could be rate limited, we'd gather analytics, you could use API keys, etc.). But after the upgrade took place, we'd just proxy the rest of the traffic as a dumb TCP tunnel. This would mean we might not be able to gather extra information (like duration and bit rates), but it might be a slightly simpler approach to protecting the underlying server (and the overhead of proxying in this mode should be extremely minimal). This is also an approach that we may already support, or, if not, I think we should be pretty close to supporting. I haven't tested any of this, so that's why I'm not sure, but since nginx supports proxying WebSocket connections in this manner, I think it should theoretically be possible (but some config tweaks might be required).
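The "check the handshake, then tunnel blindly" approach can be sketched in plain Node.js. This is only an illustration of the technique, not how API Umbrella or nginx implement it: `createTunnelProxy`, `serializeHandshake`, and the `isAllowed` callback (standing in for API key checks and rate limiting) are hypothetical.

```javascript
const http = require('http');
const net = require('net');

// Re-create the raw HTTP upgrade request so it can be replayed to the backend.
function serializeHandshake(req) {
  const lines = [`${req.method} ${req.url} HTTP/${req.httpVersion}`];
  for (let i = 0; i < req.rawHeaders.length; i += 2) {
    lines.push(`${req.rawHeaders[i]}: ${req.rawHeaders[i + 1]}`);
  }
  return lines.join('\r\n') + '\r\n\r\n';
}

// Returns an http.Server (not yet listening) that vets the handshake as a
// normal HTTP request and then pipes bytes in both directions.
function createTunnelProxy({ backendHost, backendPort, isAllowed }) {
  const server = http.createServer((req, res) => {
    // Plain HTTP requests are out of scope for this sketch.
    res.writeHead(426, { 'Content-Type': 'text/plain' });
    res.end('Upgrade required\n');
  });

  server.on('upgrade', (req, clientSocket, head) => {
    if (!isAllowed(req)) {
      clientSocket.end('HTTP/1.1 403 Forbidden\r\nConnection: close\r\n\r\n');
      return;
    }
    const upstream = net.connect(backendPort, backendHost, () => {
      upstream.write(serializeHandshake(req));
      if (head.length > 0) upstream.write(head);
      // From here on the proxy has no knowledge of WebSocket or MQTT framing.
      upstream.pipe(clientSocket);
      clientSocket.pipe(upstream);
    });
    upstream.on('error', () => clientSocket.destroy());
    clientSocket.on('error', () => upstream.destroy());
  });

  return server;
}
```

Calling `createTunnelProxy({ backendHost: '127.0.0.1', backendPort: 9001, isAllowed: () => true }).listen(8080)` would start it; the host, ports, and always-true check are placeholders. Because the backend's 101 response travels back through the pipe untouched, the proxy sees only the handshake, which matches the trade-off described above: analytics and limits on connection setup, but no duration or bit-rate data.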

kyyberi commented 8 years ago

+1 Sounds like a testable idea

tuukka commented 8 years ago

In the end, redirection was not viable as the WebSocket specs forbid redirection. :-(

Proxying doesn't seem to be working either, probably because the necessary headers aren't getting through?

    Connection: upgrade
    Upgrade: websocket

I'm testing like this:

    // Connect through the proxy using MQTT over secure WebSockets.
    const mqtt = require('mqtt');
    const mqtt_client = mqtt.connect('wss://umbrella.digipalvelutehdas.fi:443/hsl/mqtt-test2/?api_key=yeLLJgZCyw5kTHOZgv9GERTI6BqFhsJ7pNn2bPFB');
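One way to narrow down whether the upgrade headers survive the proxy, independent of the MQTT client library, is to send the WebSocket opening handshake by hand and look at the status code that comes back. The sketch below uses only Node.js core modules; `probeHeaders` and `probeUpgrade` are hypothetical helper names, and the `mqtt` subprotocol value is the one MQTT 3.1.1 specifies for WebSocket transport.

```javascript
const crypto = require('crypto');
const http = require('http');
const https = require('https');

// Headers a WebSocket client sends in its opening handshake (RFC 6455).
function probeHeaders() {
  return {
    Connection: 'Upgrade',
    Upgrade: 'websocket',
    'Sec-WebSocket-Version': '13',
    'Sec-WebSocket-Key': crypto.randomBytes(16).toString('base64'),
    'Sec-WebSocket-Protocol': 'mqtt',
  };
}

// Resolves with the HTTP status the server returned and whether it upgraded.
function probeUpgrade(wsUrl) {
  const url = new URL(wsUrl);
  const secure = url.protocol === 'wss:';
  const transport = secure ? https : http;
  return new Promise((resolve, reject) => {
    const req = transport.request({
      hostname: url.hostname,
      port: url.port || (secure ? 443 : 80),
      path: url.pathname + url.search,
      headers: probeHeaders(),
    });
    // 'upgrade' fires when the server answers 101 Switching Protocols.
    req.on('upgrade', (res, socket) => {
      socket.destroy();
      resolve({ upgraded: true, status: res.statusCode });
    });
    // Any ordinary response means the upgrade did not happen.
    req.on('response', (res) => {
      res.resume();
      resolve({ upgraded: false, status: res.statusCode });
    });
    req.on('error', reject);
    req.end();
  });
}
```

Usage would look like `probeUpgrade('wss://proxy.example.invalid/mqtt/?api_key=YOUR_KEY').then(console.log)`, with the URL replaced by the real endpoint. A result with `upgraded: false` when going through the proxy, but `upgraded: true` when hitting the MQTT server directly, would point at the proxy dropping or rewriting the handshake headers.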

brylie commented 8 years ago

@tuukka thanks for testing. @GUI, any thoughts?

GUI commented 8 years ago

Ah, I realized at least one thing that's playing a role is that we clear the Connection header when proxying the requests. This is done so we can support backend keepalive connections (see nginx's keepalive docs).

So to begin with, we'd probably need to make clearing that header conditional on whether it's an upgrade request or not. That should be fairly simple, but there may be other small tweaks like this necessary in the stack to get WebSockets fully working.
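The conditional described here can be written down as a small pure function. This is a Node.js illustration of the decision only, not API Umbrella's actual nginx or Lua configuration; `upstreamHeaders` is a hypothetical name, and it assumes lower-cased header names as Node's `req.headers` provides them.

```javascript
// Decide which hop-by-hop headers to send to the backend.
// - WebSocket upgrade request: forward Connection/Upgrade so the backend
//   can complete the handshake.
// - Anything else: drop them, which is what allows keepalive connections
//   to the backend to be reused.
function upstreamHeaders(incoming) {
  const headers = { ...incoming };
  const connectionTokens = String(headers['connection'] || '')
    .toLowerCase()
    .split(',')
    .map((token) => token.trim());
  const isUpgrade =
    connectionTokens.includes('upgrade') &&
    String(headers['upgrade'] || '').toLowerCase() === 'websocket';

  if (isUpgrade) {
    headers['connection'] = 'upgrade';
  } else {
    delete headers['connection'];
    delete headers['upgrade'];
  }
  return headers;
}
```

The `Connection` value is split on commas because browsers may send something like `keep-alive, Upgrade`, so a plain string comparison against `upgrade` would miss valid handshakes.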

I'll try to look into this at some point, but pull requests are also welcome.

Thanks for testing this out! This is definitely useful to know.

jykae commented 8 years ago

:+1:

brylie commented 8 years ago

How would introducing a real-time reverse proxy service, such as Pushpin, affect the API Umbrella platform?

brylie commented 8 years ago

A recent blog post illustrates using the Pushpin proxy with the Kong API management platform.

Perhaps NREL could test API Umbrella in similar configurations with Pushpin?

tuukka commented 8 years ago

Just a comment from my point of view: I wouldn't add Pushpin to a modern system architecture where both clients and servers support WebSockets.

NREL / api-umbrella

Proxy real-time APIs? #198