fablabbcn / smartcitizen-api

The Smart Citizen Engine
https://developer.smartcitizen.me
GNU Affero General Public License v3.0

Review MQTT implementation for scalability #198

Open pral2a opened 3 years ago

pral2a commented 3 years ago

Review MQTT gem

Review the current MQTT library in use

Shared Subscription

Consider running multiple mqtt_subscriber.rake subscriber tasks, as originally planned, to balance the ingestion load across several Rails tasks by taking advantage of the EMQ X Shared Subscriptions feature.

That could be achieved by adding support for passing a configuration variable to mqtt_subscriber.rake, so that multiple tasks can be instantiated via docker-compose.yml.

👓 We need to learn more about rake thread / process management. Read here. Otherwise, we might also consider doing it at a docker level. Is it a crazy idea?
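
As a rough illustration of the Shared Subscriptions idea above: in MQTT 5 (and in EMQ X), subscribers that use a filter of the form `$share/<group>/<filter>` form a group, and the broker delivers each matching message to only one member of it. A minimal shell sketch, where the group name `ingest`, the `foo/+` filter and the helper names are placeholders, not the project's real values:

```shell
# Build a shared-subscription topic filter: $share/<group>/<filter>.
# Group name and topic filter are illustrative placeholders.
shared_filter() {
    group=$1
    filter=$2
    printf '$share/%s/%s\n' "$group" "$filter"
}

# Start one subscriber in the group (hypothetical wrapper, not called here).
# Running it several times with different client ids would give several
# workers that share the message load.
start_shared_subscriber() {
    client_id=$1
    mosquitto_sub --host mqtt.smartcitizen.me -p 80 -u foo -q 1 \
        -i "$client_id" --topic "$(shared_filter ingest 'foo/+')"
}
```

Whether the workers are separate rake tasks or separate containers, each one would only need to subscribe to the same shared filter with its own client id.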

Broker SSL/TLS with Let's Encrypt

Review the Let's Encrypt implementation on the MQTT Broker server to confirm that renewal and configuration are OK. It currently works well.

MQTT Message persistence

This option is critical: if Rails temporarily fails to ingest messages, the broker persists them for later ingestion. Combined with the new flash-based local storage on the SCK 2.1 (after SAM firmware release 0.9.8), this ensures that data is persisted by the SCK 2.1 if the broker becomes unavailable, and by the broker if the Rails subscription task fails.

Message persistence doesn't require any changes on the broker and is defined by the pub / sub clients. Here is an example using mosquitto as a client against our production broker, EMQ X.

$ mosquitto_pub --host mqtt.smartcitizen.me --topic 'foo/bar' -p 80 -u foo -m 'foo' -q 1
$ mosquitto_sub --host mqtt.smartcitizen.me --topic 'foo/+' -p 80 -u foo -q 1 -i bar --disable-clean-session

In principle, the current MQTT library supports that feature, and it can be implemented as follows:

https://github.com/fablabbcn/smartcitizen-api/blob/3e9202b4bcec1c0f17550a5ae13cbd8d6a9b9756/lib/tasks/mqtt_subscriber.rake#L13

However, it was implemented previously and led to some instabilities in production after an mqtt_subscriber.rake crash.

The implementation needs to be reviewed on staging, in particular to take into account the peak load on mqtt_subscriber.rake that can occur after a downtime during which the broker buffered a lot of data.
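
The persistent-session behaviour discussed above can be exercised by hand before touching the rake task. A minimal sketch, reusing the mosquitto flags from the earlier example; the client id `bar`, the topics and the user are the same placeholders, the function names are made up, and `-C` / `-W` (exit after N messages / after a timeout in seconds) are only there so the commands terminate:

```shell
# Placeholders taken from the example above; point them at staging.
MQTT_HOST=mqtt.smartcitizen.me
MQTT_PORT=80

# Connect with a fixed client id and a persistent (non-clean) session.
# -C 1: exit after one message, -W 5: give up after five seconds.
persistent_sub() {
    mosquitto_sub --host "$MQTT_HOST" -p "$MQTT_PORT" -u foo \
        --topic 'foo/+' -q 1 -i bar --disable-clean-session -C 1 -W 5
}

# Publish one QoS 1 message, possibly while the subscriber is offline.
publish_one() {
    mosquitto_pub --host "$MQTT_HOST" -p "$MQTT_PORT" -u foo \
        --topic 'foo/bar' -q 1 -m "$1"
}

# 1) register the session, 2) publish while disconnected, 3) reconnect.
# With a persistent session the queued message should arrive in step 3.
check_persistence() {
    persistent_sub >/dev/null || true   # may time out; the session remains
    publish_one 'queued-while-offline'
    persistent_sub
}
```

Calling `check_persistence` against a staging broker should print `queued-while-offline` if the broker kept the message for the disconnected client.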

viktorsmari commented 3 years ago

Also the current gem was last updated in 2016 https://github.com/njh/ruby-em-mqtt

pral2a commented 1 year ago

Assessing the migration from EventMachine (and ruby-em-mqtt)

  1. EventMachine releases are infrequent, and the last ruby-em-mqtt update dates back to 2016.

  2. There are more efficient and better maintained asynchronous I/O libraries for implementing scalable network clients in Ruby. All of them are built on nio4r. The best option looks like Async. It might soon become part of the Ruby 3.x stdlib, although we are on Ruby 2.6.8. Here is an example migration from EventMachine to Async.

  3. However, the open question is how to implement MQTT on top of Async. I couldn't find any properly maintained MQTT implementation using Async. Here is an example of what might be the closest one.

This leads to three potential solutions to evaluate:

🔍 The research continues...

oscgonfer commented 1 year ago

MQTT Persistence

Adding to the above regarding EMQX: we have migrated EMQX to 5.0.1 on staging, and it is running correctly with the following settings (ports and domain are redacted):

docker run -d --name emqx \
    --restart="unless-stopped" \
    --memory="3g" \
    --memory-swap="3g" \
    -p ****:**** -p ****:**** \
    -p ... \
    -e EMQX_MQTT__UPGRADE_QOS="true" \
    -e EMQX_MQTT__MQUEUE_STORE_QOS0="true" \
    -e EMQX_MQTT__SESSION_EXPIRY_INTERVAL="960h" \
    -e EMQX_MQTT__MAX_MQUEUE_LEN=10000000 \
    -e EMQX_NODE__COOKIE="*****" \
    -e EMQX_ALLOW_ANONYMOUS=false \
    -e EMQX_LISTENER__SSL__KEYFILE="/opt/emqx/etc/certs/privkey.pem" \
    -e EMQX_LISTENER__SSL__CERTFILE="/opt/emqx/etc/certs/fullchain.pem" \
    -e EMQX_LISTENER__WSS__KEYFILE="/opt/emqx/etc/certs/privkey.pem" \
    -e EMQX_LISTENER__WSS__CERTFILE="/opt/emqx/etc/certs/fullchain.pem" \
    -e EMQX_DASHBOARD__LISTENERS__HTTP__ENABLE=true \
    -e EMQX_DASHBOARD__LISTENERS__HTTP__BIND=* \
    -e EMQX_DASHBOARD__LISTENERS__HTTP__MAX_CONNECTIONS=5 \
    -e EMQX_DASHBOARD__LISTENERS__HTTPS__ENABLE=true \
    -e EMQX_DASHBOARD__LISTENERS__HTTPS__BIND=* \
    -e EMQX_DASHBOARD__LISTENERS__HTTPS__MAX_CONNECTIONS=5 \
    -e EMQX_DASHBOARD__LISTENERS__HTTPS__KEYFILE="/opt/emqx/etc/certs/privkey.pem" \
    -e EMQX_DASHBOARD__LISTENERS__HTTPS__CERTFILE="/opt/emqx/etc/certs/fullchain.pem" \
    -v /etc/letsencrypt/live/<domain>/fullchain.pem:/opt/emqx/etc/certs/fullchain.pem \
    -v /etc/letsencrypt/live/<domain>/privkey.pem:/opt/emqx/etc/certs/privkey.pem \
    -v /root/emqx/etc/acl.conf:/opt/emqx/etc/acl.conf \
    -v /root/emqx/log:/opt/emqx/log \
    emqx/emqx:5.0.11

The broker itself does need deployment changes, as seen above. For persistence to work, the Rails task in mqtt_subscriber.rake must connect with clean_session=False. Environment variables are used for this, and they were fixed in https://github.com/fablabbcn/smartcitizen-api/commit/3498deca73fe30cb6c92b911fb651f2ca831cb3e

As for SSL

SSL is working fine, at least on the dashboard, although there were some (now solved) issues regarding user permissions on the cert files in the docker volume. In principle, it should all work on the WSS and SSL listeners too.

The issue is solved by (check here and here):

  1. Adding an emqx user on the host machine (otherwise emqx exists only in the container):

    useradd emqx

  2. Changing ownership and permissions of the certs and archive:

    chmod 0755 /etc/letsencrypt/archive
    chmod 0755 /etc/letsencrypt/live
    chgrp emqx /etc/letsencrypt/live/<domain>/privkey.pem
    chgrp emqx /etc/letsencrypt/archive/<domain>/privkey1.pem
    chmod 0640 /etc/letsencrypt/live/<domain>/privkey.pem
    chmod 0640 /etc/letsencrypt/archive/<domain>/privkey1.pem

    chown emqx:emqx /etc/letsencrypt/live/<domain>/*.pem

  3. Adding the certificates line by line in the docker volumes, so that the symlinks to ../../archive/ created by certbot are resolved.
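
The last step above exists because the files under live/ are symlinks into ../../archive/. A small sketch for checking which real file each link currently points at, for instance after a renewal; it assumes GNU `readlink -f`, and the `cert_targets` helper name is made up for illustration:

```shell
# Print "link -> real file" for every .pem symlink in a certbot live dir.
# Assumes GNU coreutils readlink (-f follows the whole symlink chain).
cert_targets() {
    live_dir=$1
    for link in "$live_dir"/*.pem; do
        [ -e "$link" ] || continue   # skip if the glob matched nothing
        printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
    done
}

# Example (needs read access to the directory):
# cert_targets /etc/letsencrypt/live/<domain>
```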

TODO Check if the certificate autorenewal doesn't mess up anything in 90 days...
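
One way to keep an eye on the renewal TODO above is to ask openssl whether the certificate will still be valid some days from now. A minimal sketch; the `cert_valid_for` helper name and the 30-day threshold are assumptions for illustration:

```shell
# Succeed if the certificate in $1 is still valid $2 days from now.
# Relies on `openssl x509 -checkend <seconds>`, which exits non-zero
# when the certificate expires within that many seconds.
cert_valid_for() {
    cert=$1
    days=$2
    openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null
}

# Example: warn when fewer than 30 days remain.
# cert_valid_for /etc/letsencrypt/live/<domain>/fullchain.pem 30 \
#     || echo 'certificate expires within 30 days' >&2
```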

oscgonfer commented 11 months ago

Comments on the renewal:

  1. certbot needs to be run with the dns-01 challenge instead of http-01 because of our internal iptables setup. Check the reference here, and the renewal configuration below:

# renew_before_expiry = 30 days
version = 0.40.0
archive_dir = /etc/letsencrypt/archive/DOMAIN
cert = /etc/letsencrypt/live/DOMAIN/cert.pem
privkey = /etc/letsencrypt/live/DOMAIN/privkey.pem
chain = /etc/letsencrypt/live/DOMAIN/chain.pem
fullchain = /etc/letsencrypt/live/DOMAIN/fullchain.pem

# Options used in the renewal process
[renewalparams]
account = xxxxxxx
pref_challs = dns-01,
authenticator = manual
manual_auth_hook = /etc/letsencrypt/acme-dns-auth.py
server = https://acme-v02.api.letsencrypt.org/directory
manual_public_ip_logging_ok = True

  2. We have a post-renewal hook in /etc/letsencrypt/renewal-hooks/post:

#!/bin/bash
DOMAIN=<DOMAIN>
USER='emqx'
user_exists(){ id "$1" &>/dev/null; } # silent, it just sets the exit code
# Use the function's exit status directly as the condition
if user_exists "$USER"; then
    echo "$USER exists. Skipping"
else
    echo 'user not found' >&2  # error messages should go to stderr
    useradd "$USER"
fi

echo 'chmods...'
chmod 0755 /etc/letsencrypt/live
chmod 0755 /etc/letsencrypt/archive
# Only the private key is restricted: group emqx, mode 0640
chgrp "$USER" "/etc/letsencrypt/live/$DOMAIN/privkey.pem"
chgrp "$USER" "/etc/letsencrypt/archive/$DOMAIN"/privkey*.pem
chmod 0640 "/etc/letsencrypt/live/$DOMAIN/privkey.pem"
chmod 0640 "/etc/letsencrypt/archive/$DOMAIN"/privkey*.pem

echo 'chown to EMQX...'
chown "$USER:$USER" "/etc/letsencrypt/live/$DOMAIN"/*.pem
echo 'Done'
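
To confirm that the hook left the key readable by the broker, the resulting mode and group can be checked afterwards. A minimal sketch assuming GNU `stat` (the `-c` format option); `check_key_perms` is a hypothetical helper, not part of the hook above:

```shell
# Succeed if file $1 has octal mode $2 and group $3 (GNU stat assumed).
# -L follows symlinks, so a path under live/ checks the archive file.
check_key_perms() {
    file=$1
    want_mode=$2
    want_group=$3
    actual=$(stat -L -c '%a %G' "$file") || return 1
    [ "$actual" = "$want_mode $want_group" ]
}

# Example, matching the chmod 0640 / chgrp emqx lines of the hook:
# check_key_perms /etc/letsencrypt/live/<DOMAIN>/privkey.pem 640 emqx
```
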

oscgonfer commented 7 months ago

Documented in docs/mqtt.md in https://github.com/fablabbcn/smartcitizen-api/pull/293. The comments above are not up to date.

oscgonfer commented 5 months ago

A good place to see how we are handling the MQTT messages is the Slow subscriptions view of the EMQX broker: https://mqtt.smartcitizen.me:18084/#/slow-sub

The broker queues up and buffers the excess of undelivered messages. Notifications can be enabled via MQTT, so that we can trigger an email or similar.