absmach / export

Mainflux Export service that sends messages from one Mainflux cloud to another via MQTT
10 stars 11 forks source link

export service stuck when internet connection lost #23

Closed pricelessrabbit closed 4 years ago

pricelessrabbit commented 4 years ago

Hi, there is a possible issue when the internet connection is lost and export service cannot connect with the mqtt uplink. This (in my case) results in data loss. In particular seems that the exporter stops to process the incoming NATS messages.

environment

I used a dockerized environment with all the edge stack

version: "3.7"

services:

  nats:
    image: nats:1.3.0
    command: "-c /etc/nats/nats.conf"
    ports: 
      - 4222:4222
    volumes:
      - ./nats/nats.conf:/etc/nats/nats.conf
      - ./nats/data/store:/etc/nats/store
    restart: on-failure

  redis:
    image: redis:6-alpine
    ports: 
      - 6379:6379
    volumes: 
      - ./redis/data:/data
      - ./redis/conf/redis.conf:/usr/local/etc/redis/redis.conf
    restart: on-failure

  agent:
    image: mainflux/agent:latest
    environment: 
        - MF_AGENT_NATS_URL=nats://nats:4222
        - MF_AGENT_LOG_LEVEL=debug  
        - MF_AGENT_CONFIG_FILE=/configs/config.toml
    env_file:
        ./.env
    volumes: 
        - ./agent/configs:/configs
    restart: on-failure
    depends_on: 
        - nats

  export:
    image: mainflux/export
    ports: 
      - 8170:8170
    env_file:
      ./.env
    volumes: 
      - ./agent/configs/export:/configs
    restart: on-failure
    depends_on: 
      - nats
      - redis

exporter config

File = "/configs/export/config.toml"

[exp]
  log_level = "debug"
  nats = "nats://nats:4222"
  port = "8172"
  cache_url = "redis:6379"
  cache_pass = ""
  cache_db = "0"

[mqtt]
  ca_path = ""
  cert_path = ""
  channel = "<channel>"
  host = "tcp:/<host>:1883"
  mtls = false
  password = "<pass>"
  priv_key_path = ""
  qos = 2
  retain = false
  skip_tls_ver = false
  username = "<user>"

[[routes]]
  mqtt_topic = "channels/<channel>/messages"
  nats_topic = ">"
  subtopic = ""
  type = "plain"
  workers = 10

Steps to reproduce

I started a go test executable that publish a message in the export.message NATS subject. The exporter relay the messages to the mainflux cloud

2020/07/30 13:11:35 Configuration loaded from file ../configs/config.toml
export_1  | {"level":"debug","message":"Client export-d6d128d8-841a-4906-81d0-cc7ab1a6c682 connected","ts":"2020-07-30T13:11:35.409503589Z"}
export_1  | {"level":"error","message":"Failed to create stream group: BUSYGROUP Consumer Group name already exists","ts":"2020-07-30T13:11:35.410850563Z"}
export_1  | {"level":"info","message":"Stream group \u003e created ","ts":"2020-07-30T13:11:35.410924929Z"}
export_1  | {"level":"info","message":"Export service started, exposed port :8172","ts":"2020-07-30T13:11:35.411072197Z"}
export_1  | {"level":"info","message":"Republish, waiting for stream data","ts":"2020-07-30T13:11:35.411050072Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 148","ts":"2020-07-30T13:11:37.341565285Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 149","ts":"2020-07-30T13:11:39.346738911Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 150","ts":"2020-07-30T13:11:41.343323877Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 151","ts":"2020-07-30T13:11:43.344359133Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 152","ts":"2020-07-30T13:11:45.347391765Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 153","ts":"2020-07-30T13:11:47.348156214Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 154","ts":"2020-07-30T13:11:49.34798676Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 155","ts":"2020-07-30T13:11:51.350006471Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 156","ts":"2020-07-30T13:11:53.348543792Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 157","ts":"2020-07-30T13:11:55.349915616Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 158","ts":"2020-07-30T13:11:57.354041142Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 159","ts":"2020-07-30T13:11:59.357020898Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 160","ts":"2020-07-30T13:12:01.359262115Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 161","ts":"2020-07-30T13:12:03.35459824Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 162","ts":"2020-07-30T13:12:05.357387038Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 163","ts":"2020-07-30T13:12:07.35790137Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 164","ts":"2020-07-30T13:12:09.358460938Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 165","ts":"2020-07-30T13:12:11.357249305Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 166","ts":"2020-07-30T13:12:13.360555103Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 167","ts":"2020-07-30T13:12:15.359051578Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 168","ts":"2020-07-30T13:12:17.359637009Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 169","ts":"2020-07-30T13:12:19.360792346Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 170","ts":"2020-07-30T13:12:21.361622165Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 171","ts":"2020-07-30T13:12:23.361572003Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 172","ts":"2020-07-30T13:12:25.365206421Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 173","ts":"2020-07-30T13:12:27.365519358Z"}
export_1  | {"level":"debug","message":"Client export-d6d128d8-841a-4906-81d0-cc7ab1a6c682 disconnected","ts":"2020-07-30T13:13:10.41096313Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 175","ts":"2020-07-30T13:13:17.577789392Z"}
export_1  | {"level":"error","message":"not connected to mqtt broker","ts":"2020-07-30T13:13:17.577826033Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 178","ts":"2020-07-30T13:13:17.577821667Z"}
export_1  | {"level":"error","message":"not connected to mqtt broker","ts":"2020-07-30T13:13:17.577859377Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 181","ts":"2020-07-30T13:13:17.57794355Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 182","ts":"2020-07-30T13:13:17.577973627Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 174","ts":"2020-07-30T13:13:17.577929897Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 179","ts":"2020-07-30T13:13:17.578131105Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 180","ts":"2020-07-30T13:13:17.578151371Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 183","ts":"2020-07-30T13:13:17.57815695Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 177","ts":"2020-07-30T13:13:17.578172894Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 176","ts":"2020-07-30T13:13:17.578184137Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 185","ts":"2020-07-30T13:13:17.580156408Z"}
export_1  | {"level":"debug","message":"Published to: export.message, payload: message 184","ts":"2020-07-30T13:13:17.580139305Z"}

I tried to power off the internet connection to simulate a network issue. After that (2020-07-30T13:13:17.577859377Z) the exporter sends some out-of-order messages (I guess the ones that was stored in the redis stream) but then it got stuck. After the message 184 it doesn't produce any logs but in the meantime the go test service continues to publish messages in the NATS subject. If I restart the exporter, it starts to send the new messages but all the data between the "stuck" state and the restart seems to be lost

mteodor commented 4 years ago

if you can test this PR https://github.com/mainflux/export/pull/25

If using Ubuntu or similar OS, to simulate disconnection you can use firewall and tcpkill ( you need to install )

sudo ufw deny out 1883; sudo tcpkill -i wlp3s0 port 1883

then renable port

sudo ufw allow out 1883
pricelessrabbit commented 4 years ago

Thanks for the rapid fix. Question: looking at the PR i see that now paho storage is used. Is the redis service still nedeed for to persist data or can i drop it?

If using Ubuntu or similar OS, to simulate disconnection you can use firewall and tcpkill ( you need to install )

Looks too complicated. I prefer to unplug ethernet cable or power off the router ( and see the other team members swear).

drasko commented 4 years ago

@PricelessRabbit yes, Redis Streams has been removed and we use MQTT QoS for buffering. It has a few limitations, but it is simpler, easier to maintain and covers most of the use-cases.