Application server reliability

kunago commented 4 years ago

[x] The issue is present in the latest release.
[x] I have searched the issues of this repository and believe that this is not a duplicate.

What happened?

The application server tends to be unreliable.

What did you expect?

I expect it to be reliable enough not to throw away payloads. LoRa devices may upload payloads frequently or only rarely, for example once a day, so it is a problem when the application server throws away payloads for various reasons.

Steps to reproduce this issue

Steps:

run the application server
wait (sorry, there is nothing specific one needs to do to reproduce the problem)

Could you share your log output?

I have been able to trace 2 main problems:

JS decoder timeout

time="2020-09-18T04:14:10Z" level=error msg="decode payload error" application_id=1 codec=CUSTOM_JS dev_eui=476ac8680044003d error="execute js error: execution timeout" f_cnt=703 f_port=8

Redis issues

time="2020-09-18T07:02:14Z" level=error msg="handle received ping error: get ping lookup error: get ping lookup error: redis: nil"
chirpstack-application-server_1  | time="2020-09-18T07:02:14Z" level=error msg="finished unary call with code Internal" ctx_id=b3bbf6a7-4811-4d5a-a858-af1accfc3511 error="rpc error: code = Internal desc = handle received ping error: get ping lookup error: get ping lookup error: redis: nil" grpc.code=Internal grpc.method=HandleProprietaryUplink grpc.service=as.ApplicationServerService grpc.start_time="2020-09-18T07:02:14Z" grpc.time_ms=0.451 peer.address="172.22.0.41:37468" span.kind=server system=grpc

I realize the Redis issue may not be directly related to ChirpStack, but I can't seem to set Redis up so that it runs free of errors, even though my docker-compose file is very simple.

Your Environment

Component	Version
Application Server	v3.12.1
Network Server	v3.10.0
Gateway Bridge	v3.9.2
Chirpstack API	not using API
Geolocation	not used
Concentratord	not used

Please note that the setup uses Docker with this configuration:

cat docker-compose.yml (for chirpstack)

version: "3"
services:
  chirpstack-network-server:
    image: chirpstack/chirpstack-network-server:3
    env_file:
      - chirpstack-network-server.env
    networks:
      backend:
        ipv4_address: 172.22.0.41
    restart: always

  chirpstack-application-server:
    image: chirpstack/chirpstack-application-server:3
    env_file:
      - chirpstack-application-server.env
    expose:
      - 8080
    networks:
      backend:
        ipv4_address: 172.22.0.42
    restart: always

  chirpstack-gateway-bridge:
    image: chirpstack/chirpstack-gateway-bridge:3
    env_file:
      - chirpstack-gateway-bridge.env
    ports:
      - 1700:1700/udp
    networks:
      frontend:
        ipv4_address: 172.21.0.43
      backend:
        ipv4_address: 172.22.0.43
    restart: always

networks:
  frontend:
    external:
      name: docker-common_frontend
  backend:
    external:
      name: docker-common_backend

cat docker-compose.yml (for common)

version: '3'
services:
  influxdb:
    image: influxdb:1.8-alpine
    volumes:
      - /opt/docker-common/config/influxdb:/etc/influxdb
      - /opt/docker-common/data/influxdb:/var/lib/influxdb
    ports:
      - 8086:8086
    env_file:
      - common-influxdb.env
    depends_on:
      - redis
    networks:
      backend:
        ipv4_address: 172.22.0.14
    restart: always

  redis:
    image: redis:alpine
    volumes:
      - /opt/docker-common/data/redis:/data
    entrypoint: redis-server --appendonly yes --appendfsync no
    networks:
      backend:
        ipv4_address: 172.22.0.15
    restart: always

  mosquitto:
    image: eclipse-mosquitto
    expose:
      - 1883
    networks:
      backend:
        ipv4_address: 172.22.0.16
    restart: always

networks:
  frontend:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 172.21.0.0/24
  backend:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 172.22.0.0/24

Then the config files for each part of ChirpStack:

cat chirpstack-network-server.env
POSTGRESQL__DSN=postgres://chirpstack_ns:chirpstack_ns@docker-common_postgres_1/chirpstack_ns?sslmode=disable
REDIS__URL=redis://docker-common_redis_1:6379
NETWORK_SERVER__BAND__NAME=EU868
NETWORK_SERVER__GATEWAY__BACKEND__MQTT__SERVER=tcp://docker-common_mosquitto_1:1883
JOIN_SERVER__DEFAULT__SERVER=http://chirpstack-application-server:8003

cat chirpstack-gateway-bridge.env
INTEGRATION__MQTT__AUTH__GENERIC__SERVERS=tcp://docker-common_mosquitto_1:1883

cat chirpstack-application-server.env
POSTGRESQL__DSN=postgres://chirpstack_as:chirpstack_as@docker-common_postgres_1/chirpstack_as?sslmode=disable
REDIS__URL=redis://docker-common_redis_1:6379
APPLICATION_SERVER__INTEGRATION__MQTT__SERVER=tcp://docker-common_mosquitto_1:1883
APPLICATION_SERVER__API__PUBLIC_HOST=chirpstack-application-server:8001
APPLICATION_SERVER__EXTERNAL_API__JWT_SECRET=<some secret>

So the question is simple: is there a way to make the application server more reliable? Maybe it could store failed payloads in memory and retry them later?
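One way to read this suggestion is a bounded in-memory retry buffer. The Python sketch below is hypothetical: RetryBuffer, PendingPayload and the handler callback are illustrative names, not ChirpStack APIs. Failed payloads are queued, retried once per pass, and given up after max_attempts. Two caveats follow from the design: an in-memory buffer is lost whenever the container restarts, and retrying only helps with transient failures, since a decoder that consistently exceeds its execution limit will keep failing on the same payload.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PendingPayload:
    """Hypothetical record of an uplink whose handling failed."""
    dev_eui: str
    f_cnt: int
    data: bytes
    attempts: int = 0


class RetryBuffer:
    """Hypothetical bounded in-memory buffer of payloads awaiting a retry."""

    def __init__(self, max_items=1000, max_attempts=3):
        # When the deque is full, appending silently evicts the oldest entry.
        self._queue = deque(maxlen=max_items)
        self.max_attempts = max_attempts
        self.dropped = []  # payloads that failed max_attempts times

    def add(self, item):
        self._queue.append(item)

    def retry_all(self, handler):
        """Try each queued payload once; return how many were delivered."""
        delivered = 0
        for _ in range(len(self._queue)):  # only the items present at the start
            item = self._queue.popleft()
            try:
                handler(item)
                delivered += 1
            except Exception:
                item.attempts += 1
                if item.attempts < self.max_attempts:
                    self._queue.append(item)  # try again on the next pass
                else:
                    self.dropped.append(item)  # give up on this payload
        return delivered

    def __len__(self):
        return len(self._queue)
```

A caller would add a PendingPayload whenever decoding or publishing fails and call retry_all periodically; anything that ends up in dropped would still need to be logged or persisted elsewhere.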

kunago commented 4 years ago

I think I found a solution to the first issue in the forum. This setting might help with the timeout, right?

APPLICATION_SERVER__CODEC__JS__MAX_EXECUTION_TIME
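The env files above follow a visible naming pattern: a dotted configuration key such as application_server.integration.mqtt.server becomes the upper-case variable APPLICATION_SERVER__INTEGRATION__MQTT__SERVER, with a double underscore between sections. Assuming the JS codec timeout lives under application_server.codec.js.max_execution_time (inferred from that pattern, so worth checking against the configuration reference), the hypothetical helper below derives the variable name. The double underscores are easy to lose when a name is pasted into Markdown, where __text__ renders as bold.

```python
def env_var_name(config_key: str) -> str:
    """Map a dotted config key to the env-var form used in the .env files above.

    Each section is upper-cased and sections are joined with a double
    underscore; single underscores inside a section (max_execution_time)
    are kept as they are.
    """
    return "__".join(section.upper() for section in config_key.split("."))


print(env_var_name("application_server.codec.js.max_execution_time"))
# APPLICATION_SERVER__CODEC__JS__MAX_EXECUTION_TIME
```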

What value is safe and reasonable? The current default is 100ms. Would setting it to 1000ms, just to rule out a timeout as the cause of the error, be a good idea or not?
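What such a limit does can be illustrated with a minimal Python sketch. ChirpStack's own implementation is not shown in this thread and is not reproduced here; run_with_deadline and ExecutionTimeout are hypothetical names. The decoder runs under a tracer that checks a wall-clock deadline on every executed line and aborts with an exception once the deadline has passed, so a never-ending loop is stopped instead of blocking the caller. The trade-off behind the question follows from this: a higher limit lets a slow decoder finish instead of failing, but it also lets a runaway script occupy the server for longer.

```python
import sys
import time


class ExecutionTimeout(Exception):
    """Raised when a decoder runs past its execution budget."""


def run_with_deadline(fn, args, max_seconds):
    """Call fn(*args), aborting it once max_seconds of wall-clock time have passed.

    The deadline is checked on every traced event (function calls and line
    executions), so even an endless loop inside fn is interrupted at its next
    line. Tracing slows execution considerably; this is for illustration only.
    """
    deadline = time.monotonic() + max_seconds

    def tracer(frame, event, arg):
        if time.monotonic() > deadline:
            raise ExecutionTimeout(f"execution exceeded {max_seconds * 1000:.0f}ms")
        return tracer  # keep receiving line events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        return fn(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before


def spin(payload):
    """A decoder stuck in a never-ending loop, the case the limit guards against."""
    n = 0
    while True:
        n += 1
```

Calling run_with_deadline(spin, (b"\x01",), 0.1) raises ExecutionTimeout after roughly 100 ms, while a decoder that returns in time simply has its result passed through.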

EDIT: The timeout limit is one thing, code quality is another. Although I do not understand (and therefore don't use) bitwise operations, making my code run 4x faster did not take long, and slow code was definitely one of the issues.
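Bitwise operations are the usual way to pull fields out of a binary payload without converting through strings. As an illustration, the Python sketch below decodes a hypothetical payload whose first two bytes hold a signed, big-endian temperature in units of 0.1 °C; the layout and function names are assumptions, not taken from the device in this issue. The first function formats the bytes as hex text and parses it back; the second combines them with a shift and masks directly. JavaScript has the same <<, | and & operators, so the second form carries over to a CUSTOM_JS decoder.

```python
def decode_temperature_str(data: bytes) -> float:
    """String-based approach: format the bytes as hex text, then parse it back."""
    hex_text = "".join(f"{b:02x}" for b in data[0:2])
    raw = int(hex_text, 16)
    if raw >= 0x8000:      # values above 0x7FFF are negative in two's complement
        raw -= 0x10000
    return raw / 10


def decode_temperature_bitwise(data: bytes) -> float:
    """Bitwise approach: combine the two bytes arithmetically, no strings involved."""
    raw = (data[0] << 8) | data[1]   # big-endian: high byte first
    if raw & 0x8000:                 # sign bit set, so apply two's complement
        raw -= 0x10000
    return raw / 10
```

For example, bytes 0x00 0xEB decode to 23.5 and bytes 0xFF 0x9C decode to -10.0 with either function. In Python itself, int.from_bytes(data[:2], "big", signed=True) does the same in one call; the explicit form is shown because it maps directly onto JavaScript.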

brocaar commented 4 years ago

The max execution time is there to make sure that, in a multi-tenant environment, inefficient scripts (or never-ending for loops) are killed after a certain execution time. There is no recommendation that I can give you.
The "handle received ping error: get ping lookup error" message is not related to "It is a problem to see the application server throwing away some payloads for various reasons".
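Some context on the log line: "redis: nil" is the message of the go-redis client's redis.Nil value, which signals that the requested key does not exist, not that Redis is unreachable or misconfigured. Why the ping lookup key was missing is not established in this thread. Purely as an illustration, and assuming lookup entries are written with an expiry, the hypothetical Python stand-in below shows how a lookup that arrives after the expiry yields a "nil" result while the store itself is working normally. ExpiringStore, KeyMissing and the key "ping:42" are illustrative names only.

```python
import time


class KeyMissing(Exception):
    """Stand-in for go-redis's redis.Nil: the key does not exist (or has expired)."""


class ExpiringStore:
    """Hypothetical in-memory stand-in for Redis keys written with an expiry."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}    # key -> (value, expires_at)
        self._clock = clock   # injectable so expiry can be tested without sleeping

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None or self._clock() >= entry[1]:
            self._entries.pop(key, None)  # expired entries behave like absent ones
            raise KeyMissing("redis: nil")
        return entry[0]
```

A caller that catches KeyMissing knows the store answered and the entry simply is not there, which is a different situation from a connection failure and says nothing about uplink payloads being dropped.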

I'm going to close this issue. In the future, please make issues as specific as possible; with this issue it is not clear to me (or to others wanting to contribute) what should be done to solve it.

brocaar / chirpstack-application-server