chirpstack / chirpstack-gateway-bridge

ChirpStack Gateway Bridge abstracts Packet Forwarder protocols into Protobuf or JSON over MQTT.
https://www.chirpstack.io
MIT License
415 stars 269 forks source link

No reconnect if the initial connection fails because certificate has expired (v3.13.3) #210

Closed mullerch closed 1 year ago

mullerch commented 1 year ago

Steps to reproduce this issue

Steps:

  1. Set the time of the gateway to the past (e.g. 2012)
  2. Confgure the CSGB >=v3.13.3 to connect a MQTT broker with TLS enabled
  3. Start the CSGB, the connection will fail reporting the certificate is not yet valid
  4. Wait for some more time

What do you expect?

I expect the CSGB continuing to reconnect to the server indefinitely.

What happens instead?

The CSGB does not try to reconnect anymore. At any uplink or stat report it will say that it is "not connected" but does not try to reconnect either.

Could you share your log output?

time="2012-01-01T00:11:51.633064Z" level=error msg="[client] x509: certificate has expired or is not yet valid: current time 2012-01-01T00:11:50Z is before 2022-06-24T09:21:28Z" module=mqtt
time="2012-01-01T00:11:51.65220232Z" level=warning msg="[client] failed to connect to broker, trying next" module=mqtt
time="2012-01-01T00:11:51.6676576Z" level=error msg="[client] Failed to connect to a broker" module=mqtt
time="2012-01-01T00:12:25.58281776Z" level=error msg="publish event error" error="not Connected" event_type=up gateway_id=fcc23dfffe0b6866 uplink_id=7847c4fb-32fb-4645-9f11-231817cce43f
time="2012-01-01T00:12:28.03803664Z" level=error msg="publish event error" error="not Connected" event_type=stats gateway_id=fcc23dfffe0b6866 stats_id=634a6f44-270b-4a64-8075-3a750f05b7c3
time="2012-01-01T00:12:58.00843728Z" level=error msg="publish event error" error="not Connected" event_type=stats gateway_id=fcc23dfffe0b6866 stats_id=949b1970-0eae-4435-9f5c-40fa9929dc72
time="2012-01-01T00:13:28.00825152Z" level=error msg="publish event error" error="not Connected" event_type=stats gateway_id=fcc23dfffe0b6866 stats_id=7d867425-98df-4ab1-870f-7df7c8b25a7e
time="2012-01-01T00:13:58.01437832Z" level=error msg="publish event error" error="not Connected" event_type=stats gateway_id=fcc23dfffe0b6866 stats_id=851c4b30-1eaf-426f-adff-344a127cfa55

As a workaround, I've tried to set terminate_on_connect_error=true to make the process manager restart the CSGB, but the CSBG does never end, it doen't seem aware it's not connected.

brocaar commented 1 year ago

Thanks for reporting. In the latest release I have updated the Paho MQTT client to the latest version as there have been a few fixes. I will test if I can reproduce this issue against the latest version.

nparikhqmc commented 1 year ago

Seeing this same issue with CSGB connecting to Azure. Would expect a reconnect or a failure which we could terminate the process with and let monit restart it.

brocaar commented 1 year ago

I'm now looking into this.

brocaar commented 1 year ago

I have tried the client-certificate validation part (validated by the MQTT broker), which does attempt to re-connect.

ERRO[0052] [client]   Failed to connect to a broker      module=mqtt
ERRO[0052] integration/mqtt: connection error            error="network Error : remote error: tls: bad certificate"
ERRO[0054] [net]      connect got error remote error: tls: bad certificate  module=mqtt
ERRO[0054] [client]   Connecting to ssl://localhost:8883 CONNACK was not CONN_ACCEPTED, but rather Connection Error  module=mqtt
ERRO[0054] [client]   Failed to connect to a broker      module=mqtt
ERRO[0054] integration/mqtt: connection error            error="network Error : remote error: tls: bad certificate"
ERRO[0056] [net]      connect got error remote error: tls: bad certificate  module=mqtt
ERRO[0056] [client]   Connecting to ssl://localhost:8883 CONNACK was not CONN_ACCEPTED, but rather Connection Error  module=mqtt
ERRO[0056] [client]   Failed to connect to a broker      module=mqtt
ERRO[0056] integration/mqtt: connection error            error="network Error : remote error: tls: bad certificate"
ERRO[0059] [net]      connect got error remote error: tls: bad certificate  module=mqtt
ERRO[0059] [client]   Connecting to ssl://localhost:8883 CONNACK was not CONN_ACCEPTED, but rather Connection Error  module=mqtt
ERRO[0059] [client]   Failed to connect to a broker      module=mqtt
ERRO[0059] integration/mqtt: connection error            error="network Error : remote error: tls: bad certificate"

I will now try with the server-certificate validation.

brocaar commented 1 year ago

I created a MQTT server-certificate with valid-from in the future and am now seeing the same errors:

ERRO[0099] [client]   x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:08+01:00 is before 2022-08-16T23:55:00Z  module=mqtt                                                                                                 
WARN[0099] [client]   failed to connect to broker, trying next  module=mqtt                                                   
ERRO[0099] [client]   Failed to connect to a broker      module=mqtt                                                          
ERRO[0099] integration/mqtt: connection error            error="network Error : x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:08+01:00 is before 2022-08-16T23:55:00Z"                                                   
ERRO[0101] [client]   x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:10+01:00 is before 2022-08-16T23:55:00Z  module=mqtt                                                                                                 
WARN[0101] [client]   failed to connect to broker, trying next  module=mqtt                                                   
ERRO[0101] [client]   Failed to connect to a broker      module=mqtt                                                          
ERRO[0101] integration/mqtt: connection error            error="network Error : x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:10+01:00 is before 2022-08-16T23:55:00Z"                                                   
ERRO[0103] [client]   x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:12+01:00 is before 2022-08-16T23:55:00Z  module=mqtt                                                                                                 
WARN[0103] [client]   failed to connect to broker, trying next  module=mqtt                                                   
ERRO[0103] [client]   Failed to connect to a broker      module=mqtt                                                          
ERRO[0103] integration/mqtt: connection error            error="network Error : x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:12+01:00 is before 2022-08-16T23:55:00Z"                                                   
ERRO[0105] [client]   x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:14+01:00 is before 2022-08-16T23:55:00Z  module=mqtt                                                                                                 
WARN[0105] [client]   failed to connect to broker, trying next  module=mqtt                                                   
ERRO[0105] [client]   Failed to connect to a broker      module=mqtt                                                          
ERRO[0105] integration/mqtt: connection error            error="network Error : x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:14+01:00 is before 2022-08-16T23:55:00Z"                                                   
ERRO[0107] [client]   x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:16+01:00 is before 2022-08-16T23:55:00Z  module=mqtt                                                                                                 
WARN[0107] [client]   failed to connect to broker, trying next  module=mqtt                                                   
ERRO[0107] [client]   Failed to connect to a broker      module=mqtt                                                          
ERRO[0107] integration/mqtt: connection error            error="network Error : x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:16+01:00 is before 2022-08-16T23:55:00Z"                                                   
ERRO[0109] [client]   x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:18+01:00 is before 2022-08-16T23:55:00Z  module=mqtt                                                                                                 
WARN[0109] [client]   failed to connect to broker, trying next  module=mqtt                                                   
ERRO[0109] [client]   Failed to connect to a broker      module=mqtt                                                          
ERRO[0109] integration/mqtt: connection error            error="network Error : x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:18+01:00 is before 2022-08-16T23:55:00Z"                                                   
ERRO[0111] [client]   x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:20+01:00 is before 2022-08-16T23:55:00Z  module=mqtt                                                                                                 
WARN[0111] [client]   failed to connect to broker, trying next  module=mqtt                                                   
ERRO[0111] [client]   Failed to connect to a broker      module=mqtt                                                          
ERRO[0111] integration/mqtt: connection error            error="network Error : x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:20+01:00 is before 2022-08-16T23:55:00Z"                                                   
ERRO[0113] [client]   x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:22+01:00 is before 2022-08-16T23:55:00Z  module=mqtt                                                                                                 
WARN[0113] [client]   failed to connect to broker, trying next  module=mqtt                                                   
ERRO[0113] [client]   Failed to connect to a broker      module=mqtt                                                          
ERRO[0113] integration/mqtt: connection error            error="network Error : x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:22+01:00 is before 2022-08-16T23:55:00Z"                                                   
ERRO[0115] [client]   x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:24+01:00 is before 2022-08-16T23:55:00Z  module=mqtt                                                                                                 
WARN[0115] [client]   failed to connect to broker, trying next  module=mqtt                                                   
ERRO[0115] [client]   Failed to connect to a broker      module=mqtt                                                          
ERRO[0115] integration/mqtt: connection error            error="network Error : x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:24+01:00 is before 2022-08-16T23:55:00Z"                                                   
ERRO[0117] [client]   x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:26+01:00 is before 2022-08-16T23:55:00Z  module=mqtt
WARN[0117] [client]   failed to connect to broker, trying next  module=mqtt
ERRO[0117] [client]   Failed to connect to a broker      module=mqtt
ERRO[0117] integration/mqtt: connection error            error="network Error : x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:26+01:00 is before 2022-08-16T23:55:00Z"
ERRO[0119] [client]   x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:28+01:00 is before 2022-08-16T23:55:00Z  module=mqtt
WARN[0119] [client]   failed to connect to broker, trying next  module=mqtt
ERRO[0119] [client]   Failed to connect to a broker      module=mqtt
ERRO[0119] integration/mqtt: connection error            error="network Error : x509: certificate has expired or is not yet valid: current time 2022-08-16T12:06:28+01:00 is before 2022-08-16T23:55:00Z"

However, it keeps looping / re-connecting. I've waited for 2 minutes. Tested against the v3.13.3 version. Could you post your full configuration file?

brocaar commented 1 year ago

I have found a way to reproduce this issue. Now looking into it...

brocaar commented 1 year ago

I have been able to reproduce this using the ChirpStack Gateway OS and by setting its time a few years back history (and disabling NTP). The weird thing is that I can only reproduce this on reboot. I changed the connect method to:

func (b *Backend) connect() error {
    log.Warn("connect begin")
    b.connMux.Lock()
    defer b.connMux.Unlock()
    log.Warn("connect got mutex")

    if err := b.auth.Update(b.clientOpts); err != nil {
        return errors.Wrap(err, "integration/mqtt: update authentication error")
    }

    b.conn = paho.NewClient(b.clientOpts)
    if token := b.conn.Connect(); token.WaitTimeout(b.maxTokenWait) && token.Error() != nil {
        return token.Error()
    }

    log.Warn("connect end")

    return nil
}

And connectLoop to:

// connectLoop blocks until the client is connected
func (b *Backend) connectLoop() {
    for {
        log.Warn("Start connect")
        if err := b.connect(); err != nil {
            if b.terminateOnConnectError {
                log.Fatal(err)
            }

            log.WithError(err).Error("integration/mqtt: connection error")
            time.Sleep(time.Second * 2)

        } else {
            log.Warn("Connect ok")
            break
        }
    }
}

note the log.Warn(...) messages.

Reboot

On reboot, I see:

Dec 25 13:08:05 raspberrypi4 user.warn chirpstack-gateway-bridge[430]: time="2014-12-25T13:08:05.819251274Z" level=warning msg="Start connect"
Dec 25 13:08:05 ras pberrypi4 user.warn chirpstack-gateway-bridge[430]: time="2014-12-25T13:08:05.819473163Z" level=warning msg="connect begin"
Dec 25 13:08:05 raspberrypi4 user.warn chirpstack-gateway-bridge[430]: time="2014-12-25T13:08:05.819617052Z" level=warning msg="connect got mutex"
Dec 25 13:08:05 raspberrypi4 user.debug chirpstack-gateway-bridge[430]: time="2014-12-25T13:08:05.819794866Z" level=debug msg="[client]   Connect()" module=mqtt
Dec 25 13:08:05 raspberrypi4 user.debug chirpstack-gateway-bridge[430]: time="2014-12-25T13:08:05.819954329Z" level=debug msg="[store]    memorystore initialized" module=mqtt
Dec 25 13:08:05 raspberrypi4 user.debug chirpstack-gateway-bridge[430]: time="2014-12-25T13:08:05.82066007Z" level=debug msg="[client]   about to write new connect msg" module=mqtt
Dec 25 13:08:06 raspberrypi4 user.warn chirpstack-gateway-bridge[430]: time="2014-12-25T13:08:06.820373199Z" level=warning msg="connect end"
Dec 25 13:08:06 raspberrypi4 user.warn chirpstack-gateway-bridge[430]: time="2014-12-25T13:08:06.820726625Z" level=warning msg="Connect ok"

Restart

On restart (of the chirpstack-gateway-bridge process):

Dec 25 13:10:27 raspberrypi4 user.warn chirpstack-gateway-bridge[610]: time="2014-12-25T13:10:27.471678338Z" level=warning msg="Start connect"
Dec 25 13:10:27 raspberrypi4 user.warn chirpstack-gateway-bridge[610]: time="2014-12-25T13:10:27.472124189Z" level=warning msg="connect begin"
Dec 25 13:10:27 raspberrypi4 user.warn chirpstack-gateway-bridge[610]: time="2014-12-25T13:10:27.472261838Z" level=warning msg="connect got mutex"
Dec 25 13:10:27 raspberrypi4 user.debug chirpstack-gateway-bridge[610]: time="2014-12-25T13:10:27.472429967Z" level=debug msg="[client]   Connect()" module=mqtt
Dec 25 13:10:27 raspberrypi4 user.debug chirpstack-gateway-bridge[610]: time="2014-12-25T13:10:27.472610208Z" level=debug msg="[store]    memorystore initialized" module=mqtt
Dec 25 13:10:27 raspberrypi4 user.debug chirpstack-gateway-bridge[610]: time="2014-12-25T13:10:27.472982189Z" level=debug msg="[client]   about to write new connect msg" module=mqtt
Dec 25 13:10:27 raspberrypi4 user.err chirpstack-gateway-bridge[610]: time="2014-12-25T13:10:27.586977337Z" level=error msg="[client]   x509: certificate has expired or is not yet valid: current time 2014-12-25T13:10:27Z is before 2021-10-05T13:52:00Z" module=mqtt
Dec 25 13:10:27 raspberrypi4 user.warn chirpstack-gateway-bridge[610]: time="2014-12-25T13:10:27.587226134Z" level=warning msg="[client]   failed to connect to broker, trying next" module=mqtt
Dec 25 13:10:27 raspberrypi4 user.err chirpstack-gateway-bridge[610]: time="2014-12-25T13:10:27.587405671Z" level=error msg="[client]   Failed to connect to a broker" module=mqtt
Dec 25 13:10:27 raspberrypi4 user.debug chirpstack-gateway-bridge[610]: time="2014-12-25T13:10:27.58756293Z" level=debug msg="[store]    memorystore closed" module=mqtt
Dec 25 13:10:27 raspberrypi4 user.err chirpstack-gateway-bridge[610]: time="2014-12-25T13:10:27.587799245Z" level=error msg="integration/mqtt: connection error" error="network Error : x509: certificate has expired or is not yet valid: current time 2014-12-25T13:10:27Z is b

In summary, on reboot the connect returns no error and the client thinks it is connected (and the connectLoop terminates), but reports the error afterwards. Then after a restart of the chirpstack-gateway-bridge service, the connect does return an error and the connectLoop tries to re-connect.

Edit: I did disable the chirpstack-concentratord to rule out the possibility that this issue has to do with a possible boot order (e.g. concentratord not yet started when the GW bridge is started on boot).

brocaar commented 1 year ago

:facepalm: token.WaitTimeout returns false in case of timeout, I think I made the assumption that after the timeout token.Error() would contain a timeout error, which is not the case.