Trying to log into server with wrong credentials times out instead of returning an error

eclipse / paho.golang

Go libraries

Other

341 stars 93 forks source link

Trying to log into server with wrong credentials times out instead of returning an error #192

Open XANi opened 11 months ago

XANi commented 11 months ago

when trying to connect to server

    conn, err := autopaho.NewConnection(t.mqttCtx, t.mqttCfg)
    if err != nil {
        return err
    }
    connectCtx, _ := context.WithTimeout(t.mqttCtx, time.Second*30)
    if err = conn.AwaitConnection(connectCtx); err != nil {
        return fmt.Errorf("error connecting to mq: %w", err)
    }

and password is wrong (or unset, btw why it doesn't take password from url.URL automatically but need separate call?) this will time out instead of returning authorization error. I've snooped on tcpdump and server is returning code 135 not authorized to the client but client just hangs till timeout.

Version: v0.12.0

MattBrittan commented 11 months ago

This is by design; autopaho will just keep attempting to connect until you stop it (by cancelling the context or calling Disconnect). AwaitConnection is just a utility function to allow you to easily detect when/if the connection comes up. AwaitConnection was mainly added to simplify examples; I would not expect to be used all that often in production code (it's generally better to use the OnConnectionUp callback because the connection may drop/be re-established at any time).

The aim of autopaho is to establish/maintain a connection in whatever way it can, so it will retry until you stop it (if you want to manage the connection yourself then use paho). In many production environments there is little you can do when there is a connection issue (perhaps report it to some logging server) and retrying constantly is the right approach. We don't want to make assumptions about what action should be taken if there is an error (even in your case this issue could be caused by an issue on the server so retrying may eventually lead to a connection).

If you want to handle such errors then add an OnConnectError handler and take whatever action is needed to handle the issue.

btw why it doesn't take password from url.URL automatically but need separate call?

Partially because it can be confusing; for example when establishing a web-socket connection should the username/password be passed when establishing the http connection (e.g. Authorization header), in the MQTT connect packet, or both?.

XANi commented 11 months ago

This is by design; autopaho will just keep attempting to connect until you stop it (by cancelling the context or calling Disconnect). AwaitConnection is just a utility function to allow you to easily detect when/if the connection comes up.

Okay, I get it now. I was just looking for simple way to error out on first non-network-related error when the app was starting as "your credentials is wrong" is almost always non-recoverable/config error in my use case, thanks for the info.

btw why it doesn't take password from url.URL automatically but need separate call?

Partially because it can be confusing; for example when establishing a web-socket connection should the username/password be passed when establishing the http connection (e.g. Authorization header), in the MQTT connect packet, or both?.

Okay but Websockets already have option for customizing requests with WebsocketConfig and add required headers, and it's the only protocol that requires such double authentication.

It also blocks from having 2 different configs, say if someone wanted to migrate from old to new server with different creds.

MattBrittan commented 11 months ago

I'm open to PR's on this but am keen to avoid handling lots of options (don't want this to get as complicated as the V3 client).

One approach might be to pass the URL to ConnectPacketBuilder; this would enable you to set the username/password pretty simply but also allow the flexibility to do other stuff too (for instance set Authentication Method based on the URL). If doing this we may as well add ClientConfig too (making the signature closer to AttemptConnection).

XANi commented 11 months ago

Maybe I'd describe how I hit it.

I was trying to use library (v0.12.0) via examples and I haven't noticed any for using username and password, nor any docs about it.

So I looked at what config struct exported, saw no user/password whatsoever and thought "well, it wants url.URL, that parses user and password from it just fine, surely the lib just uses it? After all different URLS might need different auth". So it "worked" on my test setup (I forgot that server I used for testing had enabled guest users) and didn't on actual target server so I digged thru the source code and found out how to do it "properly".

https://github.com/eclipse/paho.golang/commit/4ee8c92e266ab2325d48147f6e6bebf5aa7a1278 fixed that discoverability issue (at least for me), I just think it's a bit ugly to have to throw

    if cfg.MQTTURL[0].User != nil && len(cfg.MQTTURL[0].User.Username()) > 0 {
        mqttTr.mqttCfg.ConnectUsername = cfg.MQTTURL[0].User.Username()
        p, _ := cfg.MQTTURL[0].User.Password()
        mqttTr.mqttCfg.ConnectPassword = []byte(p)
    }

boilerplate every time I want to just use URL alone. And that is obviously wrong for the rare case multiple URLs would require different credentials.

I guess the question is whether that kind of feature is desired.

If the assumption is "multiple server support is there just to support a cluster of identical machines", not for using it as crutch during infrastructure migration to server with different config then it isn't worth changing.

But if "you put URL of old server as first, new server as second, to allow seamless migration for clients when the old one is down" is desired use case, then the "Server" should be separate struct with URL + credentials + TLS config so there is no global server config values, just per server one.... and that makes config uglier for majority of cases where user doesn't need more than one server or they differ only in URL.

One approach might be to pass the URL to ConnectPacketBuilder; this would enable you to set the username/password pretty simply but also allow the flexibility to do other stuff too (for instance set Authentication Method based on the URL). If doing this we may as well add ClientConfig too (making the signature closer to AttemptConnection).

That sounds like good way to leave a way to do it without messing the config too much; I'll give it a go and throw a PR when I find a moment.

MattBrittan commented 11 months ago

I'm open to retrieving the username from the URL; however, as I said, this can get confusing (see this workaround in the v3 client). As such neither solution is perfect...

Given the need to introduce a bit of a hack for the username/password within the URL to work with websockets I think my preference is for the modification to ConnectPacketBuilder I suggested (however I'm open to being convinced that another way is better!).

But if "you put URL of old server as first, new server as second, to allow seamless migration for clients when the old one is down"

The issue with this is that when using QOS1+ this can result in lost messages (because the sessions between the two servers will not be in sync). I was even considering only supporting a single URL (to try and avoid users making mistakes that lead to lost messages). Another alternative might be to have a NextServer() *url.URL type callback that the library calls to get the next server (for advanced users, with appropriate warnings added etc).

hspaay commented 11 months ago

Interesting discussion, Just ran into the connection issue as well.

The aim of autopaho is to establish/maintain a connection in whatever way it can, so it will retry until you stop it (if you want to manage the connection yourself then use paho). In many production environments there is little you can do when there is a connection issue (perhaps report it to some logging server) and retrying constantly is the right approach. We don't want to make assumptions about what action should be taken if there is an error (even in your case this issue could be caused by an issue on the server so retrying may eventually lead to a connection).

If you want to handle such errors then add an OnConnectError handler and take whatever action is needed to handle the issue.

Hmm, bad authentication is not really a connection error though, as the connection was successful. As such there is no need to retry the connection as it was working in the first place. A second point is that bad authentication is not something you want to retry as this is an application level problem and can only be resolved by the application.

I love the simplicity argument but in this case every single user of paho will now have to hook into OnError and implement a workaround for a use-case that is over-simplified.

I hope this makes a case for detecting authentication errors on connect and return an error instead of retrying until the cows come home.

MattBrittan commented 11 months ago

bad authentication is not something you want to retry as this is an application level problem and can only be resolved by the application.

This is one perspective; however you are assuming that the issue is client side. A server side issue is also possible (and potentially has a much larger impact); for example:

You are adding a user to the Mosquitto passwd file and delete a line in error; this de-authenticates as significant number of clients
AWS/Azure/Hive etc has a temporary auth issue and briefly denies all incoming auth requests.

In my case this has the potential to knock hundreds of clients offline; some of these cannot be accessed remotely (e.g. behind NAT) so, without retry, would need to be physically visited to re-establish connectivity. Hence I feel that the safer approach, for a production system, is to retry (so I, for one, am not one of the "every single user of paho" that has to implement a work around :-) ).

There are also a range of other issues with returning on an auth (or any other error) including:

NewConnection is async; it may take days for a connection to come up (not uncommon with dodgy network connections) so blocking that long does not seem appropriate.
Multiple servers are supported; auto failing on one does not mean that it will fail on the second.
PublishViaQueue allows messages to be published before a connection is established (this would break if NewConnection blocked). This functionality is important to me (I want to be able to call NewConnection and start sending messages leaving everything related to connection management, message queuing etc to the library).
In a significant number of deployments the auth will be via TLS client certs, so should a failure there also abort NewConnection? (current design is that NewConnection should only return an error if there is a fatal issue with the configuration which means there is no chance a connection will ever be established; e.g. no servers specified).

Hopefully this provides an explanation of my reasoning for the design decisions; however I should also note that part of the reason I created autopaho was to meet my use-case (lots or remote units with dodgy connections and a requirement that no data is lost); so the design will be influenced by that (and also by the V3 client, paho async c client, paho python etc). Autopaho will not be suitable for all use-cases (I've tried to keep it reasonably flexible but there are limits to that).

If this is an issue that multiple users are encountering then we should definitely improve the documentation (this needs work anyway!).

XANi commented 11 months ago

I think there are 2 cases to consider there:

Auth failed immediately on first connection
Auth failed sometime during application working

From my experience the case 2 is what you described, it might be just temporary server misconfiguration but case 1. happens almost exclusively during testing&setup phase and is user error OR it is something that should be communicated to user rather than just retried in the background. So it should be pretty easy for user to configure.

I think it might be sensible to give ability to easily split off "network errors" (mostly network timeouts), from "authentication errors" (we got connected to server but it returned error).

One way would be via documented error types (currently it's pretty random, bad auth will return *autopaho.ConnackError, wrong host will return *fmt.wrapError); other way would be splitting off hook (there is already OnClientError and OnConnectError so OnServerError would be natural), but that's a bit more work for setup. Whether TLS errors should be considered connection or server errors is open question but I'd veer on side on it being ServerError.

So ability to do either

mqttCfg.OnServerError = func(err error) {
        if err.ErrorCode == paho.ReasonUnauthorized || e.ErrorCode == paho.ReasonBadMethod {
            log.Error("wrong username or password")
            authFail++
        } else {
            log.Errorf("Server refused connection: %s",err)
        }
        if authFail  > 100 { botherUser() }
}

mqttCfg.OnError = func(err error) {
    switch e := err.(type) {
        case *autopaho.ConnectionError: log.Warnf("connection error: %s",e)
        case *autopaho.ServerError: 
            if e.ErrorCode == paho.ReasonUnauthorized || e.ErrorCode == paho.ReasonBadMethod{
                log.Error("wrong username or password")
                authFail++
            } else {
                log.Errorf("Server refused connection: %s",err)
            }
            if authFail  > 100{ botherUser() }
        default: log.Warnf("error[%T]: %s",e,e)
}

I do think second one is a bit cleaner as there is only one hook for errors and "simple" app might just log it, while more complex needs can be handled via types

MattBrittan commented 11 months ago

case 1. happens almost exclusively during testing&setup phase and is user error OR it is something that should be communicated to user rather than just retried in the background

This was really the intention with OnError; at a minimum the errors should be logged so they can be dealt with, but the library should not dictate how this logging occurs (because that's application specific). The challenge here is having a solution that is workable for most users. The issue with making decisions about errors in the library is that the impact of errors is situational (e.g. a DNS lookup failure could be because the wrong URL has been specified, or because a remote solar powered device has just reached sufficient charge to startup up and the cell connection is not yet up).

I do think second one is a bit cleaner

I would vote for this option. As you say, in many cases the error callback will just be used to log errors so having a single callback simplifies users code (and with error types users can differentiate the errors should they need to).

currently it's pretty random, bad auth will return autopaho.ConnackError, wrong host will return fmt.wrapError

This kind of inconsistency tends to creep into open source software; originally everything was just a wrapped error; then a user submitted a PR to add ConnackError. I'd be happy to see a PR cleaning this up (but would ask that Unwrap() error be implemented so that code currently relying on errors.Is continues to run).

hspaay commented 11 months ago

Thank you for elaborating @MattBrittan and @XANi. The distinction between an initial connection and successive connection problems is a good one. If the intent is that OnError provides the hook to deal with an unauthorized case, then it should provide the proper information to do so. @XANi is bang on with his example. Now I understand the issue better I'll see if XANI's approach can solve it, assuming there is sufficient information in the error.

MattBrittan commented 11 months ago

assuming there is sufficient information in the error.

As per this PR you can use a type assertion to check if the error is a *ConnackError and, if it is, access the ReasonCode/test.