errbotio / errbot

Errbot is a chatbot, a daemon that connects to your favorite chat service and bring your tools and some fun into the conversation.
http://errbot.io
GNU General Public License v3.0
3.12k stars 614 forks source link

added non-breaking space sanitization #1518

Closed hgnsshn closed 1 year ago

hgnsshn commented 3 years ago

I have encountered and issue bot not recognising arguments when non-breaking space (mostly via copy-paste)makes its way into the arguments. This seems to be working fine, but please adjust it accordingly if necessary, so the bot sanitizes the   character.

nzlosh commented 3 years ago

I've seen this happen on Slack, is this the case for this PR? When Slack transforms URLs into links they munge the spaces into non-break spaces. I think this is a bug with Slack.

The proposed change will be generalised across all backends. This should probably be a configurable option rather than hard coded. It's conceivable that people may want to pass in \xa0 as an command parameter and do not want it to be converted to a space by errbot.

hgnsshn commented 3 years ago

@nzlosh Yes, this has happened in slack, but I have tested it in multiple situations when copy/pasting. It did not look like slack was changing spaces into nbsp, since I when I pasted a usual space, the usual space got to them bot, and when I have specifically pasted a non-breaking space, it got to the bot as well. I am not really sure where I could add this as a configurable option, so I would appreciate your advice.

nzlosh commented 3 years ago

@a-varich Slack introduces nbsp with URL transformations. It mainly goes past unnoticed except when it starts to interfere with errbot's command parsing. To illustrate, here is the same slack command executed, once by typing the full command and once copying and pasting the previous command from the slack interface.

image

The logs reveal the following:

2021-06-16 10:26:08,488 DEBUG    slack_sdk.rtm.v2          on_message invoked: (message: {
    "type": "message",
    "user": "Uxxxxxxxx",
    "client_msg_id": "bcdf2a94-9833-44c2-89f7-fd222ae3af0d",
    "suppress_notification": false,
    "text": ".b echo <http:\/\/localhost\/> typed",
    "team": "Txxxxxxxx",
    "blocks": [{
        "type": "rich_text",
        "block_id": "Vw6on",
        "elements": [{
            "type": "rich_text_section",
            "elements": [{
                "type": "text",
                "text": ".b echo "
            }, {
                "type": "link",
                "url": "http:\/\/localhost\/"
            }, {
                "type": "text",
                "text": " typed"
            }]
        }]
    }],
    "source_team": "Txxxxxxxx",
    "user_team": "Txxxxxxxx",
    "channel": "Cxxxxxxxx",
    "event_ts": "1623831968.004200",
    "ts": "1623831968.004200"
})
2021-06-16 10:26:09,582 DEBUG    errbot.core               *** text = .b echo http://localhost/ typed

Notice the text field doesn't contain exactly what was entered by the user, the url is enclosed in < > with escaping and the elements field shows 3 distinct parts to the message. This is Slack munging user entered content. It's not a big deal because errbot uses the text field and handles the processing correctly.

What becomes a problem is when a user, copies and pastes the command from Slack and pastes it as a new command. This is where nbsp gets introduced (I'm guessing it's the UI that's messing things up). As you pointed out, Slack will pass this through as-is.

2021-06-16 10:26:25,090 DEBUG    slack_sdk.rtm.v2          on_message invoked: (message: {
    "type": "message",
    "user": "Uxxxxxxxx",
    "client_msg_id": "b875bba3-0bd4-4833-8943-1e7825c36db4",
    "suppress_notification": false,
    "text": ".b echo\u00a0<http:\/\/localhost\/>\u00a0typed copied from previously typed command",
    "team": "Txxxxxxxx",
    "blocks": [{
        "type": "rich_text",
        "block_id": "JW9Q",
        "elements": [{
            "type": "rich_text_section",
            "elements": [{
                "type": "text",
                "text": ".b echo\u00a0"
            }, {
                "type": "link",
                "url": "http:\/\/localhost\/"
            }, {
                "type": "text",
                "text": "\u00a0typed copied from previously typed command"
            }]
        }]
    }],
    "source_team": "Txxxxxxxx",
    "user_team": "Txxxxxxxx",
    "channel": "Cxxxxxxxx",
    "event_ts": "1623831984.004600",
    "ts": "1623831984.004600"
})

2021-06-16 10:26:26,295 DEBUG    errbot.core               *** text = .b echo http://localhost/ typed copied from previously typed command

Despite the log looking correct, it now contains nbsp and doesn't match de defined command.

To get an idea of how configuration is managed in errbot, take a look at https://github.com/errbotio/errbot/blob/47a667ba3c30735fc62f006c35d7da076b122cdb/errbot/bootstrap.py#L23 and pick a variable to see how it's used in the code base.

sijis commented 1 year ago

I'm closing this. I think the only way forward is to expose a config option for ARG_BOTCMD_CHARACTER_REPLACEMENTS