Closed hloeung closed 1 year ago
Possible to run with debug and paste the output?
Sure, planning to try reproduce it when I have time and see if it's easily fixed. It may even be to do with the introduction of an additional IRC client/proxy (for me, it's irssi client -> irssi proxy -> matterircd. For @axinojolais, it's irssi client -> bip (?!?) -> matterircd).
FWIW, I've been running matterircd reverting commit 2c3883a69606e2c0e0d9cc3d22f06fae14d5050d and so far haven't noticed out-of-order messages. Anyways, will keep digging.
I am also facing this issue. My setup is hexchat <-> ZNC <-> matterircd. I do have debug enabled...will paste a "redacted" example below of the issue
ORIGINAL / Web-client
ONE TWO THREE FOUR FIVE SIX
MATTERIRCD
[1d4] ONE [1d5] TWO [1d6] THREE [1d7] FIVE [1d8] FOUR [1d9] SIX
Debug logs:
The issues seems to be that messages FOUR and FIVE got "glued" together, it shows as "FOUR\nFIVE" in the debug logs. Then some parsing split them, but throw them in reverse order.
I hope this is useful for the issue debugging! Cheers
So I guess this is just fallout from switching to using buffered channels?
With the fix switching back to unbuffered channels, u.eventChan = make(chan *bridge.Event)
or u.eventChan = make(chan *bridge.Event, 0)
?
IIUC, in handleWsActionPost() (https://github.com/42wim/matterircd/blob/master/bridge/mattermost/mattermost.go#L813), a single message with multiple lines is split into multiple messages and put into the buffered channel (https://github.com/42wim/matterircd/pull/383). Unfortunately, per https://stackoverflow.com/a/25795236, with buffered channels, we can't guarantee order of messages (the delivery?).
Correct me if I'm wrong.
Well, we have 1 goroutine handling them and the get put in in order, so that's why I don't understand it or I must be missing something.
@guilhermepiccoli logs above show the issue, a LIFO problem :-) I'm wondering why the FIVE and FOUR got glued together though.
Well, we have 1 goroutine handling them and the get put in in order, so that's why I don't understand it or I must be missing something
Oh, just edited my previous comment as you were replying :)
I checked and we do indeed have only the 1 goroutine.
Thanks @42wim and @hloeung ! Is there anything else I can provide in order to get us to a resolution? Unfortunately my time is really full the next 2 weeks, or else I'd try to fix myself, hopefully not too complex
@hloeung are you sure you were testing with the version that has #419 in it ? I talked to @guilhermepiccoli and he was using an older version that didn't had the fix.
I also can not reproduce it with current master
Yeah, definitely using a version with https://github.com/42wim/matterircd/pull/419 applied.
I wasn't able to easily reproduce it locally either. I'll have another go at reproducing it with debug/trace enabled this weekend.
An update here, still haven't come across this yet and wasn't able to reproduce it myself.
I did stumble on https://github.com/42wim/matterircd/issues/420 with a PR to fix that though so all is not in vein.
Ok, i'm going to close this, if someone can reproduce it, feel free to reopen.
Seen it today, with the latest matterircd. Sadly, while running without debug/trace logging.
@guilhermepiccoli did you end up sanitising the logs you had?
Hey @42wim and @hloeung ! Finally I was able to sanitize the logs I have, which shows the issue with 0.24.1, trace enabled - hope this is helpful.
Also, can we reopen the issue @42wim ? Saw this issue a lot past few days, but only have the above log collection with trace enabled. Have a bunch with just the debug logs though...
Cheers!
@guilhermepiccoli and closed again.. no idea why it happens, I've now removed buffered channels for mattermost/slack. master should fix it
Thanks a bunch @42wim , gonna try now (will run the artifact generated by the test build of this commit).
I'm running with the patch above, and so far, no out-of-order messages.
Thanks !
@guilhermepiccoli and closed again.. no idea why it happens, I've now removed buffered channels for mattermost/slack. master should fix it
I'll give it a try as well. I've been experiencing a lot of out of order messages lately.
As to why it got closed (if you were actually asking about it, if not, I'm sorry for providing explanation anyway) - it's because of the commit message. If you put in it keywords like Fixes #21312312
then when it's gets included in the main branch - it will automatically close the mentioned issue. Source.
Unfortunately, I'm still seeing the issues even with @42wim 's patch =/
Yeah, I'm also experiencing out of order messages. It's like if they get grouped by mattermost app into one then they appear in wrong order in matterircd. Probably enough to reproduce is to type 2 messages in quick succession on mattermost web or maybe inside the mattermost app. If they get grouped then they appear in IRC in reversed order.
In web UI:
On IRC:
I'm using latest release build
Adding to the above - the messages sent by this user were most likely sent using desktop mattermost app.
Can the people (@hloeung, @guilhermepiccoli, @furai ) that still have the issue download this special build: https://github.com/42wim/matterircd/pull/445/checks?check_run_id=2985395608 (and select your correct artifact)
https://github.com/42wim/matterircd/suites/3155567672/artifacts/72567237 should be a direct link to the 64-bit linux one.
Now add ootrace=true
(same way as trace=true
) in your matterircd.toml (so not under the mattermost header)
You don't need to run with full --debug
, the ootrace
option will show info messages with timestamps of every (multiline) message (and content) going to the eventchan and received from the eventchan on the irc side.
Hopefully this way we can see if there's a channel out of order issue or something else, if you get hit by the issue, paste the last OOTRACE lines here.
Per IRC, I haven't come across or noticed it since the latest patch. I've pulled in the latest with OOTRACE logging anyways. Will update if I notice.
Thanks!
Okay, here it is:
INFO[2021-07-07T08:20:07Z] OOTRACE: sending msg ${MSG1} to eventChan on 1625646007267419568: 0
INFO[2021-07-07T08:20:07Z] OOTRACE: sending msg ``` to eventChan on 1625646007267530504: 0
INFO[2021-07-07T08:20:07Z] OOTRACE: sending msg ${MSG2} to eventChan on 1625646007267579861: 0
INFO[2021-07-07T08:20:07Z] OOTRACE: got msg ``` [@@mpdc9qt9zi88jjqnuartbkc8ec] on 1625646007268263014 module=matterircd
INFO[2021-07-07T08:20:07Z] OOTRACE: got msg ${MSG1} [@@mpdc9qt9zi88jjqnuartbkc8ec] on 1625646007268319657 module=matterircd
INFO[2021-07-07T08:20:07Z] OOTRACE: sending msg ``` to eventChan on 1625646007272937234: 0
INFO[2021-07-07T08:20:07Z] OOTRACE: got msg ${MSG2} [@@mpdc9qt9zi88jjqnuartbkc8ec] on 1625646007273574391 module=matterircd
INFO[2021-07-07T08:20:07Z] OOTRACE: ${USER} sent ${MSG1}
```
${MSG2}
```
INFO[2021-07-07T08:20:07Z] OOTRACE: got msg ``` [@@mpdc9qt9zi88jjqnuartbkc8ec] on 1625646007275923369 module=matterircd
Another:
INFO[2021-07-07T05:31:43Z] OOTRACE: sending msg ${MSG1} to eventChan on 1625635903100315921: 0
INFO[2021-07-07T05:31:43Z] OOTRACE: sending msg ${MSG2} to eventChan on 1625635903100436639: 0
INFO[2021-07-07T05:31:43Z] OOTRACE: sending msg ${MSG3} to eventChan on 1625635903100489927: 0
INFO[2021-07-07T05:31:43Z] OOTRACE: got msg ${MSG2} [↪@@57z8xex4m78ijfuqke6yk6ha4e] on 1625635903101329460 module=matterircd
INFO[2021-07-07T05:31:43Z] OOTRACE: got msg ${MSG1} [↪@@57z8xex4m78ijfuqke6yk6ha4e] on 1625635903101347667 module=matterircd
INFO[2021-07-07T05:31:43Z] OOTRACE: ${USER} sent ${MSG1}
${MSG2}
${MSG3}
INFO[2021-07-07T05:31:43Z] OOTRACE: got msg ${MSG3} [↪@@57z8xex4m78ijfuqke6yk6ha4e] on 1625635903107285778 module=matterircd
@hloeung does MSG1 MSG2 or MSG3 contain ``` ? or a reaction?
I have no idea btw how this is possible, msg's get set on the channel in order but get read out of order.
@hloeung and other peeps, can you try https://github.com/42wim/matterircd/pull/445/checks?check_run_id=3041968404 ? https://github.com/42wim/matterircd/suites/3211116698/artifacts/74279412 direct linux 64-bit build.
The change here is to remove the buffered channel completely although this shouldn't make any difference because it's getting set to 0 when initializing mattermost/slack but maybe I'm missing something :)
If you're seeing OOTRACE: creating non-buffered eventchan on start-up
when connected to mattermost you're running the correct version.
Yep, upgraded yesterday and am running with the latest patch since then.
[ubuntu@irc-proxy mattermost]$ ./matterircd
INFO[2021-07-12T10:16:20Z] Running version 0.24.3-dev module=matterircd
INFO[2021-07-12T10:16:20Z] WARNING: THIS IS A DEVELOPMENT VERSION. Things may break. module=matterircd
INFO[2021-07-12T10:16:20Z] Listening on 127.0.0.1:6697 module=matterircd
INFO[2021-07-12T10:17:11Z] New connection: 127.0.0.1:44932 module=matterircd
INFO[2021-07-12T10:17:11Z] OOTRACE: creating non-buffered eventchan on start-up module=matterircd
slack-go/slack2021/07/12 10:17:11 websocket_managed_conn.go:181: Connecting to RTM
...
I can see a difference in the logs already. Seems to be got
, sending
, got
, sending
, got
, sending
vs. previously it's bunch of got
s followed by bunch of sending
s.
Will keep watching.
so fixed now? I'll do a release tomorrow then.
Yeah, I think so. Not noticed any more OOO messages.
Spoke too soon.
On IRC:
|20:30 <user1> ```bash [ui1pfumdkf8nmkwd8zwc46zqch]
|20:30 <user1> mount error 110 = Connection timed out [ui1pfumdkf8nmkwd8zwc46zqch]
|20:30 <user1> msg1 [ui1pfumdkf8nmkwd8zwc46zqch]
|20:30 <user1> ``` [ui1pfumdkf8nmkwd8zwc46zqch]
|20:30 <user1> $ sudo mount -t ceph :/testdir /mnt/cephfs -o name=bootstacktest [ui1pfumdkf8nmkwd8zwc46zqch]
Matterircd OOTRACE logs:
INFO[2021-07-20T20:30:13Z] OOTRACE: sending msg msg1 to eventChan on 1626813013329295391: 0
INFO[2021-07-20T20:30:13Z] OOTRACE: sending msg ```bash to eventChan on 1626813013329423442: 0
INFO[2021-07-20T20:30:13Z] OOTRACE: sending msg $ sudo mount -t ceph :/testdir /mnt/cephfs -o name=bootstacktest to eventChan on 1626813013329487053: 0
INFO[2021-07-20T20:30:13Z] OOTRACE: sending msg mount error 110 = Connection timed out to eventChan on 1626813013329544542: 0
INFO[2021-07-20T20:30:13Z] OOTRACE: got msg ```bash [@@ui1pfumdkf8nmkwd8zwc46zqch] on 1626813013331397999 module=matterircd
INFO[2021-07-20T20:30:13Z] OOTRACE: got msg msg1 [@@ui1pfumdkf8nmkwd8zwc46zqch] on 1626813013331004892 module=matterircd
INFO[2021-07-20T20:30:13Z] OOTRACE: got msg $ sudo mount -t ceph :/testdir /mnt/cephfs -o name=bootstacktest [@@ui1pfumdkf8nmkwd8zwc46zqch] on 1626813013331507745 module=matterircd
INFO[2021-07-20T20:30:13Z] OOTRACE: got msg mount error 110 = Connection timed out [@@ui1pfumdkf8nmkwd8zwc46zqch] on 1626813013341611377 module=matterircd
INFO[2021-07-20T20:30:13Z] OOTRACE: sending msg ``` to eventChan on 1626813013345213396: 0
INFO[2021-07-20T20:30:13Z] OOTRACE: user1 sent msg1
```bash
$ sudo mount -t ceph :/testdir /mnt/cephfs -o name=bootstacktest
mount error 110 = Connection timed out
```
INFO[2021-07-20T20:30:13Z] OOTRACE: got msg ``` [@@ui1pfumdkf8nmkwd8zwc46zqch] on 1626813013348128027 module=matterircd
So wtf?!?!?
Still happening, a lot less frequent, but still seeing it. Most recent being:
INFO[2021-08-12T23:18:01Z] OOTRACE: #town-square: 3sbkx9gjypgg5d1f7frf79xfxc: sending msg 39 2 ftpuser to eventChan on 1628810281942359650: 0
INFO[2021-08-12T23:18:01Z] OOTRACE: #town-square: 3sbkx9gjypgg5d1f7frf79xfxc: sent msg 39 2 ftpuser to eventChan on 1628810281948257769: 0
INFO[2021-08-12T23:18:01Z] OOTRACE: #town-square: 3sbkx9gjypgg5d1f7frf79xfxc: sending msg 40 2 deploy to eventChan on 1628810281948595596: 0
INFO[2021-08-12T23:18:01Z] OOTRACE: #town-square: 3sbkx9gjypgg5d1f7frf79xfxc: got msg 40 2 deploy [@@3sbkx9gjypgg5d1f7frf79xfxc] on 1628810281953273716 module=matterircd
INFO[2021-08-12T23:18:01Z] OOTRACE: #town-square: 3sbkx9gjypgg5d1f7frf79xfxc: got msg 39 2 ftpuser [@@3sbkx9gjypgg5d1f7frf79xfxc] on 1628810281954677351 module=matterircd
WARN[2021-08-12T23:18:01Z] OOTRACE: #town-square: 3sbkx9gjypgg5d1f7frf79xfxc: Message out of order, got msg 39 2 ftpuser [@@3sbkx9gjypgg5d1f7frf79xfxc] but seen 40 module=matterircd
INFO[2021-08-12T23:18:01Z] OOTRACE: #town-square: 3sbkx9gjypgg5d1f7frf79xfxc: sent msg 40 2 deploy to eventChan on 1628810281955665837: 0 INFO[2021-08-12T23:18:01Z] OOTRACE: #town-square: 3sbkx9gjypgg5d1f7frf79xfxc: sending msg 41 2 chia to eventChan on 1628810281955869657: 0
INFO[2021-08-12T23:18:01Z] OOTRACE: #town-square: 3sbkx9gjypgg5d1f7frf79xfxc: sent msg 41 2 chia to eventChan on 1628810281955993984: 0 INFO[2021-08-12T23:18:01Z] OOTRACE: #town-square: 3sbkx9gjypgg5d1f7frf79xfxc: sending msg 42 2 centos to eventChan on 1628810281956149911: 0
Weird, I'm at a loss at where this would be.
Still no fix for this?
@furai well, if we don't know the root cause we can't fix it :(
So anyone still interested, please try this PR :)
Hi,
Thought it was fixed with https://github.com/42wim/matterircd/pull/419 but it looks like it's still happening (as @axinojolais mentioned a few days ago):
With that, you can see that messages are received out of order below:
Even without code blocks, for example:
Produces this: