Open ImRodry opened 1 year ago
@marcustyphoon I still think there are probably legitimate explanations for the scenarios where this occurs, but of course I won't claim that for sure, since everyone's situation is different. Since I had a 100% reliable way to reproduce the problem, I thought it would be useful to provide my explanation and more or less challenge the reports here, since so much of the info here is anecdotal and difficult to act on from Discord's perspective. It does sound like your scenario is simple and somewhat consistent, so perhaps you could provide a minimal reproducible code example. That would probably help this gain traction in the event a maintainer checks in on this.
Just personally, whenever I've run into this it's had a client-side explanation. In any case, I'm not just looking to dismiss people's troubles. Rather, I hoped to help move this along, since it has been open for some time.
I have been seeing this issue ever since I first implemented slash commands over a year ago. I had largely dismissed it as being caused by network lag and interactions that arrive too late, but now I'm convinced there is an actual issue going on, so I decided to investigate further. Here are my two cents.
My setup is as follows:
I have a website that receives interactions via a webhook URL, hosted on a Hetzner VPS located in Ashburn, US, running nginx 1.25.4 with an upstream proxy to Node.js 22.4, which runs my own custom code.
Here is an example of the timings I observe multiple times per day:
Sample 1
interaction ID: 1260125566783193129
snowflake timestamp: 2024-07-09T06:49:07.122Z
timeline:
- ? - received by nginx (logs at the end of the request)
- 2024-07-09T06:49:07.173Z - received by node, defer response sent back to nginx
- 09/Jul/2024:06:49:07 - nginx logs request completed:
  09/Jul/2024:06:49:07 +0000 client=35.196.132.85 host=redacted path=/ request=POST / HTTP/1.1 status=200 request_length=2031 bytes_sent=183 body_bytes_sent=20 user_agent=Discord-Interactions/1.0 (+https://discord.com) upstream_status=200 request_time=0.001 upstream_response_time=0.001 upstream_connect_time=0.000 upstream_header_time=0.001
- 2024-07-09T06:49:07.173Z - command code runs
- 2024-07-09T06:49:07.729Z - command code ends and response initiates
- 2024-07-09T06:49:07.729Z - node http request created (http.request())
- 2024-07-09T06:49:07.729Z - node http stream write started (request.write())
- 2024-07-09T06:49:07.773Z - node http stream write ended (request.end())
- 2024-07-09T06:49:07.773Z - node http request emitted finish event
- 2024-07-09T06:49:10.199Z - node http emitted response event
- 2024-07-09T06:49:10.199Z - node http response emitted end event
- 2024-07-09T06:49:10.202Z - bot emitted error event status 404 "Unknown Webhook" "code: 10015"
Sample 2
interaction ID: 1260117594833293343
snowflake timestamp: 2024-07-09T06:17:26.461Z
timeline:
- ? - received by nginx (logs at the end of the request)
- 2024-07-09T06:17:26.484Z - received by node, defer response sent back to nginx
- 09/Jul/2024:06:17:26 - nginx logs request completed:
  09/Jul/2024:06:17:26 +0000 client=35.237.4.214 host=redacted path=/ request=POST / HTTP/1.1 status=200 request_length=1951 bytes_sent=183 body_bytes_sent=20 user_agent=Discord-Interactions/1.0 (+https://discord.com) upstream_status=200 request_time=0.001 upstream_response_time=0.001 upstream_connect_time=0.000 upstream_header_time=0.001
- 2024-07-09T06:17:26.484Z - command code runs
- 2024-07-09T06:17:26.485Z - command code ends and response initiates
- 2024-07-09T06:17:26.485Z - node http request created (http.request())
- 2024-07-09T06:17:26.485Z - node http stream write started (request.write())
- 2024-07-09T06:17:26.485Z - node http stream write ended (request.end())
- 2024-07-09T06:17:26.485Z - node http request emitted finish event
- 2024-07-09T06:17:29.601Z - node http emitted response event
- 2024-07-09T06:17:29.601Z - node http response emitted end event
- 2024-07-09T06:17:29.603Z - bot emitted error event status 404 "Unknown Webhook" "code: 10015"
My conclusion:
Discord is somehow not acknowledging the defer from the interaction webhook response and then delaying the follow-up request until the interaction expires. I considered the possibility that the follow-up is sent too fast, before Discord receives the response from nginx, but that doesn't seem to be the case, as the issue persists even when the command takes 500ms+ to run.
I hope this is useful in getting this resolved. I can provide more information and more tests if needed.
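For anyone who wants to check the numbers, here is a minimal sketch that recovers the snowflake timestamp from each interaction ID and measures how long after creation the last logged timestamp in each sample falls. The helper names (`snowflakeToMs`, `elapsedSinceCreation`) are made up for illustration; the only facts relied on are the snowflake layout (milliseconds since 2015-01-01 in the bits above bit 22) and the values quoted in the samples.

```javascript
// Discord epoch (2015-01-01T00:00:00.000Z) in Unix milliseconds.
const DISCORD_EPOCH_MS = 1420070400000n;

// The bits above bit 22 of a snowflake hold milliseconds since the Discord epoch.
function snowflakeToMs(id) {
  return Number((BigInt(id) >> 22n) + DISCORD_EPOCH_MS);
}

// Milliseconds between the interaction's creation and a logged ISO timestamp.
function elapsedSinceCreation(id, isoTimestamp) {
  return Date.parse(isoTimestamp) - snowflakeToMs(id);
}

// IDs and final timestamps copied from the two samples.
const samples = [
  { id: "1260125566783193129", lastLoggedAt: "2024-07-09T06:49:10.202Z" },
  { id: "1260117594833293343", lastLoggedAt: "2024-07-09T06:17:29.603Z" },
];

for (const { id, lastLoggedAt } of samples) {
  const created = new Date(snowflakeToMs(id)).toISOString();
  const elapsed = elapsedSinceCreation(id, lastLoggedAt);
  console.log(`${id}: created ${created}, last event ${elapsed} ms later`);
}
```

With the sample values this gives 3080 ms and 3142 ms, so in both cases the failure lands just past three seconds after the interaction was created, which is at least consistent with the "delayed until it expires" reading.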
(edit: formatting + typos)
Using JDA, I get this error with commands that are otherwise instant (a /ping command).
The weird part is that it's really random, but once it happens, it just won't go away. Neither the command itself nor my code seems to be the problem, since when it works, the interaction responds instantly. The device the code runs on is good, and the network is fast.
I was not able to identify a pattern, unfortunately. I hope it gets fixed soon.
edit: the commands fail instantly, without waiting 3 seconds, and when it happens, JDA doesn't get a SlashCommandInteractionEvent
edit (8th of June, 12:00): removing the bot from a guild and re-adding it seems to have fixed it for now? For the record, the interactions would fail no matter the guild, or if sent through DMs, and the problem persisted across restarts of the app
edit (10th of June, early morning): correcting what I said two days ago, once the bug is triggered, it only affects existing guilds. Adding the bot to a new guild makes commands work, but only in that guild and in DMs with its members
update:
After more experimentation, it turned out to be a client-side issue. For a reason I don't understand, reloading Discord fixed it. Since this error seems to be generic, this won't be useful for everyone, but it could help people who get this error in the same way I did
edit (11th of August): some users reported the exact same issue to me. The issue does not seem to appear on mobile, only on desktop
observation: you can solve the issue by deferring the response, but this results in an unnecessary request (note: both the defer and the response happen in under a second; this is a workaround and not the intended use of defers)
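As a rough illustration of that workaround, here is a sketch assuming a discord.js-style interaction object with `deferReply()` and `editReply()` methods. The `deferThenReply` helper and the stub interaction are hypothetical and exist only so the snippet runs without a bot; the original comment does not say which library was used.

```javascript
// Acknowledge first, then deliver the real content by editing the deferred reply.
// This costs one extra request compared with replying directly.
async function deferThenReply(interaction, content) {
  await interaction.deferReply();            // initial acknowledgement
  return interaction.editReply({ content }); // the actual response, sent as an edit
}

// Stand-in interaction for local testing; a real one would come from the library.
function makeStubInteraction() {
  const calls = [];
  return {
    calls,
    async deferReply() {
      calls.push("deferReply");
    },
    async editReply(payload) {
      calls.push(`editReply:${payload.content}`);
      return payload;
    },
  };
}

const stub = makeStubInteraction();
deferThenReply(stub, "Pong!").then(() => console.log(stub.calls));
```

The stub only records call order, so it shows the shape of the workaround and nothing about whether it avoids the error against the real API.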
I've been experiencing this issue exclusively with commands that were already deferred.
update:
I was able to greatly reduce the number of errors by not responding to the webhook itself.
My setup now is as follows:
This solution is only applicable when receiving interactions via webhook, but it seems to work well for now.
Description
I've seen this issue reported by many people, but so far no one has been able to gather enough information to reliably explain what's going on. An example can be seen at https://github.com/discordjs/discord.js/issues/7005

In summary, every now and then, seemingly at random, a bot's reply to an interaction fails with an Unknown Interaction error when, in reality, the reply succeeded and was shown to the user (by reply I mean a regular reply, deferred reply or update). I know this because I've been investigating this issue on a bot I manage for around a week now, and I asked some users who were impacted by it.

In the following screenshots I'm logging the time it took for me to reply, by subtracting the interaction's created_timestamp from the current timestamp, and then the time it took for the bot to receive the error, by subtracting the timestamp taken just before the request was submitted from the timestamp at which the error was received. You can see that the reply is sent pretty fast and in time for Discord to accept it; however, the error comes 5 seconds later, indicating some sort of issue on Discord's end. Of course I could be faking those numbers, but it would make no sense for me to do that, so I'm going to have to ask you to trust them.

I later asked the user impacted by this issue what the bot responded with, and they showed that the reply was indeed deferred, which means that the error was a false positive and everything worked fine on our end.
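The two measurements described above could be collected with something like the following sketch. It is not the code that produced the screenshots; `timedReply` is a hypothetical helper, and it assumes the interaction exposes its creation time in milliseconds as `createdTimestamp` (the discord.js property name) and that `send` is whatever function performs the reply.

```javascript
// Wraps a reply call and records:
//  - replyDelayMs: time from the interaction's creation until the reply is submitted
//  - requestMs:    time from submitting the reply until it resolves or throws
async function timedReply(interaction, send, now = Date.now) {
  const submittedAt = now();
  const replyDelayMs = submittedAt - interaction.createdTimestamp;
  try {
    await send();
    return { ok: true, replyDelayMs, requestMs: now() - submittedAt };
  } catch (error) {
    // The error arrived this long after the request was submitted.
    return { ok: false, replyDelayMs, requestMs: now() - submittedAt, error };
  }
}

// Example with a fake clock and a reply that fails, standing in for the real API call.
const ticks = [1200, 6200];
const fakeNow = () => ticks.shift();
const fakeInteraction = { createdTimestamp: 1000 };

timedReply(fakeInteraction, async () => {
  throw new Error("Unknown interaction");
}, fakeNow).then((result) => {
  console.log(result.replyDelayMs, result.requestMs, result.ok);
});
```

Passing the clock in as a parameter keeps the helper testable; in a real bot the default `Date.now` would be used and `send` would be something like `() => interaction.deferReply()`.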
Steps to Reproduce
There are no steps to consistently reproduce this issue, as it only happens randomly. What I can tell is that the error occurs when the API takes too long to send its response back, even though it has actually acknowledged and processed the reply.
Expected Behavior
The reply is sent correctly (this already happens) and a success response is returned.
Current Behavior
The reply is sent correctly, but an "Unknown Interaction" error is thrown.
Screenshots/Videos
Can only attach what I've shown above already (Bot is thinking but in Portuguese)
Client and System Information
discord.js v14.6.0 on Node v18.11.0 running on Debian 11 (bullseye)