discord / discord-api-docs

Official Discord API Documentation
https://discord.com/developers/docs/intro

Random misleading Unknown Interaction errors #5558

Open ImRodry opened 1 year ago

ImRodry commented 1 year ago

Description

I've seen this issue reported by many people, but so far no one has been able to gather enough information to reliably explain what's going on. An example can be seen at https://github.com/discordjs/discord.js/issues/7005.

In summary, every now and then, seemingly at random, a bot's reply to an interaction fails with an Unknown Interaction error when, in reality, the reply succeeded and was shown to the user (by reply I mean a regular reply, a deferred reply or an update). I know this because I've been investigating this issue on a bot I manage for around a week now, and I asked some of the users who were impacted by it.

In the following screenshots I'm logging the time it took me to reply, by subtracting the interaction's created_timestamp from the current timestamp, and then the time it took for the bot to receive the error, by subtracting the timestamp taken just before the request was submitted from the timestamp at which the error was received. You can see that the reply is sent pretty fast and in time for Discord to accept it; however, the error comes 5 seconds later, indicating some sort of issue on Discord's end.

image

Of course I could be faking those numbers, but it would make no sense for me to do that, so I'm going to have to ask you to trust them. I later asked the user impacted by this issue what the bot had responded with, and they showed that the reply was indeed deferred, which means that the error was a false positive and everything worked fine on our end.
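The two measurements described above can be sketched in a library-independent way. This is only an illustration: `timed_reply`, `send` and `now_ms` are hypothetical names, and `send` stands in for whatever call the library uses to reply or defer.

```python
import time
from typing import Callable, Optional, Tuple


def timed_reply(
    created_ms: int,
    send: Callable[[], None],
    now_ms: Callable[[], float] = lambda: time.time() * 1000,
) -> Tuple[float, float, Optional[Exception]]:
    """Return (interaction age when sending, request duration, error or None).

    created_ms is the interaction's creation time in Unix milliseconds.
    send is a hypothetical zero-argument callable that performs the reply.
    """
    before = now_ms()
    error: Optional[Exception] = None
    try:
        send()
    except Exception as exc:  # keep the error so it can be logged with timings
        error = exc
    after = now_ms()
    return before - created_ms, after - before, error
```

A small age combined with a request duration of several seconds and an error is the pattern reported in this issue.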

Steps to Reproduce

There are no steps to consistently reproduce this issue as it only happens randomly. What I can tell is that the error comes when the API takes too long to send the response back but actually acknowledges and processes it.

Expected Behavior

The reply is sent correctly (happening) and a success message is returned

Current Behavior

The reply is sent correctly but an "Unknown Interaction" error is thrown

Screenshots/Videos

Can only attach what I've shown above already:

image

image ("Bot is thinking", but in Portuguese)

Client and System Information

discord.js v14.6.0 on Node v18.11.0 running on Debian 11 (bullseye)

DV8FromTheWorld commented 1 year ago

Can you provide a code snippet showing how these logs were generated? I'm curious as to whether it is all coming from one request or possibly retries. There are a number of interacting systems here, so additional information to help debug the issue would be beneficial.

ImRodry commented 1 year ago

afaik discord.js only retries requests when it gets a 429 response, which I assume is not the case here on freshly created interactions, so to my knowledge there are no retries being done here.

image

This is the first line that gets executed in the entire event that isn't an if statement, and nothing above it interacts with the API. Hope this helps

ImRodry commented 1 year ago

Keep in mind this issue can happen with other kinds of interaction replies, not only deferred messages (I tested with showing a modal). I only showed that snippet because it's the most basic one that should never generate that error

ImRodry commented 1 year ago

Hey @DV8FromTheWorld do you have any updates on this?

ooliver1 commented 1 year ago

I have been getting this too, we check if it has been 3 seconds and it definitely has not at the time of request, but sometimes get this response. nextcord@2.2.0

kenyonbowers commented 1 year ago

Yeah, I have been getting this error as well. And I haven't changed my code since updating Discord.js to v14.6.0.

DV8FromTheWorld commented 1 year ago

I have not looked deeper into this issue at this time. This is the first time I've heard of this issue. Before assuming it is a problem with Discord I would likely investigate the underlying library implementation.

For debugging purposes: Is there a way in your library (or tech stack) to track outbound network traffic? If there is, it would be useful for indicating whether the library is re-attempting a network call or whether the initial network call is actually taking 5 seconds. From your code snippet that isn't possible to determine.

ImRodry commented 1 year ago

I’m not sure if there is but I can dig into the source code and add that myself. I do, however, doubt that is the case, as we’ve seen @ooliver1 say they are experiencing the same behavior and they’re using a python library, which is completely different from the one I’m using

ooliver1 commented 1 year ago

It's pretty hard to reproduce confidently, since it's been random a lot of the time

kenyonbowers commented 1 year ago

I have found that if I leave the bot running for multiple hours after starting it, the error stops appearing. It comes back when I turn the bot off and start it again, until it has been left running for a while once more.

ImRodry commented 1 year ago

@DV8FromTheWorld I believe there is not much more debugging I can do here. Due to this issue happening at a random chance and requiring a high volume of interactions it would be impossible to gather enough data to be able to tell exactly why it's happening. All I can tell is that, on discord.js, after calling deferReply() the request is sent to this method which I am not familiar with and I would probably need to spend a lot of time figuring out all the quirks with this class and the whole package itself. I would, however, like to emphasize that I've seen people face this issue long before discord.js had this rest package, and also other people on other languages and libraries claim to be facing the same so could you look into this? If needed I can start gathering timestamps of when this issue happens and send them to you if that helps, I just can't log anything from the internal parts of the library unfortunately

SuperSajuuk commented 1 year ago

If it helps, I also get totally random, out of nowhere, “unknown interaction” errors in my bot logs [I run a bot using the discord.py library, so totally unrelated to the OP, who uses discord.js] when sending a response to an interaction. In my case, it's just an immediate ephemeral response message [e.g. interaction.response.send_message(mymsghere, ephemeral=True) ], rather than a deferred response followed by a use of the followup webhook.

I’ve never bothered trying to work out why it happens, since the error traceback shows it's more likely to be a Discord issue than anything to do with anyone's library implementation [unless every single lib dev has implemented interactions wrongly for 2 years lol]. Also, it’ll happen once, then never again for several days, usually when I’m sleeping [i.e. overnight], so it's hardly something I can spend time debugging, since there’s no chance I’ll be able to find out why it's happening.

muhitrhn commented 1 year ago

I'm facing some issues with showing a Modal in my bot. The same code works 99% of the time, but in some cases the interaction returns Unknown Interaction when trying to show the Modal. When I replace the Modal with a Reply to the interaction, it works every time. But as soon as I revert back to using the Modal it starts failing again. This happens in certain button interactions set through certain slash command data. The issue persists even if I repost that post. But if I try posting it again with same data, the error persists.

yash1441 commented 1 year ago

Have been getting random unknown interaction errors as well on deferReply() and showModal() rarely. Decided to check how much time each reply is taking (even though everything is deferred) using console.time() and console.timeEnd() and surprisingly that one day no errors occurred.

yonilerner commented 1 year ago

Would someone with this issue be willing to provide a complete, runnable code sample that reproduces this issue? It's very difficult to figure out if this is even a bug or not

ImRodry commented 1 year ago

@yonilerner like I've said above, simply set up an event listener that all it does is either reply or defer the interaction it receives. Let that sit for a couple hours with a good amount of interactions coming through and you should see the error. There's no reproducible code sample because it really is random

Conklins commented 1 year ago

event listener that all it does is either reply or defer the interaction it receives. Let that sit for a couple hours with a good amount of interactions coming through and you should see the error. There's no reproducible

DV8FromTheWorld commented 1 year ago

The problem here is that there isn't enough information here to actually debug anything. I recognize that people are occasionally receiving "Unknown Interaction", but that usually indicates a problem with the developer's code.

Personally, I would try capturing a variety of information:

  1) Capture network logs.
     i) Ensure there are no retries.
     ii) Ensure the network request is actually being sent to Discord, as opposed to being queued for # seconds due to some ratelimiting and thus exceeding the time limit.
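One way to collect that kind of timeline is to record every outbound attempt per interaction, so that retries and queueing become visible afterwards. The sketch below is a minimal, library-independent illustration: `Attempt` and `RequestTrace` are hypothetical names, and where the three timestamps come from depends on what hooks the HTTP client in use exposes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Attempt:
    queued_at: float    # caller asked the library to send the response
    sent_at: float      # request actually left for the network
    finished_at: float  # response or error came back
    status: int         # HTTP status code of the response


@dataclass
class RequestTrace:
    """Collects all attempts made for each interaction ID (hypothetical helper)."""

    attempts: Dict[str, List[Attempt]] = field(default_factory=dict)

    def record(self, interaction_id: str, attempt: Attempt) -> None:
        self.attempts.setdefault(interaction_id, []).append(attempt)

    def retries(self, interaction_id: str) -> int:
        # Every attempt after the first one counts as a retry.
        return max(len(self.attempts.get(interaction_id, [])) - 1, 0)

    def max_queue_delay(self, interaction_id: str) -> float:
        # Longest time a request waited locally before being sent.
        recorded = self.attempts.get(interaction_id, [])
        return max((a.sent_at - a.queued_at for a in recorded), default=0.0)

    def max_round_trip(self, interaction_id: str) -> float:
        # Longest time between sending a request and getting its response.
        recorded = self.attempts.get(interaction_id, [])
        return max((a.finished_at - a.sent_at for a in recorded), default=0.0)
```

Zero retries, a negligible queue delay and a round trip of several seconds would point away from the library; a large queue delay or several attempts would point towards it.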

Unfortunately, until we have better concrete information with a timeline of events in a failed interaction request there isn't a ton we can do here.

ImRodry commented 1 year ago

Alright thank you, I will try to get that information for you. Unfortunately it might not be very easy since my bot is using a package and it's hard to get that info from the package itself on prod, but I'll look into it

ckohen commented 1 year ago

For what it's worth, with the increasing number of times we've seen this, I decided to finally look into it a bit. In djs there shouldn't be anything getting in the way of the request firing, but I am implementing a separate request handler to handle specifically interaction callbacks. While in theory this won't change the external facing behavior of the request, it at least should streamline the process and make it a little easier to debug.

devsnek commented 1 year ago

Been a few months here so I'm assuming the behavior isn't being seen anymore.

ImRodry commented 1 year ago

Been a few months here so I'm assuming the behavior isn't being seen anymore.

oh no it definitely is, every single day, multiple times, I just don't have the time nor patience to debug things to the level you guys asked for

muhitrhn commented 1 year ago

Same here, has become a part of my life now.

SuperSajuuk commented 1 year ago

Been a few months here so I'm assuming the behavior isn't being seen anymore.

Still happens, even had it earlier today lol. It's just something I'm used to seeing at random now, and haven't really bothered to care about since there's no immediate impact to my bot. That being said, the source of what causes this problem needs to be resolved so people aren't confused by random misleading errors.

LunaUrsa commented 1 year ago

I'm excited to find this issue! This has been very annoying the past few weeks. Some of my findings:

The architecture of my bot commands:

Example 1, this works fine:

2023-03-14 15:55:25.697 [INFO] [interactionCreate] interactionCreate event started at 1678827325696
2023-03-14 15:55:25.698 [INFO] [interactionCreate] Decided to run slash command in 1ms
2023-03-14 15:55:25.699 [INFO] [commandRun] commandRun started at 1678827325698
2023-03-14 15:55:25.700 [INFO] [commandRun] Executed the command in 1ms
2023-03-14 15:55:25.700 [INFO] [d.botstats] Command started at 1678827325700
2023-03-14 15:55:25.702 [INFO] [d.botstats] Attempting to defer reply...
2023-03-14 15:55:25.918 [INFO] [d.botstats] Reply deferred in 218ms

Example 2, happened right after example 1. Note how it takes < 10 milliseconds to get to the point where it tries to defer the reply, and then fails:

2023-03-14 15:55:26.979 [INFO] [interactionCreate] interactionCreate event started at 1678827326973
2023-03-14 15:55:26.980 [INFO] [interactionCreate] Decided to run slash command in 6ms
2023-03-14 15:55:26.981 [INFO] [commandRun] commandRun started at 1678827326980
2023-03-14 15:55:26.981 [INFO] [commandRun] Executed the command in 1ms
2023-03-14 15:55:26.982 [INFO] [d.botstats] Command started at 1678827326981
2023-03-14 15:55:26.984 [INFO] [d.botstats] Attempting to defer reply...
2023-03-14 15:55:31.401 [INFO] [commandRun] ERROR: DiscordAPIError[10062]: Unknown interaction
    at SequentialHandler.runRequest (/usr/src/app/node_modules/@discordjs/rest/src/lib/handlers/SequentialHandler.ts:498:11)
    at runMicrotasks ()
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async SequentialHandler.queueRequest (/usr/src/app/node_modules/@discordjs/rest/src/lib/handlers/SequentialHandler.ts:198:11)
    at async REST.request (/usr/src/app/node_modules/@discordjs/rest/src/lib/REST.ts:343:20)
    at async ChatInputCommandInteraction.deferReply (/usr/src/app/node_modules/discord.js/src/structures/interfaces/InteractionResponses.js:69:5)
    at async Object.execute (/usr/src/app/src/discord/commands/guild/d.botstats.ts:26:5)
    at async commandRun (/usr/src/app/src/discord/utils/commandRun.ts:37:5)
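For reference, the gap between "Attempting to defer reply..." and the next log line can be computed directly from the timestamps in the two examples. The helper below is a small sketch; `gap_ms` is a hypothetical name and the format string is inferred from the log lines above.

```python
from datetime import datetime, timedelta

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # format used by the log lines above


def gap_ms(start: str, end: str) -> int:
    """Whole milliseconds elapsed between two log timestamps."""
    delta = datetime.strptime(end, LOG_TIME_FORMAT) - datetime.strptime(start, LOG_TIME_FORMAT)
    return delta // timedelta(milliseconds=1)


# Example 1: defer attempted, then "Reply deferred"
print(gap_ms("2023-03-14 15:55:25.702", "2023-03-14 15:55:25.918"))  # 216
# Example 2: defer attempted, then the Unknown interaction error
print(gap_ms("2023-03-14 15:55:26.984", "2023-03-14 15:55:31.401"))  # 4417
```

In the failing case the request itself accounts for about 4.4 seconds, while everything before it took under 10 milliseconds.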

yonilerner commented 1 year ago

I would try upgrading discord.js, there may be some bugfixes in newer versions that resolve this issue

LunaUrsa commented 1 year ago

Thanks for the suggestion!

Discord.js 14.8, the latest version, was released on Sunday, two days ago. I hoped it would help, so I upgraded quickly, but it still happens a few dozen times daily. To clarify: when 14.8 was released I updated all my packages and this did not resolve the problem.

I have considered moving down to 13.14 but have yet to do that as it would be a lot more work, and I've not heard any guarantee that version doesn't have this issue. =/

I would move down to 13.14 if it were a sure shot because this error is highly annoying to users. Mod commands sometimes don't work on the first try, so it makes the entire bot seem unstable.

ckohen commented 1 year ago

As others have mentioned, this issue is happening across multiple libraries. Both d.js 14.8 and 13.14 should handle this exactly the same. The only notable way to stall an interaction callback at the moment is to have hit the global ratelimit (which is technically an implementation issue that never got updated), and even if you did, that would clear in no more than 1 second. cc @yonilerner

For @LunaUrsa, there were no fixes made relating to this issue in 14.8, though it would've been ideal to land that PR I mentioned earlier for it. We ended up getting really conflicting responses from devs on how "ratelimiting" works on the /callback endpoint so it stalled the PR for a while. At this point I think we are finally ready to move forward with it, so it should land in the next release, but unless you are hitting the global ratelimit it shouldn't actually affect you.

LordOfPolls commented 1 year ago

In the interest of +1'ing this issue to highlight it is most definitely not library specific

I have encountered this in D.py, NAFF and interactions.py (rewrite and non). This is most assuredly an issue on discords side.

While I appreciate that it is an absolute nightmare to debug due to the infrequency and randomness of the error, it really shouldn't be brushed away as a library or network issue on the bot developers side.

The only reason this has little outcry is because it's infrequent, and our users just retry the command after it "fails", but obviously that's terrible ux

davfsa commented 1 year ago

My two cents, which I have tried to provide through other means with no luck: the underlying issue might be (educated guess) Discord taking too long to process the interaction at times. I'm unsure what that might be due to, as I can't debug any further on that side. I say this because of logs I have from people using our library, like the following one (note that these logs are from an old version of the lib; I haven't contacted the person for new ones, but I have been told it keeps happening, rarely, but happening):

T 2023-01-13 01:52:39,600 hikari.gateway.2: dispatching INTERACTION_CREATE with seq 16296
T 2023-01-13 01:52:39,999 hikari.rest: f640d5af92e411ed85428e896c5c2a03 POST https://discord.com/api/v10/interactions/1063274348967886899/aW50ZXJhY3Rpb246MTA2MzI3NDM0ODk2Nzg4Njg5OTpZNXZtVHgwY25NYTI2bzF2VzlFcm9VbGhHYUF5Z1MxT2xYOGR1Y0MzRGx6WW85clNuSmp1Um1kYU01SlBWbHpWMnFIaVB1WG56bmtSbTFBNjY4VEs5TlpPTVV0cVk5ZTVkbzI4TmhYR0VaMkxkNW1nT2M3ZlFiWjBYdnZucjlOVA/callback
    User-Agent: DiscordBot (https://github.com/hikari-py/hikari, 2.0.0.dev115) Nekokatt AIOHTTP/3.8.1 CPython/3.10.9 Linux 64bit
 
    {'type': <ResponseType.MESSAGE_CREATE: 4>, 'data': {'embeds': [{'title': 'Stopwatch Started!', 'color': 11814356, 'footer': {'text': 'Note: stopwatch will stop after 1 day.'}}], 'allowed_mentions': {'parse': []}}}
T 2023-01-13 01:52:43,719 hikari.rest: f640d5af92e411ed85428e896c5c2a03 404 Not Found in 3719.914702931419ms
    Date: Fri, 13 Jan 2023 01:52:43 GMT
    Content-Type: application/json
    Transfer-Encoding: chunked
    Connection: keep-alive
    strict-transport-security: max-age=31536000; includeSubDomains; preload
    Via: 1.1 google
    Alt-Svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400
    CF-Cache-Status: DYNAMIC
    Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=0DHran7ZDY%2Fmf%2F1yZ6lS6PRfwaBRa6jcF61CffN1Re7AS91smDBuc1b12qftsF5I691eJ91iABum2CdkepDgU00BAmjPiD8DJJt57yxDtasX3tEsfVzspF6KHGJV"}],"group":"cf-nel","max_age":604800}
    NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
    X-Content-Type-Options: nosniff
    Set-Cookie: __cfruid=e11b5bb3826dc2e0d77c6aff187aec94fad6c899-1673574763; path=/; domain=.discord.com; HttpOnly; Secure; SameSite=None
    Server: cloudflare
    CF-RAY: 788a7e8009bb0e44-AMS
    Content-Encoding: gzip
 
    {"message": "Unknown interaction", "code": 10062}

Important things to note about the logs:

  1. The "dispatching" log is right after we receive the interaction (can also be checked by the time in the interaction ID: 1063274348967886899 => 2023-01-13, 01:52:39)
  2. The 3719.914702931419ms response time is round-trip time. This includes from making the request (after evaluation of bucket ratelimits, which are skipped for interactions anyways, so a NOOP) to receiving the response. The code can be found here
  3. A CF-RAY is provided in the response headers that could allow for further debugging, but these logs are months old and the info might not be stored anymore. I could try to ask for newer logs if it is deemed necessary.
  4. This might also be due to random network delay, but I can't tell for sure unless the CF-RAY is looked at, as Cloudflare should have all that info available. The average response time for this bot before and after these logs is around 500-700ms
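The check mentioned in point 1 (reading the creation time out of the interaction ID) can be sketched as follows. It relies on the documented snowflake layout, where the bits above the lowest 22 hold milliseconds since the Discord epoch (2015-01-01T00:00:00Z); the function names are hypothetical.

```python
from datetime import datetime, timezone

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00Z in Unix milliseconds


def snowflake_created_ms(snowflake: int) -> int:
    """Unix timestamp in milliseconds encoded in a Discord snowflake."""
    return (snowflake >> 22) + DISCORD_EPOCH_MS


def snowflake_created_at(snowflake: int) -> datetime:
    """Creation time of a snowflake as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(snowflake_created_ms(snowflake) / 1000, tz=timezone.utc)


interaction_id = 1063274348967886899  # ID from the request URL in the log above
print(snowflake_created_at(interaction_id).isoformat())
```

For the ID in the log this gives 2023-01-13 01:52:39.476 UTC. Assuming the logger's clock is in UTC and reasonably in sync, the request was logged about 0.5 seconds after the interaction was created and the 404 arrived about 4.2 seconds after it.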

LunaUrsa commented 1 year ago

Hey @devsnek could you please re-open this issue now that there's some activity and logs? I think this should be looked into, or at least kept open so others can find the issue

ImRodry commented 1 year ago

That definitely did not fix it, I’m still experiencing issues with this after updating

lumap commented 1 year ago

still getting those to this day, notably from people with a bad internet connection

davfsa commented 1 year ago

still getting those to this day, notably from people with a bad internet connection

A bad connection doesn't seem to be the cause in every case (please refer to my previous comment).

muhitrhn commented 1 year ago

This is happening more often now. Around 20+ times per day. My bot is in 41 Servers with a total of 127k Users. Mainly happens on showModal for me and sometimes on ApplicationCommand. In the case of showModal, even though the error is thrown, the modal still gets sent to the end user. But in the case of ApplicationCommand the whole response just straight up errors.

JustRoxy commented 1 year ago

It's happening not only in discord.js, but in discord.net as well. The problem is inconsistently reproducible and it feels like Discord's servers are just throttling some defer requests and discarding them later with an interaction timeout.

MockirY commented 1 year ago

This is happening more often now. Around 20+ times per day. My bot is in 41 servers with a total of 127k users. For me it mainly happens on showModal and sometimes on ApplicationCommand. In the case of showModal, the modal is still sent to the end user even though the error is thrown. In the case of ApplicationCommand, however, the whole response simply errors out.

Yeah, I tried to fix it by changing my hosting service, but it made no difference...

HiroNxw commented 11 months ago

Is it fixed? I'm getting this error every time I click a button while I get a reply.

ozgur3512 commented 11 months ago

This just happens randomly and lasts 15-20 minutes then gone

RecycleFix commented 10 months ago

I had a testbot using disnake with around 10 test users - and had been running for a couple of months. Ran into this issue a week ago and still happened today. I then created a new bot, invited it to a guild/server, grabbed the new ID + token and placed into the code I had issues with. This worked without any issues. I'm using Buttons, Select Menu and Modals - and the error occurred when using the buttons.

AlecM33 commented 6 months ago

I can reliably reproduce this error for my bot. As another concluded in this thread, fundamentally it's not an issue with any specific library. Discord's API is giving you an HTTP 404 because it can't find the interaction for what is very likely a legitimate reason. In my case it comes down to Node.js processing capabilities and the interaction between two of my bot's commands, one deferred and one not.

I have a CPU-intensive command that constructs and attaches an image. At the high end this command can take 5 or 6 seconds. I defer this command, but due to suboptimal programming on my part the processing still blocks the event loop for the duration. If I send another one of my bot's commands (a performant one that is not deferred), Discord's API immediately starts the 3-second or so timer for my bot's reply to that one. If the event loop is blocked past that 3-second window, Discord cancels the interaction and thus it no longer exists. Some time later the event loop is unblocked and my bot attempts to reply to the non-existent interaction. HTTP 404 "unknown interaction".
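The effect described here is not specific to Node.js; any single-threaded event loop behaves the same way. The sketch below reproduces it with Python's asyncio under scaled-down assumptions: a 0.3-second blocking sleep stands in for the image rendering, and all function names are hypothetical. It measures how long a fast handler waits before it even starts, with and without moving the blocking work off the event loop.

```python
import asyncio
import time

BLOCK_SECONDS = 0.3  # short stand-in for the 5-6 s image rendering


def render_image() -> None:
    """Synchronous work, simulated here with a blocking sleep."""
    time.sleep(BLOCK_SECONDS)


async def slow_command(offload: bool) -> None:
    if offload:
        # Run the blocking work in a worker thread; the event loop stays free.
        await asyncio.to_thread(render_image)
    else:
        # Runs on the event loop thread: nothing else is handled meanwhile.
        render_image()


async def fast_command() -> float:
    """Return the moment this handler actually started running."""
    return time.monotonic()


async def fast_command_delay(offload: bool) -> float:
    """Seconds between the fast command arriving and its handler starting."""
    arrived_at = time.monotonic()
    slow = asyncio.create_task(slow_command(offload))
    fast = asyncio.create_task(fast_command())
    started_at = await fast
    await slow
    return started_at - arrived_at


if __name__ == "__main__":
    print(f"blocking:  {asyncio.run(fast_command_delay(False)):.3f}s")
    print(f"offloaded: {asyncio.run(fast_command_delay(True)):.3f}s")
```

With the blocking variant the fast handler starts only after the slow one finishes, which with a 5 to 6 second job would already be past the 3-second window. Note that a thread only helps here because the simulated work is a sleep; genuinely CPU-bound Python code would more likely need a separate process.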

Below is an example. I actually sent the RANDOM command a little after 16:20:18 GMT, but the code in my interaction handler didn't start executing until 16:20:22 GMT.

image

Personally I think for this issue to remain open, others need to provide more concrete data on why HTTP 404 is not warranted

Olzie-12 commented 5 months ago

I'm getting the same with JDA as well... What's weird is that the error is being thrown in my console, but the message did actually reach Discord in time.

maxibue commented 4 months ago

Still getting the same error in d.js-14.14.1.

marcustyphoon commented 4 months ago

I see this multiple times a week on a deferred request using https://github.com/Snazzah/slash-create (yet another completely different library), on a bot that's in one server and gets ~5 requests/day. Mine's hosted on Cloudflare Workers; is it possible that some global rate limit is getting hit on a hosting provider basis? (I don't know how I would investigate that.)

milenakos commented 4 months ago

Happens to me regularly in both discord.py and nextcord. I have 70ms latency and usually respond to commands well within a second, so I think it's an API issue.

dev-737 commented 4 months ago

This happens to me at random times (I use discord.js) when I try to defer replies, before which there is no other time consuming processing going on. I do have ~300ms of latency so I'm not sure if it's a spike causing it, but it happens far too often in a week.

real2two commented 4 months ago

+1 I sometimes get random Unknown Interaction errors using discord.js, and with discordeno as well. I've gotten to the point where I use HTTP interactions as a workaround.

SemiMute commented 4 months ago

Getting this issue on latest versions of JDA and Discord.JS, seems like something wrong with discord itself not any single library.

marcustyphoon commented 3 months ago

I modified my code to repeatedly resend the failed deferred request with a 1000ms delay, and I just had 4 404 Unknown Webhook failures logged followed by a success (with no other activity in that time period, nor any other activity that day).

So at least in my case, the webhook and interaction are definitely (eventually) valid; it's not a matter of a URL being incorrect or a timeout (or a reasonable minimum time between requests being required to prevent a serverside race condition, which clients could plausibly implement; 4 seconds is way too long for that). That, to me, makes it sound very much like it is in fact a Discord-side problem (re: AlecM33's comment).
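The resend experiment described above can be approximated with a small retry wrapper. This is a sketch only: `NotFoundError` is a hypothetical stand-in for whatever exception a given library raises on the 404, and `send` is a hypothetical callable that performs the request.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NotFoundError(Exception):
    """Hypothetical stand-in for a library's HTTP 404 error."""


def retry_on_not_found(
    send: Callable[[], T],
    attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send() until it succeeds, waiting delay_s after each 404."""
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except NotFoundError:
            if attempt == attempts:
                raise  # give up and surface the last error
            sleep(delay_s)
    raise ValueError("attempts must be at least 1")
```

Other comments in this thread report that the first request sometimes succeeds even though an error is returned, so blindly resending an initial response may not be safe in every situation; this is better suited to diagnosis than to production use.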

SomeBoringNerd commented 3 months ago

Using JDA, with commands that are otherwise instants (a /ping command), I get this error.

The weird part is that it's really random, but once it happens, it just won't go away. Neither the command itself nor my code seems to be the problem, since when it works, the interaction responds instantly. The device on which the code runs is good, and the network is fast.

I was not able to identify a pattern, unfortunately. Hope it gets fixed soon

edit : the commands fail instantly, it does not wait 3 seconds, and when it happens, JDA doesn't get a SlashCommandInteractionEvent

edit (8th of june, 12:00) : removing the bot from a guild and re-adding it seems to have fixed it for now ? For the record, the interactions would fail no matter the guild, or if sent through DMs, and the failure was persistent across restarts of the app

edit (10th of june, early morning) : correcting what I said two days ago, once the bug gets triggered, it will only do so in existing guilds; adding the bot to a new guild will make commands work, but only in DMs (for members of that guild) and in that guild