discord / discord-api-docs

Official Discord API Documentation
https://discord.com/developers/docs/intro

Cloudflare blocking/captcha'ing API requests #1149

Closed · WaffleThief123 closed this issue 4 years ago

WaffleThief123 commented 4 years ago

I'm finding that Cloudflare is 403'ing API calls on about 85 percent of the servers we are requesting against for API v4 audio traffic.

We discovered this last night after looking into reports from our bot's userbase.

Below is the exception that we get in Lavalink, which is what we use for audio broadcasting.

java.lang.IllegalStateException: Failed to connect to wss://sydney49.discord.media/?v=4
        at space.npstr.magma.impl.connections.hax.ClosingUndertowWebSocketClient$1.handleFailed(ClosingUndertowWebSocketClient.java:73) ~[impl-0.12.3.jar!/:na]
        at org.xnio.IoFuture$HandlingNotifier.notify(IoFuture.java:215) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.AbstractIoFuture$1.run(AbstractIoFuture.java:211) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.IoUtils$2.execute(IoUtils.java:70) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.AbstractIoFuture.runNotifier(AbstractIoFuture.java:354) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.AbstractIoFuture.runAllNotifiers(AbstractIoFuture.java:233) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.AbstractIoFuture.setException(AbstractIoFuture.java:251) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.FutureResult.setException(FutureResult.java:89) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at io.undertow.websockets.client.WebSocketClient$ConnectionBuilder$2.notify(WebSocketClient.java:342) ~[undertow-core-2.0.26.Final.jar!/:2.0.26.Final]
        at org.xnio.AbstractIoFuture$1.run(AbstractIoFuture.java:211) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.IoUtils$2.execute(IoUtils.java:70) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.AbstractIoFuture.runNotifier(AbstractIoFuture.java:354) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.AbstractIoFuture.runAllNotifiers(AbstractIoFuture.java:233) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.AbstractIoFuture.setException(AbstractIoFuture.java:251) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.FutureResult.setException(FutureResult.java:89) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.http.HttpUpgrade$HttpUpgradeState$StringWriteListener.handleEvent(HttpUpgrade.java:391) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.http.HttpUpgrade$HttpUpgradeState$StringWriteListener.handleEvent(HttpUpgrade.java:372) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.conduits.WriteReadyHandler$ChannelListenerHandler.writeReady(WriteReadyHandler.java:65) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at io.undertow.protocols.ssl.SslConduit$SslWriteReadyHandler.writeReady(SslConduit.java:1273) ~[undertow-core-2.0.26.Final.jar!/:2.0.26.Final]
        at io.undertow.protocols.ssl.SslConduit$3.run(SslConduit.java:275) ~[undertow-core-2.0.26.Final.jar!/:2.0.26.Final]
        at org.xnio.nio.WorkerThread.safeRun(WorkerThread.java:582) ~[xnio-nio-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.nio.WorkerThread.run(WorkerThread.java:466) ~[xnio-nio-3.3.8.Final.jar!/:3.3.8.Final]
Caused by: java.nio.channels.ClosedChannelException: null
        at io.undertow.protocols.ssl.SslConduit.write(SslConduit.java:369) ~[undertow-core-2.0.26.Final.jar!/:2.0.26.Final]
        at org.xnio.conduits.ConduitStreamSinkChannel.write(ConduitStreamSinkChannel.java:150) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        at org.xnio.http.HttpUpgrade$HttpUpgradeState$StringWriteListener.handleEvent(HttpUpgrade.java:385) ~[xnio-api-3.3.8.Final.jar!/:3.3.8.Final]
        ... 7 common frames omitted
Zoddo commented 4 years ago

I confirm this issue too.

Maybe it's related to the number of connections we are making to the voice servers? (In our case, it's a 300k-guild bot with an average of 1k voice connections, about a third of which turn over quickly.)

owenselles commented 4 years ago

I have a 1k-guild bot with 15 voice connections and can confirm this issue as well. From what I've heard, all bots have this issue at the moment.

aikaterna commented 4 years ago

Can confirm my bot is doing the same, with various regions throwing the error. It was EU-Central and EU-West at first a couple of days ago, but now there seems to be no rhyme or reason to it. I have been trying different voice nodes in different regions with the same result.

Senither commented 4 years ago

Can confirm as well. I'm hosting a bot with ~50k servers and 3 Lavalink nodes, each in their own region, and they're all throwing the same error with no real pattern between them; basically every Discord voice region is failing to connect at some point.

WaffleThief123 commented 4 years ago

@Senither is it super sporadic or is it almost every voice connect?

odddellarobbia commented 4 years ago

Just want to confirm this on a bot I run called Kashima (https://kashima.moe). The bot is in over 61,000 guilds/servers. Unfortunately, I can only get a connection about 5% of the time now. This is hurting our bot and its users. Hoping this gets resolved soon.

WaffleThief123 commented 4 years ago

If it were rate limiting, we wouldn't get 403s when curl-ing the voice gateway API. @TadaDeveloper

ShyPianoGuy commented 4 years ago

Can confirm this for a private bot on only 2 servers.

WaffleThief123 commented 4 years ago

I was getting 403s earlier; now I'm getting proper 404s, unless I'm totally screwing up my curl.

curl -u <botkey-removed> --request GET sydney49.discord.media/?v=4/hello -v
Enter host password for user '<botkey-removed>':
*   Trying 162.159.128.235:80...
* TCP_NODELAY set
* Connected to sydney49.discord.media (162.159.128.235) port 80 (#0)
* Server auth using Basic with user '<botkey-removed>'
> POST /?v=4/hello HTTP/1.1
> Host: sydney49.discord.media
> Authorization: Basic N2o0eDNaZFNsSHlJa0V3eUtaX2dqdTJRY25OUW0zbFg6
> User-Agent: curl/7.66.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< Date: Thu, 17 Oct 2019 17:53:02 GMT
< Content-Length: 0
< Connection: keep-alive
* Added cookie __cfduid="d27965c8c19277b9a1abd79a54220ca881571334782" for domain discord.media, path /, expire 1602870782
< Set-Cookie: __cfduid=d27965c8c19277b9a1abd79a54220ca881571334782; expires=Fri, 16-Oct-20 17:53:02 GMT; path=/; domain=.discord.media; HttpOnly
< CF-Cache-Status: DYNAMIC
< Server: cloudflare
< CF-RAY: 527421339fd0e3c6-ATL
< 
* Connection #0 to host sydney49.discord.media left intact
Zoddo commented 4 years ago

@TadaDeveloper If it were a ratelimit, it would return a 429, not a 403 (which is basically a Cloudflare ban in this case).

@wertyy102 You should expect a 404 when curl'ing the gateway endpoint. It's meant for websocket connections only (with a proper Upgrade header), not plain HTTP requests. The main gateway (gateway.discord.gg) also returns a 404 if you curl it. Also, you removed your token from the command itself, but not from the displayed Authorization header (which is invalid here, and not needed anyway, since we identify inside the websocket connection, not in the HTTP headers).
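A minimal sketch of that point, assuming Python and the third-party websockets package (and the sydney49 endpoint from this thread): a plain HTTP GET is expected to 404, while a real client performs a WebSocket upgrade and only authenticates later, inside the socket.

import asyncio
import websockets  # pip install websockets

VOICE_ENDPOINT = "wss://sydney49.discord.media/?v=4"  # endpoint taken from this thread

async def probe():
    # websockets.connect() sends the Upgrade/Sec-WebSocket-Key headers for us;
    # no Authorization header is needed here -- identification happens later,
    # via a payload sent over the socket, not in the HTTP handshake.
    try:
        async with websockets.connect(VOICE_ENDPOINT) as ws:
            hello = await ws.recv()  # the server speaks first after a successful upgrade
            print("connected, got:", hello)
    except Exception as exc:
        # a Cloudflare 403 (or any other failed upgrade) surfaces here as a handshake error
        print("handshake failed:", exc)

asyncio.run(probe())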

WaffleThief123 commented 4 years ago

@Zoddo TY! I'm still learning... I just spun up a new bot/dev application to get an API key for testing. I work with @odddellarobbia on that project; I handle infra/backend things way more than I touch the public side.

Senither commented 4 years ago

@Senither is it super sporadic or is it almost every voice connect?

@wertyy102 It's really random; it will connect at times, but most of the connections are failing. I'm currently seeing 26 servers playing music where the average before was around 170, although I assume YouTube is also part of why that's down, since the YouTube search provider has been disabled in the bot recently to avoid getting more of the 429 errors I get when requesting music from them.

WaffleThief123 commented 4 years ago

@wertyy102 It's really random; it will connect at times, but most of the connections are failing. I'm currently seeing 26 servers playing music where the average before was around 170, although I assume YouTube is also part of why that's down, since the YouTube search provider has been disabled in the bot recently to avoid getting more of the 429 errors I get when requesting music from them.

@Senither hit me up on Discord, I'll show you a nice little workaround for the YouTube 429s.
MrStealYaWaffles#7346

Kodehawa commented 4 years ago

I can confirm this issue. My 7 music nodes present the same issue, alongside my main server.

jhgg commented 4 years ago

We are aware of this issue and are actively working with Cloudflare to resolve it.

WaffleThief123 commented 4 years ago

@jhgg thank you for the update!

odddellarobbia commented 4 years ago

@jhgg do we have a timeframe for when you expect the issue to be resolved? I'd like to post an announcement for the users of my bot.

jhgg commented 4 years ago

No. We have been working with them on it for a day now. We are escalating the issue with them and hope for a resolution some time soon. However, we aren't certain yet why these blocks are happening, given we specifically set it up to work this way. We think some higher level of DoS mitigation is kicking in.

WaffleThief123 commented 4 years ago

@jhgg I've noticed that there isn't any update on https://status.discordapp.com/. Would it be possible to get something logged publicly there that we can point our end users to? It would look better all around, in my opinion, if there were something other than green for today/yesterday.

jhgg commented 4 years ago

We do not plan to update the status page for this, as the impact is specifically limited to large bots that exceed a certain threshold of concurrent requests to our service. However, feel free to link to this issue.

WaffleThief123 commented 4 years ago

@jhgg alright will do

NineBallo commented 4 years ago

My small bot, which I'm still building and which hasn't been used in a week, is also having this issue, meaning it's not only large bots that are affected, or it's a one-off.

WaffleThief123 commented 4 years ago

@jhgg going by what @NineBallAYAYA and a couple of other users in this thread have said, it's also hitting smaller bots that don't really get near a request limit...

aschenkuttel commented 4 years ago

My bot is in 2 servers with around 40 members, and we can confirm this as well.

Clasko commented 4 years ago

My bot is also in only 2 servers with 100 members and is affected by this issue.

jhgg commented 4 years ago

Thanks for the reports. We are still investigating. To reduce noise I will lock this thread until we have further updates.

jhgg commented 4 years ago

Our investigation of this issue continues, and we've isolated two specific issues.

We recently changed our voice servers to proxy TCP traffic through Cloudflare to stop some variants of naive DoS attacks.

Correct client implementations should have handled this change seamlessly. We announced this change back in July in the API server: https://canary.discordapp.com/channels/81384788765712384/381871767846780928/601151474738790512 and the proper way of connecting to our voice infra has been documented for a while now: https://github.com/discordapp/discord-api-docs/commit/c6067414cd2f7b65bbdca773e8f5027f1f6cefc4
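For reference, a rough sketch of the documented v4 voice handshake, assuming Python with the websockets package; the endpoint and IDs are placeholders and error handling is omitted, so treat it as an illustration rather than a reference client.

import asyncio
import json
import websockets  # pip install websockets

async def voice_identify(endpoint, server_id, user_id, session_id, token):
    # the voice gateway version is passed as a query parameter
    async with websockets.connect(f"wss://{endpoint}/?v=4") as ws:
        hello = json.loads(await ws.recv())          # opcode 8: Hello
        interval = hello["d"]["heartbeat_interval"]  # heartbeat interval in milliseconds

        # opcode 0: Identify -- the voice token is sent here, inside the
        # websocket, not in any HTTP header
        await ws.send(json.dumps({
            "op": 0,
            "d": {
                "server_id": server_id,
                "user_id": user_id,
                "session_id": session_id,
                "token": token,
            },
        }))
        ready = json.loads(await ws.recv())          # opcode 2: Ready
        return ready, interval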

The first issue is due to an incorrect implementation of our voice API in discord.py. I had committed a fix for this in July: https://github.com/Rapptz/discord.py/commit/8fdcb4de3b6dfd63c00bdd73583a63274f278998 however, no version has been released since June. So discord.py doesn't work, because its implementation is wrong. This is probably the issue affecting small bots using discord.py. It should be resolved in the next discord.py release (1.2.4), and upgrading the dependency to that version once it is released should resolve the issue.
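If you're not sure which discord.py build your bot is running, a quick check (discord.py exposes its version string as a module attribute):

import discord

print(discord.__version__)  # anything below 1.2.4 still carries the broken voice handshake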

The second is some DoS protection measure triggering on larger bots, causing their connections to fail regardless of library. This is the issue we are currently investigating.

jhgg commented 4 years ago

We think the second issue is now resolved. Based on our findings, the issue was not Cloudflare's fault. Specifically, as part of this change, we started supporting and allowing clients to negotiate a TLS 1.3 connection to our voice servers. This seems to break something deep down in Lavalink, and thus causes the connection handshake to hang and time out.

In the interim, I've gone ahead and disabled TLS 1.3, meaning connections will fall back to the old TLS 1.2 default.
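As an illustration only (the actual Lavalink fix would live in its Java/Undertow stack, not here): a Python client that wanted to sidestep TLS 1.3 on its own end could cap its SSL context at TLS 1.2, for example:

import ssl

ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
ctx.maximum_version = ssl.TLSVersion.TLSv1_2  # refuse to negotiate TLS 1.3

# the context can then be handed to an HTTPS/WebSocket client,
# e.g. websockets.connect(url, ssl=ctx)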

The first issue is also resolved, following a release of v1.2.4 with the change for discord.py: https://pypi.org/project/discord.py/#history

CapitaoFNCB commented 4 years ago

I'm using Java 14 and getting the issue too. Private bot for 2 guilds only, around 100 users...