Issue sending user messages.

the-jolley-boy commented 1 year ago

Summary

I think there may be a bug either on discords end or discord.py. I recently had the server my bot is in upgraded and since I get heartbeat issues that I believe to originate there. I had the code running 0 issues for 3+ months until the change.

Reproduction Steps

I don't really have any reproduceable steps sadly, it is very niche. I have tried to reproduce the issue myself but cannot do it. I have tried multiple accounts, privacy settings for dm sending and all and cannot reproduce but the heartbeat failing has lead me to believe this is the cause.

I will attach the code in question.

I have added way too many prints to try to figure out the issue, I just added the prints so I need to wait for the error to flare up again in order to exactly pinpoint but typically it fails 1-3 times per day atm.

I will make sure to update with any more info.

Minimal Reproducible Code

if "#" in u:
    user = discord.utils.find(lambda m: str(m) == u, guild.members)

    print("Member element" + str(user))

    try:
        print("DM attempt")
        await user.send("**Please help us improve our support by giving feedback.** \nIt will take just 30 seconds. Survey expires after 24 hours.")
        print("DM sent")
    except Exception as e:
        print("User cannot be DM'd. (Support Tickets)")
        print(e)
        return

    def check(m):
        return user == m.author and isinstance(m.channel, discord.DMChannel)

    await initial_func(user)

else:
    print("ticket made when bot/survey was not setup. (Support Tickets)")

Expected Results

It is supposed to either send a dm or fail and just print to console. It also will print if the ticket was not registered in my database but that doesn't really have any chance of being the reason of the failures.

Actual Results

Sometimes it works and sometimes it doesn't. My hunch is it has something to do with some permission given by members with regards to sending a dm and for some reason that attempt gets hung up when it should except and fail, my thought is some change on discords end may not be compatible with how discord.py does it at the moment. I have tried to reproduce with different settings for privacy and none have given me anything reproduceable.

Intents

intents = discord.Intents().all()

System Information

discord.py 2.2.2

Checklist

[x] I have searched the open issues for duplicates.
[X] I have shown the entire traceback, if possible.
[X] I have removed my token from display, if visible.

Additional Context

No response

AbstractUmbra commented 1 year ago

In the HTTP Request code dpy uses, you can see that any 2xx response code is valid and works, allowing it to continue.

You can also see that we handle 5xx error codes with a exponential backoff retry (scaling the time between retries up). The 4xx error codes are raised properly.

If your hunch is correct, then it would be because of a 5xx error code, in which we retry it with a very small delay (1, 2, 4, 8, 16 seconds). If it's 4xx (an error), we return that error.

Your try/except here is catching a bare exception and then will print merely the error, not the traceback which can hamper debugging, please alleviate that and send any traceback you do get here.

If there were wider reports of the way that DMing users with errors happen, we'd hear about it a lot more than a 1-3 times per day incident which makes me hesitant to believe this is the fault and I suspect the issue exists in your code.

Without seeing more real code we can't advise too much further. If you want to test more, you can try running with debug logging for a few hours until one of these issues hits, and send the full log to a paste service and then to here (or directly upload here, your choice).

More than happy to help with sufficient evidence to go on.

AbstractUmbra commented 1 year ago

To be honest the code here is a little wild.. why are you storing user#discrim in the database? You should instead store their IDs as the postgresql BIGINT type. This means when you fetch that data back you can use Guild.get_member() which is much faster than utils.find().

Your logging.info("Opened database successfully") is misleading as this logs before any connection is even attempted. If you relied on this it would lead you on a wild goose chase.

You have a check function that appears unused...?

As for the User.send, I'll need a traceback to advise further!

the-jolley-boy commented 1 year ago

Yeah some of the code I just copied from old function and just haven't gotten rid of it. I was also just looking to change to guild.get_member() actually instead. ill send the updated function. As for the id in the db it was originally like that to see the name of the member that did the survey but of course I can easily get that. the db is already setup with the username so I don't think I will change that.

AbstractUmbra commented 1 year ago

Usernames and discrims can change, whereas if you store the ID this data is always getable or fetchable on demand FROM the id.

the-jolley-boy commented 1 year ago

I am aware of that yes

the-jolley-boy commented 1 year ago

So the bot failed again but there was no issue with regards to what was initially in question.

This issue is my code has been working flawlessly for years and this most recent iteration of the code was working without fail for over 3 months. How is it that it seemingly fails around the time my server introduces the new discord feature that they are rolling out.

I highly doubt my code is causing this issue and in fact it is some issue with discord.py and the new discord feature updates. There is no way that code working for month fails due to itself without some library failures not in my control.

To add it is seemingly dying for no reason at all, there is nothing in the audit log between restarting it and it failing again and no log of any of the function in the program being called. It has to be some issue related to discord and discord.py

AbstractUmbra commented 1 year ago

Which library version are you on currently? print(discord.__version__) within Python or python -m discord -v from your shell will tell you.

There was a crash bug semi-recently and was fixed with v2.2.1, if you exist before that then that could be the issue.

However I will note that most crash bugs do have a traceback involved so your logging/error solution seems to be suppressing things.

the-jolley-boy commented 1 year ago

I'm using discord.py v2.2.2-final but I had the issue before and I was using some older version then updated to see if that would fix it. There is a traceback when I force stop the bot when I see that the bot is offline, but it's overall useless and says nothing about the issue. I doubt the log is really suppressing anything. Typically this is the error I get.

In the CMD: `--- Logging error --- Traceback (most recent call last): File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\gateway.py", line 175, in run f.result(10) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\concurrent\futures_base.py", line 447, in result raise TimeoutError() concurrent.futures._base.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging__init.py", line 1098, in emit msg = self.format(record) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging__init.py", line 942, in format return fmt.format(record) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging__init.py", line 678, in format record.message = record.getMessage() File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging\init.py", line 368, in getMessage msg = msg % self.args TypeError: not enough arguments for format string Call stack: File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\threading.py", line 966, in _bootstrap self._bootstrap_inner() File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1009, in _bootstrap_inner self.run() File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\gateway.py", line 186, in run _log.warning(msg, self.shard_id, total) Message: 'Shard ID %s heartbeat blocked for more than %s seconds.\nLoop thread traceback (most recent call last):\n File "C:\Users\Administrator\main.py", line 2336, in \n Client.run(token)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 860, in run\n asyncio.run(runner())\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\asyncio\runners.py", line 44, in run\n return loop.run_until_complete(main)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 628, in run_until_complete\n self.run_forever()\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 595, in run_forever\n self._run_once()\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 1881, in _run_once\n handle._run()\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\asyncio\events.py", line 80, in _run\n self._context.run(self._callback, *self._args)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 849, in runner\n await self.start(token, reconnect=reconnect)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 778, in start\n await self.connect(reconnect=reconnect)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 659, in connect\n await self.ws.poll_event()\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\gateway.py", line 623, in poll_event\n await self.received_message(msg.data)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\gateway.py", line 566, in received_message\n _log.info(\'Shard ID %s has successfully RESUMED session %s.\', self.shard_id, self.session_id)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging\init__.py", line 1468, in info\n self._log(INFO, msg, args, **kwargs)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging\init__.py", line 1615, in _log\n self.handle(record)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging\init.py", line 1625, in handle\n self.callHandlers(record)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging\init.py", line 1687, in callHandlers\n hdlr.handle(record)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging\init.py", line 967, in handle\n self.emit(record)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging\init__.py", line 1101, in emit\n stream.write(msg + self.terminator)\n' Arguments: (None, 10) --- Logging error --- [2023-04-07 20:51:11] [ERROR ] discord.client: Attempting a reconnect in 1.89s Traceback (most recent call last): File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 659, in connect await self.ws.poll_event() File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\gateway.py", line 646, in poll_event raise ConnectionClosed(self.socket, shard_id=self.shard_id, code=code) from None discord.errors.ConnectionClosed: Shard ID None WebSocket closed with 1000 Traceback (most recent call last): File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\gateway.py", line 175, in run f.result(10) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\concurrent\futures_base.py", line 447, in result raise TimeoutError() concurrent.futures._base.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging__init.py", line 1098, in emit msg = self.format(record) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging__init.py", line 942, in format return fmt.format(record) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging__init.py", line 678, in format record.message = record.getMessage() File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging\init.py", line 368, in getMessage msg = msg % self.args TypeError: not enough arguments for format string Call stack: File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\threading.py", line 966, in _bootstrap self._bootstrap_inner() File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1009, in _bootstrap_inner self.run() File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\gateway.py", line 186, in run _log.warning(msg, self.shard_id, total) Message: 'Shard ID %s heartbeat blocked for more than %s seconds.\nLoop thread traceback (most recent call last):\n File "C:\Users\Administrator\main.py", line 2336, in \n Client.run(token)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 860, in run\n asyncio.run(runner())\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\asyncio\runners.py", line 44, in run\n return loop.run_until_complete(main)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 628, in run_until_complete\n self.run_forever()\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 595, in run_forever\n self._run_once()\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 1881, in _run_once\n handle._run()\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\asyncio\events.py", line 80, in _run\n self._context.run(self._callback, *self._args)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 849, in runner\n await self.start(token, reconnect=reconnect)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 778, in start\n await self.connect(reconnect=reconnect)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 659, in connect\n await self.ws.poll_event()\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\gateway.py", line 623, in poll_event\n await self.received_message(msg.data)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\gateway.py", line 566, in received_message\n _log.info(\'Shard ID %s has successfully RESUMED session %s.\', self.shard_id, self.session_id)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging\init__.py", line 1468, in info\n self._log(INFO, msg, args, **kwargs)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging\init__.py", line 1615, in _log\n self.handle(record)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging\init.py", line 1625, in handle\n self.callHandlers(record)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging\init.py", line 1687, in callHandlers\n hdlr.handle(record)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging\init.py", line 967, in handle\n self.emit(record)\n File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\logging\init__.py", line 1101, in emit\n stream.write(msg + self.terminator)\n' Arguments: (None, 10)`

In the Log: [ERROR] - Attempting a reconnect in 0.94s Traceback (most recent call last): File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 659, in connect await self.ws.poll_event() File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\gateway.py", line 646, in poll_event raise ConnectionClosed(self.socket, shard_id=self.shard_id, code=code) from None discord.errors.ConnectionClosed: Shard ID None WebSocket closed with 1000

AbstractUmbra commented 1 year ago

The error log there is interesting to me, this definitely is our heartbeat logging but it's... broken? You're blocking the bot's heartbeat, likely with blocking code, and that's generating useless error logs here. You should try and resolve this heartbeat issue - but it's not the issue here.

You're getting disconnected and reconnected from the websocket, which can indicate more blocking code, which actually may be the cause here.

You're free to setup debug logging for a time and send the output of that here.

Added note: that logging traceback shouldn't exist, it's expecting two arguments and getting two arguments. Weird.

the-jolley-boy commented 1 year ago

I am just confused on how my code could be blocking. I look at the server logs and the only function that could have been used during one period where it went offline was the on_message() which would have only triggered but not completed anything since there were no messages in the monitored channels.

I've also made sure my code is 100% non blocking all the libraries I use and such are setup with async.

I will setup debug logging and see but given I use much of the same code in other bots and never have issues it's just very odd that my code would have anything blocking. What debugger do you suggest? And will my current log get in the way?

This issue is just very very odd.

mikeshardmind commented 1 year ago

There are a couple of things here that don't match expectations verifiable in the code from discord.py here, I'm trying to look for any potential source of interference

some additional info may help diagnose this.

the async entry point (whether asyncio.run or anything else) along with anything you do with the loop policy
the output of python -m discord -v
the output of python -m pip freeze

the-jolley-boy commented 1 year ago

@mikeshardmind

The Client class is this:

`class Client(discord.Client): def init(self): super().init(intents = intents) self.synced = False asyncio.set_event_loop_policy( asyncio.WindowsSelectorEventLoopPolicy() )

async def on_ready(self):
    await self.wait_until_ready()
    if not self.synced:
        await client.sync()
        self.synced = True
    print(f'{self.user} has connected to Discord!')`

The issue with the bot happened before I the event loop policy but I needed that to make psycopg asynchronous.

2.

Python v3.10.0-final
discord.py v2.2.2-final
aiohttp v3.7.4.post0
system info: Windows 10 10.0.14393

Idk how much you wanted so ill just dump it all aiodns==3.0.0 aiohttp==3.7.4.post0 appdirs==1.4.4 APScheduler==3.9.1.post1 asgiref==3.4.1 async-generator==1.10 async-timeout==3.0.1 attrs==22.2.0 Automat==22.10.0 beautifulsoup4==4.10.0 brotlipy==0.7.0 bs4==0.0.1 cchardet==2.1.7 certifi==2021.10.8 cffi==1.15.1 chardet==4.0.0 charset-normalizer==2.0.7 chat-exporter==2.2.1 colorama==0.4.4 constantly==15.1.0 contourpy==1.0.6 cryptography==39.0.0 cssselect==1.1.0 cycler==0.11.0 discord.py==2.2.2 emoji==1.7.0 exceptiongroup==1.0.1 fake-useragent==0.1.11 ffmpeg==1.4 filelock==3.9.0 fonttools==4.38.0 grapheme==0.6.0 h11==0.14.0 hyperlink==21.0.0 idna==3.2 importlib-metadata==4.11.3 incremental==22.10.0 itemadapter==0.7.0 itemloaders==1.0.6 jmespath==1.0.1 kiwisolver==1.4.4 lxml==4.8.0 matplotlib==3.6.2 more-itertools==9.1.0 multidict==5.2.0 numpy==1.23.5 opensea==0.2.0 outcome==1.2.0 packaging==21.3 parse==1.19.0 parsel==1.7.0 Pillow==9.3.0 Protego==0.2.1 psycopg==3.1.8 psycopg2==2.9.3 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycares==4.3.0 pycparser==2.21 PyDispatcher==2.0.6 pyee==8.2.2 PyNaCl==1.5.0 pyOpenSSL==23.0.0 pyparsing==3.0.9 pyppeteer==1.0.2 pyquery==1.4.3 PySocks==1.7.1 python-dateutil==2.8.2 pytz==2022.1 pytz-deprecation-shim==0.1.0.post0 queuelib==1.6.2 requests==2.26.0 requests-file==1.5.1 requests-html==0.10.0 Scrapy==2.7.1 scrapy-rotating-proxies==0.6.2 scrapy-user-agents==0.1.1 Scrapy-UserAgents==0.0.1 selenium==4.6.0 service-identity==21.1.0 six==1.16.0 sniffio==1.3.0 sortedcontainers==2.4.0 soupsieve==2.2.1 tldextract==3.4.0 tls-client==0.1.8 tqdm==4.64.0 trio==0.22.0 trio-websocket==0.9.2 Twisted==22.10.0 twisted-iocpsupport==1.0.2 typing==3.7.4.3 typing_extensions==4.5.0 tzdata==2022.7 tzlocal==4.2 ua-parser==0.16.1 urllib3==1.26.7 user-agents==2.2.0 w3lib==1.22.0 websockets==10.3 wsproto==1.2.0 yarl==1.6.3 youtube-dl==2021.12.17 zipp==3.8.0 zope.interface==5.5.2

AbstractUmbra commented 1 year ago

Not to be a stickler, but can you paste your entire bot file, or upload it (both without the token, and only the token removed).

Aside from issues in which psycopg is not great, and your environment is polluted with a lot of packages there isn't enough to go on here.

Also: code blocks are in the following format:- ```py # code here ```

AbstractUmbra commented 1 year ago

First and foremost I must say the overall design of this code is... questionable at best. The amount of nested functions is a bit baffling and poor design.

Secondly, psycopg2 on Windows is notably bad. It's known for numerous event loop issues.

Thirdly I do see quite a lot of blocking code, things like csv.Writer() and the like. This could be contributing to the blocking going on.

I do not however see anything that would cause the library to fall over, so I expect there are other issues going on here. I've been running 2.2.2 for quite some time now and outside of user-code faults I've had nothing like this, and I'm in servers with all of the latest features. No reproduction possible.

Please change your logging level to debug until it dies again, this should show us what is going on.

mikeshardmind commented 1 year ago

Thanks for the follow-up with your environment info. That ruled out the ProactorEventLoop + psycopg issue, as well as a few known dependencies that have caused issues by modifying another package's namespace before, but introduces a few other questions that don't have clear answers. As it stands, there's still what appears to be an impossible log line as the complaint is about the number of arguments, while logging that it in fact has the correct number of arguments. other people on 2.2.2 have had this log emitted correctly.

the-jolley-boy commented 1 year ago

@mikeshardmind yeah I am setting my logger to debug and will reply here when it fails again. Thanks.

the-jolley-boy commented 1 year ago

It just went offline and it logged absolutely nothing, last time anything logged was when the bot started. Well I guess besides the message that was generated once I interrupted it in the CMD, but we already saw this error before.

[ERROR] - Attempting a reconnect in 0.43s Traceback (most recent call last): File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\client.py", line 659, in connect await self.ws.poll_event() File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\discord\gateway.py", line 646, in poll_event raise ConnectionClosed(self.socket, shard_id=self.shard_id, code=code) from None discord.errors.ConnectionClosed: Shard ID None WebSocket closed with 1000 08-Apr-23 01:07:35 - [Line:550 - Function:received_message In:gateway.py From:discord.gateway]

the-jolley-boy commented 1 year ago

If anyone is curious, it seems to have just been some issue with the cli running the script and it was causing the bot to fail which was very odd since I have ran scripts for months without an issue like this. Simple server restart seemed to fix the issue.

Rapptz / discord.py