Cog-Creators / Red-DiscordBot

A multi-function Discord bot
https://docs.discord.red
GNU General Public License v3.0
4.72k stars 2.3k forks source link

Segmentation fault after update to 3.1.3 #2871

Closed laggron42 closed 4 years ago

laggron42 commented 5 years ago

Other bugs

What were you trying to do?

Launch Red after updating.

What were you expecting to happen?

Have my bot alive

What actually happened?

Segmentation faults a bit everywhere.

When loading the bot with systemdaemon unknown

When loading the bot from the terminal

(redbot) laggron42@raspberrypi:~ $ redbot Red --dev
Connected to Discord. Getting ready...
Loading packages...
/home/laggron42/Red/cogs/CogManager/cogs/leveler/leveler.py: scipy is unable to import: No module named 'scipy'
Autocolor feature will be unavailable
*** stack smashing detected ***: /home/laggron42/.pyenv/versions/3.7.2/envs/redbot/bin/python3.7 terminated
Segmentation fault
(redbot) laggron42@raspberrypi:~ $

When loading the bot with --no-cogs (works), then loading all cogs with commands (excluded leveler)

(redbot) laggron42@raspberrypi:~ $ redbot Red --dev --no-cogs --debug
[2019-07-14 11:24:45] [DEBUG] red.main: ====Basic Config====
[2019-07-14 11:24:45] [DEBUG] red.main: Data Path: /home/laggron42/Red
[2019-07-14 11:24:45] [DEBUG] red.main: Storage Type: JSON
Connected to Discord. Getting ready...

______         _           ______ _                       _  ______       _
| ___ \       | |          |  _  (_)                     | | | ___ \     | |
| |_/ /___  __| |  ______  | | | |_ ___  ___ ___  _ __ __| | | |_/ / ___ | |_
|    // _ \/ _` | |______| | | | | / __|/ __/ _ \| '__/ _` | | ___ \/ _ \| __|
| |\ \  __/ (_| |          | |/ /| \__ \ (_| (_) | | | (_| | | |_/ / (_) | |_
\_| \_\___|\__,_|          |___/ |_|___/\___\___/|_|  \__,_| \____/ \___/ \__|

┌──────────────────────────────────┐    ┌──────────────────┐
│Red#6421                          │    │X MongoDB         │
│Prefixes: !                       │    │√ Voice           │
│Language: en_EN                   │    │X Docs            │
│Red Bot Version: 3.1.3            │    │X Tests           │
│Discord.py Version: 1.2.3         │    └──────────────────┘
│Shards: 1                         │
│Servers: 16                       │
│Users: 18969                      │
│3 cogs with 40 commands           │
└──────────────────────────────────┘

Invite URL: Do you really need this?

ujson module not found, using json
msgpack not installed, MsgPackSerializer unavailable
[14/07/2019 11:25] DEBUG InstantCommands: Cog successfully loaded on the instance.
Segmentation fault

After editing the core files a bit to print a message for each cog load, I got this (here wih --no-cogs and load with command):

(redbot) laggron42@raspberrypi:~ $ redbot Red --dev --debug --no-cogs
[2019-07-14 11:45:17] [DEBUG] red.main: ====Basic Config====
[2019-07-14 11:45:17] [DEBUG] red.main: Data Path: /home/laggron42/Red
[2019-07-14 11:45:17] [DEBUG] red.main: Storage Type: JSON
Loading cog <redbot.core.core_commands.Core object at 0x7554f730>...
Loading cog <redbot.core.cog_manager.CogManagerUI object at 0x7554f8b0>...
Loading cog <redbot.core.dev_commands.Dev object at 0x7550a650>...
Connected to Discord. Getting ready...
Loading packages...
Loading cog <redbot.cogs.admin.admin.Admin object at 0x6fa57710>...
Loading cog <adventure.adventure.Adventure object at 0x6fa6aa10>...
Loading cog <advgoogle.advgoogle.AdvancedGoogle object at 0x6f996f90>...
Loading cog <redbot.cogs.alias.alias.Alias object at 0x6f9a4e10>...
Loading cog <anisearch.anisearch.AniSearch object at 0x70e97bb0>...
Loading cog <approver.approver.Approver object at 0x7176e6f0>...
Loading cog <redbot.cogs.audio.audio.Audio object at 0x71769f30>...
Loading cog <redbot.cogs.bank.bank.Bank object at 0x6f88aa30>...
Loading cog <birthdays.birthdays.Birthdays object at 0x6f898730>...
ujson module not found, using json
msgpack not installed, MsgPackSerializer unavailable
Loading cog <booru.booru.Booru object at 0x754cc910>...
Loading cog <calculator.calc.Calculator object at 0x6f788e70>...
Loading cog <channelmanager.channelmanager.ChannelManager object at 0x6f7a4630>...
Loading cog <chatchart.chatchart.Chatchart object at 0x6f7a4d10>...
Loading cog <redbot.cogs.cleanup.cleanup.Cleanup object at 0x6e262e50>...
Loading cog <crabrave.crabrave.CrabRave object at 0x6e20b1d0>...
Loading cog <redbot.cogs.customcom.customcom.CustomCommands object at 0x6f88ac30>...
Loading cog <redbot.cogs.economy.economy.Economy object at 0x6e0a04f0>...
Loading cog <errorlogs.errorlogs.ErrorLogs object at 0x6e036530>...
Loading cog <eventmaker.eventmaker.EventMaker object at 0x6e04f690>...
Loading cog <redbot.cogs.filter.filter.Filter object at 0x704c6210>...
Loading cog <redbot.cogs.general.general.General object at 0x704d2a50>...
Loading cog <redbot.cogs.image.image.Image object at 0x704dfcb0>...
Loading cog <imagemaker.imagemaker.ImageMaker object at 0x704dfe90>...
Loading cog <imgflip.imgflip.Imgflip object at 0x70490fb0>...
Loading cog <instantcmd.instantcmd.InstantCommands object at 0x7049dbd0>...
[14/07/2019 12:01] DEBUG InstantCommands: Cog successfully loaded on the instance.
Loading cog <insult.insult.Insult object at 0x704a9db0>...
Loading cog <lockdown.lockdown.Lockdown object at 0x7043f210>...
Loading cog <redbot.cogs.mod.mod.Mod object at 0x7043fad0>...
Loading cog <redbot.cogs.modlog.modlog.ModLog object at 0x6dca71f0>...
Segmentation fault

And here without --no-cogs

(redbot) laggron42@raspberrypi:~ $ redbot Red --dev --debug
[2019-07-14 11:45:17] [DEBUG] red.main: ====Basic Config====
[2019-07-14 11:45:17] [DEBUG] red.main: Data Path: /home/laggron42/Red
[2019-07-14 11:45:17] [DEBUG] red.main: Storage Type: JSON
Loading cog <redbot.core.core_commands.Core object at 0x7554f730>...
Loading cog <redbot.core.cog_manager.CogManagerUI object at 0x7554f8b0>...
Loading cog <redbot.core.dev_commands.Dev object at 0x7550a650>...
Connected to Discord. Getting ready...
Loading packages...
Loading cog <redbot.cogs.permissions.permissions.Permissions object at 0x717767f0>...
Loading cog <redbot.cogs.admin.admin.Admin object at 0x716b4790>...
Loading cog <adventure.adventure.Adventure object at 0x716c2db0>...
Loading cog <advgoogle.advgoogle.AdvancedGoogle object at 0x716661f0>...
Loading cog <redbot.cogs.alias.alias.Alias object at 0x7166b110>...
Loading cog <anisearch.anisearch.AniSearch object at 0x716778d0>...
Loading cog <redbot.cogs.bank.bank.Bank object at 0x7166bb70>...
Loading cog <birthdays.birthdays.Birthdays object at 0x7168a410>...
Loading cog <calculator.calc.Calculator object at 0x71690fb0>...
Loading cog <channelmanager.channelmanager.ChannelManager object at 0x71681b10>...
Loading cog <redbot.cogs.cleanup.cleanup.Cleanup object at 0x71501d70>...
Loading cog <redbot.cogs.customcom.customcom.CustomCommands object at 0x71510990>...
Loading cog <redbot.cogs.downloader.downloader.Downloader object at 0x71522f50>...
Loading cog <redbot.cogs.economy.economy.Economy object at 0x70cb5d90>...
Loading cog <redbot.cogs.general.general.General object at 0x70c541b0>...
Loading cog <redbot.cogs.image.image.Image object at 0x70c61670>...
Loading cog <lockdown.lockdown.Lockdown object at 0x70c67e90>...
Loading cog <redbot.cogs.mod.mod.Mod object at 0x70c725b0>...
Loading cog <redbot.cogs.modlog.modlog.ModLog object at 0x64586510>...
Loading cog <partygames.partygames.PartyGames object at 0x64586a50>...
Loading cog <penis.penis.Penis object at 0x6474c2b0>...
Loading cog <pokedex.pokedex.Pokedex object at 0x6474c930>...
Loading cog <polling.polling.Polling object at 0x646f4ff0>...
Loading cog <raffle.raffle.Raffle object at 0x64704550>...
Loading cog <react_roles.react_roles.ReactRoles object at 0x717dd2d0>...
Loading cog <reminder.reminder.Reminder object at 0x6471f7d0>...
Loading cog <retrosign.retrosign.Retrosign object at 0x71824570>...
Loading cog <rndstatus.rndstatus.RndStatus object at 0x6468f050>...
Loading cog <roleinvite.roleinvite.RoleInvite object at 0x6469e870>...
[14/07/2019 11:45] DEBUG RoleInvite: Cog successfully loaded on the instance.
Loading cog <rss.core.RSS object at 0x6460df50>...
Loading cog <say.say.Say object at 0x64512550>...
[14/07/2019 11:45] DEBUG Say: Cog successfully loaded on the instance.
Loading cog <redbot.cogs.streams.streams.Streams object at 0x6451c9b0>...
Loading cog <sysinfo.sysinfo.SysInfo object at 0x644e2c70>...
Loading cog <redbot.cogs.trivia.trivia.Trivia object at 0x644db310>...
Loading cog <warnsystem.warnsystem.WarnSystem object at 0x6446b250>...
[14/07/2019 11:45] DEBUG WarnSystem: Cog successfully loaded on the instance.
Loading cog <approver.approver.Approver object at 0x6441d9f0>...
Loading cog <welcomer.welcomer.Welcomer object at 0x6441de10>...
Loading cog <redbot.cogs.audio.audio.Audio object at 0x6442cd30>...
/home/laggron42/Red/cogs/CogManager/cogs/leveler/leveler.py: scipy is unable to import: No module named 'scipy'
Autocolor feature will be unavailable
*** stack smashing detected ***: /home/laggron42/.pyenv/versions/3.7.2/envs/redbot/bin/python3.7 terminated
Segmentation fault

How can we reproduce this issue?

¯_(ツ)_/¯ Those are memory errors related to C modules, that's specific to the hardware and also the cogs' dependencies. I can't find the source of the error, I get it in many different situations.

laggron42 commented 5 years ago

Update: I loaded every cog one by one to locate the problem, and I found it: it's the downloader cog. Every time I load the downloader, segmentation fault (without stack smashing detected). This is so weird...

z00t commented 5 years ago

I'm getting segmentation fault with downloader as well I believe- clean install, attempted to !load audio which red tries to grab Lavalink.jar - gets to 99% and seg faults

ver: 3.1.3-3.1.4

Tobotimus commented 5 years ago

What are each of your Python versions?

z00t commented 5 years ago

3.7.4 for me

Tobotimus commented 5 years ago

I can see retke is using a raspberry pi, which version? Also which system/os are you using @z00t? This could be caused a stack overflow that only occurs on specific architecture(s), it's hard to know

RuisSoftware commented 5 years ago

I use a Pi 3B+ with latest rasbian.

Python 3.7.4. Got this segmentation error while i try to load 'audio'

[INFO] red.audio.manager: Downloading Lavalink.jar... Lavalink.jar: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 60.8M/61.1M [00:14<00:00, 3.09MB/s] Segmentation fault

z00t commented 5 years ago

Pi zero on raspbian stretch

laggron42 commented 5 years ago

I also use a RPI 3B+. Forgot to give an update on this tho, I fixed the issue by uninstalling the Leveler (issue with pymongo) and loading each cog one by one (using --no-cogs). I used this to get a detailed traceback of the problem (pointing to pymongo, while I'm using JSON).

Tobotimus commented 5 years ago

@dutchplayers @z00t Can you please try running Red like this, and let me know what you get.

You need gdb installed - should just be sudo apt-get install gdb.

You also need to get the path of your redbot executable. Since you're on raspbian, I'm guessing you have pyenv installed, so after activating your virtualenv if you have one, the path is given by pyenv which redbot.

Then run Red like this, replacing PATH_TO_REDBOT_EXECUTABLE with the path you got above, and ARGS with whatever you usually pass to the redbot command (instance name, etc):

gdb python
(gdb) run PATH_TO_REDBOT_EXECUTABLE ARGS
## wait for segfault ##
(gdb) backtrace
z00t commented 5 years ago
Thread 8 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb1fdf470 (LWP 1551)]
0xb6fdac70 in elf_machine_rel (skip_ifunc=0, 
    reloc_addr_arg=0xb56a70f4, version=0x50, 
    sym=0xb56913f4, reloc=0xb5692210, map=0x57f600)
    at ../sysdeps/arm/dl-machine.h:382
382     ../sysdeps/arm/dl-machine.h: No such file or directory.
(gdb) backtrace
#0  0xb6fdac70 in elf_machine_rel (skip_ifunc=0, 
    reloc_addr_arg=0xb56a70f4, version=0x50, 
    sym=0xb56913f4, reloc=0xb5692210, map=0x57f600)
    at ../sysdeps/arm/dl-machine.h:382
#1  elf_dynamic_do_Rel (skip_ifunc=0, 
    lazy=<optimized out>, nrelative=<optimized out>, 
    relsize=<optimized out>, reladdr=<optimized out>, 
    map=0x57f600) at do-rel.h:137
#2  _dl_relocate_object (scope=<optimized out>, 
    reloc_mode=reloc_mode@entry=0, 
    consider_profiling=<optimized out>, 
    consider_profiling@entry=0) at dl-reloc.c:259
#3  0xb6fe38d4 in dl_open_worker (a=0xb1fdeb8c)
    at dl-open.c:435
#4  0xb6fdec04 in _dl_catch_error (objname=0xb1fdf97c, 
    objname@entry=0xb1fdeb84, errstring=0xb1fdec10, 
    errstring@entry=0xb1fdeb88, mallocedp=0xb1fdeb84, 
    mallocedp@entry=0xb1fdeb83, operate=0xb1fdeb88, 
    args=args@entry=0xb1fdeb8c) at dl-error.c:187
#5  0xb6fe3020 in _dl_open (
    file=0xb6f8ecc8 "libgcc_s.so.1", mode=-2147483646, 
    caller_dlopen=0xb6f8d078 <pthread_cancel_init+52>, 
    nsid=nsid@entry=-2, argc=3, argv=0xbefff254, 
    env=0xbefff264) at dl-open.c:660
#6  0xb6ea059c in do_dlopen (ptr=0xb1fdedd0)
    at dl-libc.c:87
#7  0xb6fdec04 in _dl_catch_error (objname=0xb1fdf97c, 
    objname@entry=0xb1fdedb8, errstring=0x0, 
    errstring@entry=0xb1fdedbc, mallocedp=0xb1fdedc0, 
    mallocedp@entry=0xb1fdedb7, operate=0xb1fdedc4, 
    args=args@entry=0xb1fdedd0) at dl-error.c:187
---Type <return> to continue, or q <return> to quit---
#8  0xb6ea067c in dlerror_run (
    operate=<optimized out>, 
    args=args@entry=0xb1fdedd0) at dl-libc.c:46
#9  0xb6ea06fc in __GI___libc_dlopen_mode (
    name=<optimized out>, mode=mode@entry=-2147483646)
    at dl-libc.c:163
#10 0xb6f8d078 in pthread_cancel_init ()
    at ../sysdeps/nptl/unwind-forcedunwind.c:52
#11 0xb6f8d214 in _Unwind_ForcedUnwind (
    exc=0xb6f8d214 <_Unwind_ForcedUnwind+88>, 
    stop=0x2a8d30, stop_argument=0xb1fdf6b8)
    at ../sysdeps/nptl/unwind-forcedunwind.c:126
#12 0xb6f8abd8 in __GI___pthread_unwind (
    buf=<optimized out>) at unwind.c:121
#13 0xb6f822c0 in __do_cancel () at pthreadP.h:283
#14 __pthread_exit (value=value@entry=0x0)
    at pthread_exit.c:28
#15 0x0012b0c0 in PyThread_exit_thread ()
    at Python/thread_pthread.h:270
#16 0x00170bb4 in t_bootstrap (
    boot_raw=boot_raw@entry=0xb1fe2ef0)
    at ./Modules/_threadmodule.c:1023
#17 0x0012af54 in pythread_wrapper (
    arg=<optimized out>) at Python/thread_pthread.h:174
#18 0xb6f80fc4 in start_thread (arg=0xb1fdf470)
    at pthread_create.c:458
#19 0xb6e68038 in ?? ()
    at ../sysdeps/unix/sysv/linux/arm/clone.S:76
   from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 
Tobotimus commented 5 years ago

Hmmm, unfortunately I don't think that helps much but it's probably due to one or both of the following:

  1. Python having been compiled with pyenv, as opposed to using a Python image pre-built for the specific OS/architecture
  2. Low memory

It's possible that increasing the stack size will resolve the issue, but I don't know how that should be done in this circumstance. Whatever the cause is, it seems to be occurring in or around a secondary thread, which is most likely to be spawned by some dependency like aiohttp.

If it's possible, you could try upgrading to Raspbian Buster and installing Python 3 from the official repositories? Buster was released about a month ago, and the python3 package is Python 3.7.3. It's likely that the package was built to be optimised for ARM such that issues like this might be prevented.

z00t commented 5 years ago

I mean, realistically speaking the zero is going to struggle regardless- I didn't expect it to run full fledged; was more of a proof-of-concept to see if it would all compile/run.

For the sake of completion- I'll give buster and the official py3 repository a pull and report back!

Side note: I did manage to get V2 running with audio on the pi zero- though discord ended up timing out the stream after about half a second of playback xD

Tobotimus commented 5 years ago

Yeah, V2 audio will use far fewer resources since it's just a few extra threads on the main bot (python) process. V3 audio is an entirely separate process running in the JVM, which at the very least is going to use a lot more memory. Even if you're able to limit the memory consumed by Lavalink and maybe increase the size of your swap disk, the quality is probably going to be awful on a Pi Zero.

On a 3B+ however, this shouldn't be as common. I personally run an Ubuntu Server 19.04 arm64 image on my 3B+ and it seems to run quite well. Raspbian Buster is also likely to be an improvement over Raspbian Stretch, though the official images are still 32-bit. It's up to you what you choose :)

laggron42 commented 5 years ago

Using sys.settrace allowed me to find the source of the problem, gdb didn't work well. This is a bit more complicated to do but I'll still give the steps to do it :

Tobotimus commented 5 years ago

@retke In your case the issue might not necessarily be a "bug" in PyMongo, but it's likely that whatever is using PyMongo ran a big query and caused a buffer overflow.

It's also possible that it is indeed a bug in PyMongo on 32-bit systems, as sometimes the allocated buffer size is dependent on the bitness (e.g. BUFSIZE * sizeof(long)). Either way, you would still need a somewhat large query to overflow this buffer. I'd be interested to see the source code of the leveler cog you're using.

There are also quite a few limitations with MongoDB itself on 32-bit systems, but they may be unrelated to this issue.

laggron42 commented 5 years ago

I'm using Stevy's leveler, ported to V3 by Fixator10

MK99MA commented 5 years ago

Aparently I'm having a similar issue. After I updated Red (3.1.2 -> 3.1.5) it keeps trying to redownload lavalink.jar even though it is still present. Also the download seems to fail everytime at 99% for me. DownloadError

Manually placing this Lavalink in the folder does however stop the bot from trying to redownload the file. Below the debugoutput of red: Debug

Aikaterna recommended me to comment my issue here.

Tobotimus commented 5 years ago

OK, I've found an easy way to actually inspect what's happening, in a way which involves nothing but an environment variable. Simply set PYTHONFAULTHANDLER=1 when running Red and it'll give you a log of tracebacks when you get a segfault. Cheers to the faulthandler module.

laggron42 commented 5 years ago

I tried what you did (loading leveler still segfaults), and the output of your solution is very similar to mine (but way easier).

Tobotimus commented 5 years ago

There really doesn't seem to be anything useful or common amongst these different tracebacks unfortunately. The common denominators here are:

I'd be really interested to see if someone can cause this fault after upgrading to Raspbian Buster and running Red with the official python3 package in the Debian Buster repositories. I understand that Raspbian hasn't exactly made it easy to upgrade, but it may be worth the trouble.

laggron42 commented 5 years ago

Alright, tried with Raspbian Buster, still segfaults with leveler...

(redbot) laggron42@raspberrypi:~ $ redbot Red --dev
Connected to Discord. Getting ready...
Loading packages...
ujson module not found, using json
msgpack not installed, MsgPackSerializer unavailable
[2019-08-03 14:49:32] [INFO] red.ZeCogsV3.react_roles: Cached bindings: sélection-rôle: 2, welcome: 1
/home/laggron42/Red/cogs/CogManager/cogs/leveler/leveler.py: scipy is unable to import: No module named 'scipy'
Autocolor feature will be unavailable
*** stack smashing detected ***: <unknown> terminated
Aborted
(redbot) laggron42@raspberrypi:~ $ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 10 (buster)
Release:    10
Codename:   buster

Full output with PYTHONFAULTHANDLER=1

Ryan5374 commented 5 years ago

I've recently switched to buster, and segfaults issues haven't happened (so far) lavalink seems to work fine. The thing is, when I transferred the files from old bot to new bot, I deleted the audio folder in the cogs folder, then typed (p)load audio when the bot loaded, which seemed to have loaded the updated lavalink.

MK99MA commented 5 years ago

Idk if my problem was due to the same reasons, but I wanted to mention that after switching to buster I did not run into the problem again.. yet. However it is also to mention that I did not update the system but installed it from scratch (even on a new microsd).

After installing Redbot I transferred all my files from the old to the new bot, except for the audio folder just like @Ryan5374 above. Since the audio cog was still "loaded" the bot downloaded the lavalink.jar with the first start.

Tobotimus commented 5 years ago

I can see from @retke's traceback that it occurred whilst using Python installed with pyenv instead of system Python. However apparently it's also occurring with system Python (discussed in Discord). There's not really much for me to say... I always plug distributions which build amd64 images, and recommend installing them on Raspberry Pi 3's... which ironically includes just about every major distribution except Raspbian. Raspbian is still compiled for 32-bit even though RPi 3's have 64-bit processors.

Tobotimus commented 5 years ago

As I've mentioned previously, it's possible that the segfault @retke is getting with pymongo is actually a bug in PyMongo when compiled for 32-bit. It seems to be the only one which outputs "Stack Smashing Detected", which I guess makes it special. So perhaps the pre-built python in buster repositories stopped the segfaults everywhere but there?

laggron42 commented 5 years ago

I'll try reinstalling buster from scratch and see if it works.

mikeshardmind commented 4 years ago

I'm going to close this. It doesn't appear to be an issue on our end.