Closed by laggron42 4 years ago
Update: I loaded every cog one by one to locate the problem, and I found it: it's the downloader cog. Every time I load the downloader, segmentation fault (without stack smashing detected). This is so weird...
I'm getting a segmentation fault with downloader as well, I believe: clean install, attempted to `!load audio`, which makes Red try to grab Lavalink.jar; it gets to 99% and segfaults.
ver: 3.1.3-3.1.4
What are each of your Python versions?
3.7.4 for me
I can see retke is using a Raspberry Pi, which version? Also, which system/OS are you using @z00t? This could be caused by a stack overflow that only occurs on specific architecture(s); it's hard to know.
I use a Pi 3B+ with the latest Raspbian.
Python 3.7.4. Got this segmentation error while trying to load 'audio':

```
[INFO] red.audio.manager: Downloading Lavalink.jar...
Lavalink.jar: 100%|█████████████████████████████| 60.8M/61.1M [00:14<00:00, 3.09MB/s]
Segmentation fault
```
Pi zero on raspbian stretch
I also use an RPi 3B+. Forgot to give an update on this, though: I fixed the issue by uninstalling the Leveler cog (issue with pymongo) and loading each cog one by one (using `--no-cogs`). I used this to get a detailed traceback of the problem (pointing to pymongo, while I'm using JSON).
@dutchplayers @z00t Can you please try running Red like this, and let me know what you get?

You need `gdb` installed - should just be `sudo apt-get install gdb`.

You also need to get the path of your redbot executable. Since you're on Raspbian, I'm guessing you have pyenv installed, so after activating your virtualenv (if you have one), the path is given by `pyenv which redbot`.

Then run Red like this, replacing PATH_TO_REDBOT_EXECUTABLE with the path you got above, and ARGS with whatever you usually pass to the `redbot` command (instance name, etc.):

```
gdb python
(gdb) run PATH_TO_REDBOT_EXECUTABLE ARGS
## wait for segfault ##
(gdb) backtrace
```
```
Thread 8 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb1fdf470 (LWP 1551)]
0xb6fdac70 in elf_machine_rel (skip_ifunc=0,
    reloc_addr_arg=0xb56a70f4, version=0x50,
    sym=0xb56913f4, reloc=0xb5692210, map=0x57f600)
    at ../sysdeps/arm/dl-machine.h:382
382     ../sysdeps/arm/dl-machine.h: No such file or directory.
(gdb) backtrace
#0  0xb6fdac70 in elf_machine_rel (skip_ifunc=0,
    reloc_addr_arg=0xb56a70f4, version=0x50,
    sym=0xb56913f4, reloc=0xb5692210, map=0x57f600)
    at ../sysdeps/arm/dl-machine.h:382
#1  elf_dynamic_do_Rel (skip_ifunc=0,
    lazy=<optimized out>, nrelative=<optimized out>,
    relsize=<optimized out>, reladdr=<optimized out>,
    map=0x57f600) at do-rel.h:137
#2  _dl_relocate_object (scope=<optimized out>,
    reloc_mode=reloc_mode@entry=0,
    consider_profiling=<optimized out>,
    consider_profiling@entry=0) at dl-reloc.c:259
#3  0xb6fe38d4 in dl_open_worker (a=0xb1fdeb8c)
    at dl-open.c:435
#4  0xb6fdec04 in _dl_catch_error (objname=0xb1fdf97c,
    objname@entry=0xb1fdeb84, errstring=0xb1fdec10,
    errstring@entry=0xb1fdeb88, mallocedp=0xb1fdeb84,
    mallocedp@entry=0xb1fdeb83, operate=0xb1fdeb88,
    args=args@entry=0xb1fdeb8c) at dl-error.c:187
#5  0xb6fe3020 in _dl_open (
    file=0xb6f8ecc8 "libgcc_s.so.1", mode=-2147483646,
    caller_dlopen=0xb6f8d078 <pthread_cancel_init+52>,
    nsid=nsid@entry=-2, argc=3, argv=0xbefff254,
    env=0xbefff264) at dl-open.c:660
#6  0xb6ea059c in do_dlopen (ptr=0xb1fdedd0)
    at dl-libc.c:87
#7  0xb6fdec04 in _dl_catch_error (objname=0xb1fdf97c,
    objname@entry=0xb1fdedb8, errstring=0x0,
    errstring@entry=0xb1fdedbc, mallocedp=0xb1fdedc0,
    mallocedp@entry=0xb1fdedb7, operate=0xb1fdedc4,
    args=args@entry=0xb1fdedd0) at dl-error.c:187
---Type <return> to continue, or q <return> to quit---
#8  0xb6ea067c in dlerror_run (
    operate=<optimized out>,
    args=args@entry=0xb1fdedd0) at dl-libc.c:46
#9  0xb6ea06fc in __GI___libc_dlopen_mode (
    name=<optimized out>, mode=mode@entry=-2147483646)
    at dl-libc.c:163
#10 0xb6f8d078 in pthread_cancel_init ()
    at ../sysdeps/nptl/unwind-forcedunwind.c:52
#11 0xb6f8d214 in _Unwind_ForcedUnwind (
    exc=0xb6f8d214 <_Unwind_ForcedUnwind+88>,
    stop=0x2a8d30, stop_argument=0xb1fdf6b8)
    at ../sysdeps/nptl/unwind-forcedunwind.c:126
#12 0xb6f8abd8 in __GI___pthread_unwind (
    buf=<optimized out>) at unwind.c:121
#13 0xb6f822c0 in __do_cancel () at pthreadP.h:283
#14 __pthread_exit (value=value@entry=0x0)
    at pthread_exit.c:28
#15 0x0012b0c0 in PyThread_exit_thread ()
    at Python/thread_pthread.h:270
#16 0x00170bb4 in t_bootstrap (
    boot_raw=boot_raw@entry=0xb1fe2ef0)
    at ./Modules/_threadmodule.c:1023
#17 0x0012af54 in pythread_wrapper (
    arg=<optimized out>) at Python/thread_pthread.h:174
#18 0xb6f80fc4 in start_thread (arg=0xb1fdf470)
    at pthread_create.c:458
#19 0xb6e68038 in ?? ()
    at ../sysdeps/unix/sysv/linux/arm/clone.S:76
    from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
```
Hmmm, unfortunately I don't think that helps much, but it's probably due to one or both of the following:
It's possible that increasing the stack size will resolve the issue, but I don't know how that should be done in this circumstance. Whatever the cause is, it seems to be occurring in or around a secondary thread, which is most likely to be spawned by some dependency like aiohttp.
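If someone wants to experiment with that, one knob that does exist in the standard library is `threading.stack_size` — the sketch below is a hedged guess only (the 4 MiB figure is arbitrary, not a tested fix), and it won't help if the faulting thread is created inside C extension code rather than through the `threading` module:

```python
import threading

# Request a larger C stack for threads created from now on.
# Caveat: this only applies to threads started via the threading module,
# so it may never reach the thread that actually faults.
threading.stack_size(4 * 1024 * 1024)  # 4 MiB instead of the platform default

def worker():
    print("thread started with the larger stack size")

t = threading.Thread(target=worker)
t.start()
t.join()
```

The main thread's stack is controlled by the OS instead, e.g. `ulimit -s` in the shell that launches Red.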
If it's possible, you could try upgrading to Raspbian Buster and installing Python 3 from the official repositories? Buster was released about a month ago, and the `python3` package is Python 3.7.3. It's likely that the package was built to be optimised for ARM such that issues like this might be prevented.
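As an aside, you can inspect how a given interpreter was configured via `sysconfig` — a quick inspection sketch (nothing Red-specific, and checking for `--enable-optimizations` is only a heuristic for how distro packages are usually built):

```python
import platform
import sysconfig

# Distro python3 packages (Debian/Raspbian) are typically configured with
# --enable-optimizations (PGO); pyenv builds usually aren't by default.
config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
print("machine:", platform.machine())
print("python :", platform.python_version())
print("PGO build:", "--enable-optimizations" in config_args)
```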
I mean, realistically speaking, the Zero is going to struggle regardless - I didn't expect it to run full-fledged; it was more of a proof-of-concept to see if it would all compile/run.
For the sake of completion- I'll give buster and the official py3 repository a pull and report back!
Side note: I did manage to get V2 running with audio on the pi zero- though discord ended up timing out the stream after about half a second of playback xD
Yeah, V2 audio will use far fewer resources since it's just a few extra threads on the main bot (python) process. V3 audio is an entirely separate process running in the JVM, which at the very least is going to use a lot more memory. Even if you're able to limit the memory consumed by Lavalink and maybe increase the size of your swap disk, the quality is probably going to be awful on a Pi Zero.
On a 3B+ however, this shouldn't be as common. I personally run an Ubuntu Server 19.04 arm64 image on my 3B+ and it seems to run quite well. Raspbian Buster is also likely to be an improvement over Raspbian Stretch, though the official images are still 32-bit. It's up to you what you choose :)
Using `sys.settrace` allowed me to find the source of the problem; `gdb` didn't work well.
This is a bit more complicated to do, but I'll still give the steps:

1. Open redbot's `__main__.py` in a text editor (e.g. `nano`; CTRL + O to save, CTRL + X to exit). If you're using pyenv, it should be located at `.pyenv/versions/redbot/lib/python3.7/site-packages/redbot/__main__.py` (note that my environment is named `redbot`, replace it with yours).
2. Add the following line at the beginning of the file:

```diff
 import asyncio
 import json
 import logging
 import os
 import sys
+sys.settrace()
 import discord
```

3. To undo your changes afterwards, reinstall Red: `pip3 install --no-cache-dir --force-reinstall Red-DiscordBot`
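One caveat with the diff above: `sys.settrace()` actually requires a trace-function argument (with no arguments it raises a `TypeError`). A minimal trace function of the kind it expects — my own sketch, not necessarily the exact code used — records every Python-level call, so the last entry before the segfault points at the offending module:

```python
import sys

calls = []

def trace_calls(frame, event, arg):
    # Record every Python-level function call; the last entry before
    # the crash identifies the code that triggered it.
    if event == "call":
        code = frame.f_code
        calls.append(f"{code.co_filename}:{code.co_firstlineno} {code.co_name}")
    return None  # no per-line tracing inside the call

sys.settrace(trace_calls)

def suspect():  # stand-in for a cog import that might crash
    return "loaded"

suspect()
sys.settrace(None)  # stop tracing
print(calls[-1])    # last recorded call, ending in "suspect"
```

In practice you'd print to stderr instead of collecting into a list, so the log survives the crash.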
@retke In your case the issue might not necessarily be a "bug" in PyMongo, but it's likely that whatever is using PyMongo ran a big query and caused a buffer overflow.
It's also possible that it is indeed a bug in PyMongo on 32-bit systems, as sometimes the allocated buffer size is dependent on the bitness (e.g. `BUFSIZE * sizeof(long)`). Either way, you would still need a somewhat large query to overflow this buffer. I'd be interested to see the source code of the leveler cog you're using.
There are also quite a few limitations with MongoDB itself on 32-bit systems, but they may be unrelated to this issue.
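To make the bitness point concrete, the relevant sizes are easy to inspect from Python — an illustration only, where `BUFSIZE` is a hypothetical constant echoing the pattern above, not an actual PyMongo symbol:

```python
import struct

# sizeof(long) is 4 on 32-bit Linux (e.g. Raspbian) but 8 on 64-bit Linux,
# so a buffer allocated as BUFSIZE * sizeof(long) is half as large there.
sizeof_long = struct.calcsize("l")
BUFSIZE = 1024  # hypothetical, for illustration
print("sizeof(long):", sizeof_long)
print("buffer bytes:", BUFSIZE * sizeof_long)
```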
Apparently I'm having a similar issue. After I updated Red (3.1.2 -> 3.1.5), it keeps trying to re-download Lavalink.jar even though it is still present. Also, the download seems to fail every time at 99% for me.
Manually placing the Lavalink.jar in the folder does, however, stop the bot from trying to re-download the file. Below is the debug output of Red:
Aikaterna recommended me to comment my issue here.
OK, I've found an easy way to actually inspect what's happening, in a way which involves nothing but an environment variable. Simply set `PYTHONFAULTHANDLER=1` when running Red and it'll give you a log of tracebacks when you get a segfault. Cheers to the faulthandler module.
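For context, `PYTHONFAULTHANDLER=1` simply enables the stdlib `faulthandler` module at interpreter startup. The same dump it prints on SIGSEGV can also be produced on demand — a small standalone demonstration, unrelated to Red's own code:

```python
import faulthandler
import tempfile

def dump_python_stacks() -> str:
    # faulthandler writes straight to a file descriptor, so it needs a
    # real file rather than io.StringIO.
    with tempfile.TemporaryFile(mode="w+") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()  # the pure-Python traceback of every thread

print(dump_python_stacks())
```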
I tried what you did (loading leveler still segfaults), and the output of your solution is very similar to mine (but way easier).
Loading the bot with `--no-cogs` and then only the leveler works, but when I shut down the bot, it segfaulted (didn't know about that behaviour). Here's the output.
Then I loaded the bot without any flag (it loaded everything, including leveler, and segfaulted as usual). Here's the output.
There really doesn't seem to be anything useful or common amongst these different tracebacks unfortunately. The common denominators here are:
I'd be really interested to see if someone can cause this fault after upgrading to Raspbian Buster and running Red with the official `python3` package in the Debian Buster repositories. I understand that Raspbian hasn't exactly made it easy to upgrade, but it may be worth the trouble.
Alright, tried with Raspbian Buster, still segfaults with leveler...
```
(redbot) laggron42@raspberrypi:~ $ redbot Red --dev
Connected to Discord. Getting ready...
Loading packages...
ujson module not found, using json
msgpack not installed, MsgPackSerializer unavailable
[2019-08-03 14:49:32] [INFO] red.ZeCogsV3.react_roles: Cached bindings: sélection-rôle: 2, welcome: 1
/home/laggron42/Red/cogs/CogManager/cogs/leveler/leveler.py: scipy is unable to import: No module named 'scipy'
Autocolor feature will be unavailable
*** stack smashing detected ***: <unknown> terminated
Aborted
```

```
(redbot) laggron42@raspberrypi:~ $ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 10 (buster)
Release:        10
Codename:       buster
```
I've recently switched to Buster, and segfault issues haven't happened (so far); Lavalink seems to work fine. The thing is, when I transferred the files from the old bot to the new bot, I deleted the audio folder in the cogs folder, then typed `(p)load audio` when the bot loaded, which seems to have loaded the updated Lavalink.
Idk if my problem was due to the same reasons, but I wanted to mention that after switching to Buster I did not run into the problem again... yet. However, it is also worth mentioning that I did not update the system but installed it from scratch (even on a new microSD).
After installing Redbot I transferred all my files from the old to the new bot, except for the audio folder, just like @Ryan5374 above. Since the audio cog was still "loaded", the bot downloaded the Lavalink.jar on the first start.
I can see from @retke's traceback that it occurred whilst using Python installed with pyenv instead of system Python. However, apparently it's also occurring with system Python (discussed in Discord). There's not really much for me to say... I always plug distributions which build arm64 images, and recommend installing them on Raspberry Pi 3s... which ironically includes just about every major distribution except Raspbian. Raspbian is still compiled for 32-bit even though RPi 3s have 64-bit processors.
As I've mentioned previously, it's possible that the segfault @retke is getting with pymongo is actually a bug in PyMongo when compiled for 32-bit. It seems to be the only one which outputs "stack smashing detected", which I guess makes it special. So perhaps the pre-built Python in the Buster repositories stopped the segfaults everywhere but there?
I'll try reinstalling buster from scratch and see if it works.
I'm going to close this. It doesn't appear to be an issue on our end.
Other bugs
What were you trying to do?
Launch Red after updating.
What were you expecting to happen?
Have my bot alive
What actually happened?
Segmentation faults a bit everywhere.
- When loading the bot with the systemd daemon
- When loading the bot from the terminal
- When loading the bot with `--no-cogs` (works), then loading all cogs with commands (excluding leveler)

After editing the core files a bit to print a message for each cog load, I got this (here with `--no-cogs` and loading with commands): And here without `--no-cogs`:
How can we reproduce this issue?
¯\\_(ツ)_/¯ Those are memory errors related to C modules; it's specific to the hardware and also the cogs' dependencies. I can't find the source of the error, I get it in many different situations.