facebookarchive / memory-analyzer

A memory analyzer for running python processes
MIT License
154 stars 14 forks source link

Thread 1 "python" received signal SIGSEGV, Segmentation fault after running memory_analyzer run 8958 --debug #17

Closed ankanch closed 4 years ago

ankanch commented 4 years ago

Hi there, am trying to debug my python application running with asyncio in a docker container.

After I set up memory-analyzer, I ran it with

 memory_analyzer run 8958 --debug

it caused with Segmentation fault, and my application exited.

command output:

root@ip-172-31-80-194:/home/ubuntu# memory_analyzer run 8958 --debug
Analyzing pid 8958
Setting up GDB for pid 8958
[New LWP 9407]
[New LWP 9408]
[New LWP 9886]
[New LWP 9888]
[New LWP 10501]
[New LWP 10502]
[New LWP 11325]
[New LWP 11326]
[New LWP 11837]
[New LWP 11838]
[New LWP 12426]
[New LWP 12427]
[New LWP 13034]
[New LWP 13035]
[New LWP 13547]
[New LWP 13548]
[New LWP 14137]
[New LWP 14138]
[New LWP 14664]
[New LWP 14665]

warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable.  Connect to gdbserver inside the container.
0x00007f213b7d17ef in ?? ()
+set directories /usr/local/lib/python3.6/dist-packages/memory_analyzer/
+py sys.path.append("/usr/local/lib/python3.6/dist-packages/memory_analyzer/")
+file_command /tmp/tmpn7hi64_2/rendered_template-8958.py.out

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
PyGILState_Ensure () at ../Python/pystate.c:866
Python Exception <class 'gdb.error'> The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on".
Evaluation of the expression containing the function
(PyGILState_Ensure) will be abandoned.
When the function is done executing, GDB will silently stop.:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/memory_analyzer//gdb_commands.py", line 47, in <module>
    TEMPLATES_PATH=TEMPLATES_PATH, pid=pid
gdb.error: Error occurred in Python command: The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on".
Evaluation of the expression containing the function
(PyGILState_Ensure) will be abandoned.
When the function is done executing, GDB will silently stop.
Gathering data...
ERROR: Analysis for 8958 returned no data!  Try rerunning with --debug
ERROR: No results to report

environments:

Docker version 19.03.12, build 48a66213fe
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
thatch commented 4 years ago

Do you get the same error if you run memory analyzer inside docker? The message from gdb about "warning: Target and debugger are in different PID namespaces" seems relevant.

ankanch commented 4 years ago

Do you get the same error if you run memory analyzer inside docker? The message from gdb about "warning: Target and debugger are in different PID namespaces" seems relevant.

I will try to run it in docker.

ankanch commented 4 years ago

Do you get the same error if you run memory analyzer inside docker? The message from gdb about "warning: Target and debugger are in different PID namespaces" seems relevant.

Hi, @thatch , I got following output after I trying to run it in docker:

root@0b4c5f3b21cd:/usr/src/wsp# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.5  0.2 252516 21412 ?        Ssl  08:03   0:00 python ./wsp.py
root        11  0.3  0.0   5748  3612 pts/0    Ss   08:03   0:00 bash
root        16  0.0  0.0   9388  3084 pts/0    R+   08:03   0:00 ps aux
root@0b4c5f3b21cd:/usr/src/wsp# memory_analyzer run --debug 1
Analyzing pid 1
Setting up GDB for pid 1
/usr/local/bin/python-dbg: No such file or directory.
ptrace: Operation not permitted.
+set directories /usr/local/lib/python3.7/site-packages/memory_analyzer/
+py sys.path.append("/usr/local/lib/python3.7/site-packages/memory_analyzer/")
++file_command /tmp/tmppzya5663/rendered_template-0.py.out
Python Exception <class 'gdb.error'> No symbol table is loaded.  Use the "file" command.:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/memory_analyzer//gdb_commands.py", line 47, in <module>
    TEMPLATES_PATH=TEMPLATES_PATH, pid=pid
gdb.error: Error occurred in Python command: No symbol table is loaded.  Use the "file" command.
Gathering data...
ERROR: Analysis for 1 returned no data!  Try rerunning with --debug
ERROR: No results to report

while in my docker file, I do installed gdb:

FROM python:3.7.6-slim-buster
...
RUN apt-get update
...
RUN apt-get install -y gdb python3-dbg procps
RUN pip install memory-analyzer
...
CMD [ "python", "./wsp.py" ]

Then I followed your FAQ (No symbol table is loaded. Use the "file" command). I ran memeory_analyzer with -e option, it outputs :

root@0b4c5f3b21cd:/usr/src/wsp# memory_analyzer run --debug 1 -e /usr/local/bin/python3.7
Analyzing pid 1
Setting up GDB for pid 1
ptrace: Operation not permitted.
+set directories /usr/local/lib/python3.7/site-packages/memory_analyzer/
+py sys.path.append("/usr/local/lib/python3.7/site-packages/memory_analyzer/")
++file_command /tmp/tmpdjyckh1a/rendered_template-0.py.out
Python Exception <class 'gdb.error'> No symbol "PyGILState_Ensure" in current context.:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/memory_analyzer//gdb_commands.py", line 47, in <module>
    TEMPLATES_PATH=TEMPLATES_PATH, pid=pid
gdb.error: Error occurred in Python command: No symbol "PyGILState_Ensure" in current context.
Gathering data...
ERROR: Analysis for 1 returned no data!  Try rerunning with --debug
ERROR: No results to report
ankanch commented 4 years ago

Well, I finally solved it with add those option to the docker run command:

--cap-add=SYS_PTRACE --security-opt seccomp=unconfined

I can get my result now. image

ankanch commented 4 years ago

And I have another problem, I want to view forwards and backwards references of the top 2 objects. I just can't find the image generated.

I ran with command below:

memory_analyzer run -s 2 --no-upload 1 -e /usr/local/bin/python3.7

And I get the output as follows:

Analyzing pid 1
Setting up GDB for pid 1
Gathering data...
Writing output to file memory_analyzer_out/memory_analyzer_snapshot-20200825084103
Initializing frontend...

+-------------------------------------------------------------------------+-------+--------------+-------------------------------------------------
| Object                                                                  | Count |         Size |                    References
+-------------------------------------------------------------------------+-------+--------------+-------------------------------------------------
| builtins.dict                                                           |  3419 |    1.68   MB | /usr/src/wsp/memory_analyzer_out/ref_1_dict.png
| builtins.str                                                            | 13576 |    1.52   MB |  /usr/src/wsp/memory_analyzer_out/ref_1_str.png
| builtins.type                                                           |   918 |  767.48   KB |
| builtins.function                                                       |  4588 |  645.19   KB |
| builtins.code                                                           |  4467 |  630.30   KB |
| builtins.list                                                           |   583 |  445.63   KB |
| builtins.tuple                                                          |  3569 |  259.27   KB |
| builtins.wrapper_descriptor                                             |  1219 |  104.76   KB |
| builtins.weakref                                                        |  1189 |  102.18   KB |
| builtins.set                                                            |   132 |  100.91   KB |
| builtins.builtin_function_or_method                                     |  1169 |   91.33   KB |
| builtins.method_descriptor                                              |   992 |   77.50   KB |
...

As shown in the References column, there are image paths, And I trying to access it with the path, I just found, there is no image in that path.

root@5a2f8886bd8e:/usr/src/wsp/memory_analyzer_out# ls
memory_analyzer_snapshot-20200825083448

Then where is the image generated?

I also found that it reads ..., and will upload them for you to phabricator for easy viewing, ... , I tried both with and without --no-upload flag, it just seems to be the same result.

thatch commented 4 years ago

@lisroach ?

lisroach commented 4 years ago

I am able to repro the error, looks like it is failing to write the file with a "No such file or directory" error. Looking into it

ankanch commented 4 years ago

Hi, again, after my program running for a while in production environment , I want to collect some data, then I run command below:

memory_analyzer run --debug 1 -e /usr/local/bin/python3.7

I got following output:

Analyzing pid 1
Setting up GDB for pid 1
[New LWP 8]
[New LWP 9]
[New LWP 10]
[New LWP 26]
[New LWP 27]
[New LWP 31]
[New LWP 32]
[New LWP 34]
[New LWP 35]
[New LWP 46]
[New LWP 47]
[New LWP 48]
[New LWP 49]
[New LWP 52]
[New LWP 53]
[New LWP 66]
[New LWP 67]
[New LWP 71]
[New LWP 72]
[New LWP 74]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f132f2a07ef in epoll_wait (epfd=3, events=events@entry=0x7f12fc4232a0, maxevents=4, timeout=timeout@entry=1475) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
+set directories /usr/local/lib/python3.7/site-packages/memory_analyzer/
+py sys.path.append("/usr/local/lib/python3.7/site-packages/memory_analyzer/")
++file_command /tmp/tmpja0bjyal/rendered_template-1.py.out
GIL $1
+++call (void) PyRun_SimpleString("with open('/tmp/tmpja0bjyal/rendered_template-1.py.out') as f: exec(f.read())")
+++call (void) PyGILState_Release($1)
[Inferior 1 (process 1) detached]
Gathering data...
Writing output to file memory_analyzer_out/memory_analyzer_snapshot-20200826024832
Initializing frontend...

+-------------------------------------------------------------------------+-------+--------------+
| Object                                                                  | Count |         Size |
+-------------------------------------------------------------------------+-------+--------------+
| builtins.list                                                           |  1731 |    2.02   MB |
| builtins.dict                                                           |  3787 |    1.82   MB |
| builtins.str                                                            | 14875 |    1.64   MB |
| builtins.type                                                           |   922 |  769.48   KB |
| builtins.function                                                       |  4648 |  653.62   KB |
| builtins.code                                                           |  4503 |  635.36   KB |
| builtins.set                                                            |   148 |  457.03   KB |
| builtins.tuple                                                          |  3907 |  284.10   KB |
| builtins.weakref                                                        |  1292 |  111.03   KB |
| builtins.wrapper_descriptor                                             |  1219 |  104.76   KB |
| builtins.builtin_function_or_method                                     |  1236 |   96.56   KB |
| builtins.method_descriptor                                              |   992 |   77.50   KB |
| builtins.int                                                            |  2653 |   77.34   KB |
| abc.ABCMeta                                                             |    72 |   70.34   KB |
| builtins.getset_descriptor                                              |   711 |   55.55   KB |
| builtins.coroutine                                                      |   278 |   36.92   KB |
| builtins.frozenset                                                      |    93 |   34.82   KB |
| builtins.member_descriptor                                              |   400 |   31.25   KB |
| collections.OrderedDict                                                 |    31 |   30.24   KB |
| collections.deque                                                       |    46 |   28.75   KB |
| enum.EnumMeta                                                           |    26 |   26.85   KB |
| re.Pattern                                                              |    73 |   23.93   KB |

I think the data is incomplete? Because real memory cost of my python application is 264MB.

image

However, as you can infer from the table above, the sum of all rows is far less than 264MB.

And if I run ps aux, I got the following outputs :

root@5a2f8886bd8e:/usr/src/wsp# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  2.0  3.1 1733032 248808 ?      Ssl  Aug25  25:42 python ./wsp.py
root        24  0.0  0.0      0     0 ?        Z    Aug25   0:00 [gdb] <defunct>
root        44  0.0  0.0      0     0 ?        Z    Aug25   0:00 [gdb] <defunct>
root        84  0.0  0.0      0     0 ?        Z    Aug25   0:00 [gdb] <defunct>
root       103  0.0  0.0      0     0 ?        Z    02:45   0:00 [gdb] <defunct>
root       113  0.0  0.0      0     0 ?        Z    02:46   0:00 [gdb] <defunct>
root       123  0.0  0.0      0     0 ?        Z    02:48   0:00 [gdb] <defunct>
root       125  0.0  0.0   5748  3712 pts/0    Ss   05:17   0:00 bash
root       138  0.0  0.0      0     0 pts/0    Z    05:17   0:00 [gdb] <defunct>
root       142  0.0  0.0   9388  3112 pts/0    R+   05:18   0:00 ps aux

So, how can I get correct stats? Should I just wait there is no one on my server, running analyzer again?

thatch commented 4 years ago

Certainly possible it is not finding all objects, but a more likely explanation is that resident also counts items from malloc and you may be using some native libs? Can you share anything about what this program does... There might be some hints there.

Your graph looks fairly typical of startup growth, what is your end goal with running memory analyzer? Was there a particular problem or limit that you have you mind?

ankanch commented 4 years ago

Certainly possible it is not finding all objects, but a more likely explanation is that resident also counts items from malloc and you may be using some native libs? Can you share anything about what this program does... There might be some hints there.

Your graph looks fairly typical of startup growth, what is your end goal with running memory analyzer? Was there a particular problem or limit that you have you mind?

Well, my application is a proxy, I'm using other libs like websockets and asyncio.

The graph I posted above was taken 16 hours after my application started up. According to my logs, there have been more than 3000 users connected through my proxy. The problem is after those users disconnected, the memory of my proxy seems not released (memory usage is keep growing every day). I suspect that my application has memory leaks. That's the reason why I use memory_analyzer, I'm trying to find out what data takes the most memory.

thatch commented 4 years ago

Re undercounting, memory-analyzer uses the gc module via pympler, and I would suppose that most of your usage is not gc-tracked. The script you're running doesn't appear to be open source, but I would suggest looking at your entire dep tree for extensions (.so on Linux) and start there.

ankanch commented 4 years ago

Re undercounting, memory-analyzer uses the gc module via pympler, and I would suppose that most of your usage is not gc-tracked. The script you're running doesn't appear to be open source, but I would suggest looking at your entire dep tree for extensions (.so on Linux) and start there.

Thanks for reminding me, I will check it out.

ankanch commented 4 years ago

Re undercounting, memory-analyzer uses the gc module via pympler, and I would suppose that most of your usage is not gc-tracked. The script you're running doesn't appear to be open source, but I would suggest looking at your entire dep tree for extensions (.so on Linux) and start there.

finally I found it is helpful after switching malloc to tcmalloc. Memory usage reduced by 44%.