HushengGen / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

google profiler hangs in rhel5 and rhel6, but works in rhel4 #642

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.Write a code which has system calls 
2.Run in any rhel5 or rhel6 platform
3.Hangs abruptly in any system call. The hanging happens more predominantly for 
little large testcases.

What is the expected output? What do you see instead?
Completion of the run, but it hangs in middle ..
stack 39372
Thread 4 (Thread 0x2b5992f95700 (LWP 39373)):
#0  0x00000038e260b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00000000023a122e in k34ba3 ()
#2  0x00000038e26079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038e1ee8b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x2b5993196700 (LWP 39374)):
#0  0x00000038e260b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00000000023a1142 in a3467n3 ()
#2  0x00000038e26079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038e1ee8b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x2b5994972700 (LWP 39379)):
#0  0x00000038e260b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00000000024f07c1 in ?? ()
#2  0x00000038e26079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038e1ee8b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b5991b6fee0 (LWP 39372)):
#0  0x00000038e1e3e8ca in do_system () from /lib64/libc.so.6
#1  0x00000038e1e3ec40 in system () from /lib64/libc.so.6
#2  0x000000000213a666 in VeMacroUsageDump::setDumpDirPath(char const*) ()
#3  0x0000000000d16e92 in hdlSetCheetahArgs() ()
#4  0x0000000000d227e2 in hdlParserMain ()
#5  0x0000000000d9a92a in CHECKRULES(registeredNameList*, SgWorkspace*, int) ()
#6  0x0000000000d924dc in sgCompareNDesigns(registeredNameList*) ()
#7  0x0000000000d8455c in sgGoCheckRules ()
#8  0x0000000000a95543 in runGoal() ()

What version of the product are you using? On what operating system?

google profiler version 2.2.1 compiled on gcc 4.8.1 on 2.6.18-92.el5 platform. 
Please provide any additional information below.
The hang is more consistent in 2.6.18-92.el5 platform and works smoothly 
without hang in 2.6.9-5.EL platform.

Original issue reported on code.google.com by rajdeep....@gmail.com on 18 Aug 2014 at 12:34

GoogleCodeExporter commented 9 years ago
strace -cp PID shows..

strace -cp 39372
Process 39372 attached - interrupt to quit
^CProcess 39372 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    6.818870      162354        42        42 clone
  0.00    0.000048           0      1290           rt_sigprocmask
  0.00    0.000024           1        43           rt_sigreturn
  0.00    0.000010           0       215           msync
------ ----------- ----------- --------- --------- ----------------
100.00    6.818952                  1590        42 total

Original comment by rajdeep....@gmail.com on 18 Aug 2014 at 12:35

GoogleCodeExporter commented 9 years ago
In this backtraces I on only see something waiting on result of system call. 
And no signs of anything stuck in gperftools.

There's still slight chance that child process forked by system is stuck 
somewhere. So getting backtraces from child might help.

Also helpful might be short reproduction program that you could share. And also 
helpful might be if you could test if older versions (2.0 and 2.1) are affected 
or not.

Original comment by alkondratenko on 18 Aug 2014 at 3:55

GoogleCodeExporter commented 9 years ago
The pstree for similar process shows the following:
pstree -p 61012
check.Linux4(61012)???{check.Linux4}(61014)
                    ??{check.Linux4}(61015)
                    ??{check.Linux4}(61049)

Now if you do pstack for 61012, it shows all the process thread...
pstack 61012
Thread 4 (Thread 0x2ba7eb993700 (LWP 61014)):
#0  0x00000038e260b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00000000023a122e in k34ba3 ()
#2  0x00000038e26079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038e1ee8b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x2ba7ebb94700 (LWP 61015)):
#0  0x00000038e260b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00000000023a1142 in a3467n3 ()
#2  0x00000038e26079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038e1ee8b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x2ba7ed370700 (LWP 61049)):
#0  0x00000038e260b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00000000024f07c1 in ?? ()
#2  0x00000038e26079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038e1ee8b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2ba7ea56dee0 (LWP 61012)):
#0  0x00000038e1e3e8ca in do_system () from /lib64/libc.so.6
#1  0x00000038e1e3ec40 in system () from /lib64/libc.so.6
#2  0x000000000213a666 in VeMacroUsageDump::setDumpDirPath(char const*) ()
#3  0x0000000000d16e92 in hdlSetCheetahArgs() ()
#4  0x0000000000d227e2 in hdlParserMain ()
#5  0x0000000000d9a92a in CHECKRULES(registeredNameList*, SgWorkspace*, int) ()
#6  0x0000000000d924dc in sgCompareNDesigns(registeredNameList*) ()
#7  0x0000000000d8455c in sgGoCheckRules ()
#8  0x0000000000a95543 in runGoal() ()
#9  0x0000000000a95eea in spyglass_main ()
#10 0x0000000000a96497 in spyglass_main_reentr ()
#11 0x0000000000a93e94 in main ()

Also the strace for the same shows:
strace -cp 61012
Process 61012 attached - interrupt to quit
^CProcess 61012 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    7.416872      200456        37        37 clone
  0.00    0.000041           0      1140           rt_sigprocmask
  0.00    0.000011           0        38           rt_sigreturn
  0.00    0.000000           0        16           msync
  0.00    0.000000           0         4           futex
------ ----------- ----------- --------- --------- ----------------
100.00    7.416924                  1235        37 total

SO some issue related to cloning. The process gets stuck all the time exactly 
in the same place. The same issue is happening in 2.0 and 2.1 for sure and 
exactly at the same place.
Also I am sorry that I could not attach the test program, the reason being it 
is in NDA, plus not only the program, I have to send you a real design also, 
which is not possible. So if you can throw some light on the basis of the stack 
trace, it would be excellent.

Original comment by rajdeep....@gmail.com on 19 Aug 2014 at 8:05

GoogleCodeExporter commented 9 years ago
I will need detailed backtraces from child process to help.

Original comment by alkondratenko on 19 Aug 2014 at 2:54

GoogleCodeExporter commented 9 years ago
I checked all the child process.. All of them are showing exactly the same back 
trace.

Thread 1 (process 8127):
#0  0x00000038e260b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x00000000023a1142 in a3467n3 ()
#2  0x00000038e26079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038e1ee8b6d in clone () from /lib64/libc.so.6

And strace shows:
strace -cp 8127
Process 8127 attached - interrupt to quit
^CProcess 8127 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  -nan    0.000000           0         2         2 rt_sigreturn
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                     2         2 total

Original comment by rajdeep....@gmail.com on 20 Aug 2014 at 7:46

GoogleCodeExporter commented 9 years ago
Let me first note that those strace outputs are not telling me anything at all.

Backtraces do.

If that's indeed child process and not a thread (which actually seems to be the 
case), then it's not something caused by gperftools.

Double check on your end that backtrace above is not thread of your main 
process but a child process forked in do_system.

Original comment by alkondratenko on 20 Aug 2014 at 6:19

GoogleCodeExporter commented 9 years ago
Another thing to investigate is what exactly do_system is doing. It might be 
useful do post disasm of it as seen by gdb

Original comment by alkondratenko on 20 Aug 2014 at 6:23

GoogleCodeExporter commented 9 years ago
This is for sure that those 3 process are threads and not forked child. Also do 
you mean I run gdb on the main thread, what output you need?

Original comment by rajdeep....@gmail.com on 21 Aug 2014 at 12:44

GoogleCodeExporter commented 9 years ago
Let me elaborate on my thinking a bit.

system function is supposed to fork child and wait for it. In your backtraces 
there are no obvious signs of anything being blocked, but there is active 
system(3) call in one of threads. Therefore my assumption is that system might 
have forked child and is waiting for it. And child might be stuck.

Do ps aux --forest to see if your process indeed has child. And if there's 
child attach to that child via gdb and get me backtraces from it.

Hopefully it'll help.

Original comment by alkondratenko on 22 Aug 2014 at 5:35

GoogleCodeExporter commented 9 years ago
I checked with ps aux --forest, the checker process does not have a forked 
child. This are threads created by OS.
Also gdb on the process gives me something like this:

0x00000038e1e3e8ca in do_system () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install expat-2.0.1-11.el6_2.x86_64 
fontconfig-2.8.0-3.el6.x86_64 glib2-2.26.1-3.el6.x86_64 
glibc-2.12-1.132.el6.x86_64 libICE-1.0.6-1.el6.x86_64 libSM-1.2.1-2.el6.x86_64 
libX11-1.5.0-4.el6.x86_64 libXau-1.0.6-4.el6.x86_64 libXext-1.3.1-2.el6.x86_64 
libXrender-0.9.7-2.el6.x86_64 libunwind-1.1-2.el6.x86_64 
libuuid-2.17.2-12.14.el6.x86_64 libxcb-1.8.1-1.el6.x86_64 
nss-softokn-freebl-3.14.3-9.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00000038e1e3e8ca in do_system () from /lib64/libc.so.6
(gdb) up
#1  0x00000038e1e3ec40 in system () from /lib64/libc.so.6
(gdb) up
#2  0x000000000213a666 in VeMacroUsageDump::setDumpDirPath(char const*) ()
(gdb) up
#3  0x0000000000d16e86 in hdlAbcArgs() () at src/hdlAbc.c:2543
2543       veSetSpyschDirPath(SG_CLOM_CNTR->sgSpyDirName);

Original comment by rajdeep....@gmail.com on 22 Aug 2014 at 6:43

GoogleCodeExporter commented 9 years ago
In that case I can conclude that either do_system starts spinning somewhere or 
that it's simply your app calls system in loop.

In order to investigate the former, please do disasm of do_system via gdb.

Original comment by alkondratenko on 22 Aug 2014 at 2:56

GoogleCodeExporter commented 9 years ago
when I do 
gdb) info line do_system
It does not show up starting and ending adddress , it just show up one address.

(gdb) info line do_system
No line number information available for address 0x38e1e3e790 <do_system>
(gdb) disasm 0x38e1e3e790
Undefined command: "disasm".  Try "help".
(gdb) disas 0x38e1e3e790
Dump of assembler code for function do_system:
   0x00000038e1e3e790 <+0>:     push   %r12
   0x00000038e1e3e792 <+2>:     xor    %eax,%eax
   0x00000038e1e3e794 <+4>:     mov    %rdi,%r12
   0x00000038e1e3e797 <+7>:     mov    $0x10,%ecx
   0x00000038e1e3e79c <+12>:    mov    $0x1,%esi
   0x00000038e1e3e7a1 <+17>:    push   %rbp
   0x00000038e1e3e7a2 <+18>:    push   %rbx
   0x00000038e1e3e7a3 <+19>:    sub    $0x170,%rsp
   0x00000038e1e3e7aa <+26>:    lea    0x8(%rsp),%rdx
   0x00000038e1e3e7af <+31>:    movq   $0x1,(%rsp)
   0x00000038e1e3e7b7 <+39>:    movl   $0x0,0x88(%rsp)
   0x00000038e1e3e7c2 <+50>:    mov    %rsp,%rbx
   0x00000038e1e3e7c5 <+53>:    mov    %rdx,%rdi
   0x00000038e1e3e7c8 <+56>:    rep stos %rax,%es:(%rdi)
   0x00000038e1e3e7cb <+59>:    cmpl   $0x0,0x354ede(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e7d2 <+66>:    je     0x38e1e3e7e4 <do_system+84>
   0x00000038e1e3e7d4 <+68>:    lock cmpxchg %esi,0x3514c0(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e7dc <+76>:    jne    0x38e1e3ed21 <_L_lock_24>
   0x00000038e1e3e7e2 <+82>:    jmp    0x38e1e3e7f1 <do_system+97>
   0x00000038e1e3e7e4 <+84>:    cmpxchg %esi,0x3514b1(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e7eb <+91>:    jne    0x38e1e3ed21 <_L_lock_24>
   0x00000038e1e3e7f1 <+97>:    mov    0x3514a1(%rip),%eax        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e7f7 <+103>:   lea    0x1(%rax),%edx
   0x00000038e1e3e7fa <+106>:   test   %eax,%eax
   0x00000038e1e3e7fc <+108>:   mov    %edx,0x351496(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e802 <+114>:   jne    0x38e1e3e83c <do_system+172>
   0x00000038e1e3e804 <+116>:   lea    0x351355(%rip),%rdx        # 0x38e218fb60 <intr>
   0x00000038e1e3e80b <+123>:   mov    %rsp,%rsi
   0x00000038e1e3e80e <+126>:   mov    $0x2,%edi
   0x00000038e1e3e813 <+131>:   callq  0x38e1e32bb0 <sigaction>
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3e818 <+136>:   test   %eax,%eax
   0x00000038e1e3e81a <+138>:   js     0x38e1e3ea6a <do_system+730>
   0x00000038e1e3e820 <+144>:   lea    0x3513d9(%rip),%rdx        # 0x38e218fc00 <quit>
   0x00000038e1e3e827 <+151>:   mov    %rsp,%rsi
   0x00000038e1e3e82a <+154>:   mov    $0x3,%edi
   0x00000038e1e3e82f <+159>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3e834 <+164>:   test   %eax,%eax
   0x00000038e1e3e836 <+166>:   js     0x38e1e3ea9f <do_system+783>
   0x00000038e1e3e83c <+172>:   cmpl   $0x0,0x354e6d(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e843 <+179>:   je     0x38e1e3e854 <do_system+196>
   0x00000038e1e3e845 <+181>:   lock decl 0x351450(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e84c <+188>:   jne    0x38e1e3ed40 <_L_unlock_49>
   0x00000038e1e3e852 <+194>:   jmp    0x38e1e3e860 <do_system+208>
   0x00000038e1e3e854 <+196>:   decl   0x351442(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e85a <+202>:   jne    0x38e1e3ed40 <_L_unlock_49>
   0x00000038e1e3e860 <+208>:   lea    0xa0(%rsp),%rbp
   0x00000038e1e3e868 <+216>:   lea    0x8(%rbx),%rsi
   0x00000038e1e3e86c <+220>:   xor    %edi,%edi
   0x00000038e1e3e86e <+222>:   orq    $0x10000,0x8(%rsp)
   0x00000038e1e3e877 <+231>:   mov    %rbp,%rdx
   0x00000038e1e3e87a <+234>:   callq  0x38e1e32bd0 <sigprocmask>
   0x00000038e1e3e87f <+239>:   test   %eax,%eax
   0x00000038e1e3e881 <+241>:   js     0x38e1e3eac8 <do_system+824>
   0x00000038e1e3e887 <+247>:   mov    0x354c53(%rip),%ebx        # 0x38e21934e0 <__libc_pthread_functions_init>
   0x00000038e1e3e88d <+253>:   test   %ebx,%ebx
   0x00000038e1e3e88f <+255>:   jne    0x38e1e3ea38 <do_system+680>
   0x00000038e1e3e895 <+261>:   lea    0x3c4(%rip),%rax        # 0x38e1e3ec60 <cancel_handler>
   0x00000038e1e3e89c <+268>:   mov    %rax,0x140(%rsp)
   0x00000038e1e3e8a4 <+276>:   lea    0x168(%rsp),%rax
   0x00000038e1e3e8ac <+284>:   mov    %rax,0x148(%rsp)
   0x00000038e1e3e8b4 <+292>:   lea    0x168(%rsp),%rdx
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3e8bc <+300>:   xor    %esi,%esi
   0x00000038e1e3e8be <+302>:   mov    $0x100011,%edi
   0x00000038e1e3e8c3 <+307>:   mov    $0x38,%eax
   0x00000038e1e3e8c8 <+312>:   syscall
=> 0x00000038e1e3e8ca <+314>:   cmp    $0xfffffffffffff000,%rax
   0x00000038e1e3e8d0 <+320>:   ja     0x38e1e3ebcc <do_system+1084>
   0x00000038e1e3e8d6 <+326>:   cmp    $0x0,%eax
   0x00000038e1e3e8d9 <+329>:   mov    %eax,%edi
   0x00000038e1e3e8db <+331>:   mov    %eax,0x168(%rsp)
   0x00000038e1e3e8e2 <+338>:   je     0x38e1e3eb2b <do_system+923>
   0x00000038e1e3e8e8 <+344>:   lea    0x16c(%rsp),%r12
   0x00000038e1e3e8f0 <+352>:   jge    0x38e1e3e8ff <do_system+367>
   0x00000038e1e3e8f2 <+354>:   jmp    0x38e1e3e92b <do_system+411>
   0x00000038e1e3e8f4 <+356>:   nopl   0x0(%rax)
   0x00000038e1e3e8f8 <+360>:   mov    0x168(%rsp),%edi
   0x00000038e1e3e8ff <+367>:   xor    %edx,%edx
   0x00000038e1e3e901 <+369>:   mov    %r12,%rsi
   0x00000038e1e3e904 <+372>:   callq  0x38e1eac8a0 <waitpid>
   0x00000038e1e3e909 <+377>:   cltq
   0x00000038e1e3e90b <+379>:   cmp    $0xffffffffffffffff,%rax
   0x00000038e1e3e90f <+383>:   jne    0x38e1e3e91e <do_system+398>
   0x00000038e1e3e911 <+385>:   mov    0x34f690(%rip),%rdx        # 0x38e218dfa8
   0x00000038e1e3e918 <+392>:   cmpl   $0x4,%fs:(%rdx)
   0x00000038e1e3e91c <+396>:   je     0x38e1e3e8f8 <do_system+360>
   0x00000038e1e3e91e <+398>:   movslq 0x168(%rsp),%rdx
   0x00000038e1e3e926 <+406>:   cmp    %rdx,%rax
   0x00000038e1e3e929 <+409>:   je     0x38e1e3e936 <do_system+422>
   0x00000038e1e3e92b <+411>:   movl   $0xffffffff,0x16c(%rsp)
   0x00000038e1e3e936 <+422>:   test   %ebx,%ebx
   0x00000038e1e3e938 <+424>:   jne    0x38e1e3ea10 <do_system+640>
   0x00000038e1e3e93e <+430>:   mov    $0x1,%esi
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3e943 <+435>:   xor    %eax,%eax
   0x00000038e1e3e945 <+437>:   cmpl   $0x0,0x354d64(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e94c <+444>:   je     0x38e1e3e95e <do_system+462>
   0x00000038e1e3e94e <+446>:   lock cmpxchg %esi,0x351346(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e956 <+454>:   jne    0x38e1e3ed5f <_L_lock_179>
   0x00000038e1e3e95c <+460>:   jmp    0x38e1e3e96b <do_system+475>
   0x00000038e1e3e95e <+462>:   cmpxchg %esi,0x351337(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e965 <+469>:   jne    0x38e1e3ed5f <_L_lock_179>
   0x00000038e1e3e96b <+475>:   mov    0x351327(%rip),%eax        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e971 <+481>:   sub    $0x1,%eax
   0x00000038e1e3e974 <+484>:   test   %eax,%eax
   0x00000038e1e3e976 <+486>:   mov    %eax,0x35131c(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e97c <+492>:   jne    0x38e1e3e9f0 <do_system+608>
   0x00000038e1e3e97e <+494>:   lea    0x3511db(%rip),%rsi        # 0x38e218fb60 <intr>
   0x00000038e1e3e985 <+501>:   xor    %edx,%edx
   0x00000038e1e3e987 <+503>:   mov    $0x2,%edi
   0x00000038e1e3e98c <+508>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3e991 <+513>:   lea    0x351268(%rip),%rsi        # 0x38e218fc00 <quit>
   0x00000038e1e3e998 <+520>:   xor    %edx,%edx
   0x00000038e1e3e99a <+522>:   mov    $0x3,%edi
   0x00000038e1e3e99f <+527>:   mov    %eax,%ebx
   0x00000038e1e3e9a1 <+529>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3e9a6 <+534>:   test   %eax,%eax
   0x00000038e1e3e9a8 <+536>:   jne    0x38e1e3e9ae <do_system+542>
   0x00000038e1e3e9aa <+538>:   test   %ebx,%ebx
   0x00000038e1e3e9ac <+540>:   je     0x38e1e3e9f0 <do_system+608>
   0x00000038e1e3e9ae <+542>:   movl   $0xffffffff,0x16c(%rsp)
   0x00000038e1e3e9b9 <+553>:   cmpl   $0x0,0x354cf0(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e9c0 <+560>:   je     0x38e1e3e9d1 <do_system+577>
   0x00000038e1e3e9c2 <+562>:   lock decl 0x3512d3(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e9c9 <+569>:   jne    0x38e1e3ed7e <_L_unlock_213>
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3e9cf <+575>:   jmp    0x38e1e3e9dd <do_system+589>
   0x00000038e1e3e9d1 <+577>:   decl   0x3512c5(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e9d7 <+583>:   jne    0x38e1e3ed7e <_L_unlock_213>
   0x00000038e1e3e9dd <+589>:   mov    0x16c(%rsp),%eax
   0x00000038e1e3e9e4 <+596>:   add    $0x170,%rsp
   0x00000038e1e3e9eb <+603>:   pop    %rbx
   0x00000038e1e3e9ec <+604>:   pop    %rbp
   0x00000038e1e3e9ed <+605>:   pop    %r12
   0x00000038e1e3e9ef <+607>:   retq
   0x00000038e1e3e9f0 <+608>:   xor    %edx,%edx
   0x00000038e1e3e9f2 <+610>:   mov    %rbp,%rsi
   0x00000038e1e3e9f5 <+613>:   mov    $0x2,%edi
   0x00000038e1e3e9fa <+618>:   callq  0x38e1e32bd0 <sigprocmask>
   0x00000038e1e3e9ff <+623>:   test   %eax,%eax
   0x00000038e1e3ea01 <+625>:   je     0x38e1e3e9b9 <do_system+553>
   0x00000038e1e3ea03 <+627>:   movl   $0xffffffff,0x16c(%rsp)
   0x00000038e1e3ea0e <+638>:   jmp    0x38e1e3e9b9 <do_system+553>
   0x00000038e1e3ea10 <+640>:   mov    0x354c61(%rip),%rax        # 0x38e2193678 <__libc_pthread_functions+376>
   0x00000038e1e3ea17 <+647>:   lea    0x140(%rsp),%rdi
   0x00000038e1e3ea1f <+655>:   xor    %esi,%esi
   0x00000038e1e3ea21 <+657>:   ror    $0x11,%rax
   0x00000038e1e3ea25 <+661>:   xor    %fs:0x30,%rax
   0x00000038e1e3ea2e <+670>:   callq  *%rax
   0x00000038e1e3ea30 <+672>:   jmpq   0x38e1e3e93e <do_system+430>
   0x00000038e1e3ea35 <+677>:   nopl   (%rax)
   0x00000038e1e3ea38 <+680>:   mov    0x354c31(%rip),%rax        # 0x38e2193670 <__libc_pthread_functions+368>
   0x00000038e1e3ea3f <+687>:   lea    0x168(%rsp),%rdx
   0x00000038e1e3ea47 <+695>:   lea    0x140(%rsp),%rdi
   0x00000038e1e3ea4f <+703>:   ror    $0x11,%rax
   0x00000038e1e3ea53 <+707>:   xor    %fs:0x30,%rax
   0x00000038e1e3ea5c <+716>:   lea    0x1fd(%rip),%rsi        # 0x38e1e3ec60 <cancel_handler>
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3ea63 <+723>:   callq  *%rax
   0x00000038e1e3ea65 <+725>:   jmpq   0x38e1e3e8b4 <do_system+292>
   0x00000038e1e3ea6a <+730>:   subl   $0x1,0x351227(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3ea71 <+737>:   cmpl   $0x0,0x354c38(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3ea78 <+744>:   je     0x38e1e3ea89 <do_system+761>
   0x00000038e1e3ea7a <+746>:   lock decl 0x35121b(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3ea81 <+753>:   jne    0x38e1e3ed9d <_L_unlock_320>
   0x00000038e1e3ea87 <+759>:   jmp    0x38e1e3ea95 <do_system+773>
   0x00000038e1e3ea89 <+761>:   decl   0x35120d(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3ea8f <+767>:   jne    0x38e1e3ed9d <_L_unlock_320>
   0x00000038e1e3ea95 <+773>:   mov    $0xffffffff,%eax
   0x00000038e1e3ea9a <+778>:   jmpq   0x38e1e3e9e4 <do_system+596>
   0x00000038e1e3ea9f <+783>:   mov    0x34f502(%rip),%rbx        # 0x38e218dfa8
   0x00000038e1e3eaa6 <+790>:   subl   $0x1,0x3511eb(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3eaad <+797>:   mov    %fs:(%rbx),%ebp
   0x00000038e1e3eab0 <+800>:   lea    0x3510a9(%rip),%rsi        # 0x38e218fb60 <intr>
   0x00000038e1e3eab7 <+807>:   xor    %edx,%edx
   0x00000038e1e3eab9 <+809>:   mov    $0x2,%edi
   0x00000038e1e3eabe <+814>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eac3 <+819>:   mov    %ebp,%fs:(%rbx)
   0x00000038e1e3eac6 <+822>:   jmp    0x38e1e3ea71 <do_system+737>
   0x00000038e1e3eac8 <+824>:   mov    $0x1,%esi
   0x00000038e1e3eacd <+829>:   xor    %eax,%eax
   0x00000038e1e3eacf <+831>:   cmpl   $0x0,0x354bda(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3ead6 <+838>:   je     0x38e1e3eae8 <do_system+856>
   0x00000038e1e3ead8 <+840>:   lock cmpxchg %esi,0x3511bc(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3eae0 <+848>:   jne    0x38e1e3edbc <_L_lock_354>
   0x00000038e1e3eae6 <+854>:   jmp    0x38e1e3eaf5 <do_system+869>
   0x00000038e1e3eae8 <+856>:   cmpxchg %esi,0x3511ad(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3eaef <+863>:   jne    0x38e1e3edbc <_L_lock_354>
   0x00000038e1e3eaf5 <+869>:   mov    0x35119d(%rip),%eax        # 0x38e218fc98 <sa_refcntr>
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3eafb <+875>:   sub    $0x1,%eax
   0x00000038e1e3eafe <+878>:   test   %eax,%eax
   0x00000038e1e3eb00 <+880>:   mov    %eax,0x351192(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3eb06 <+886>:   jne    0x38e1e3ea71 <do_system+737>
   0x00000038e1e3eb0c <+892>:   mov    0x34f495(%rip),%rbx        # 0x38e218dfa8
   0x00000038e1e3eb13 <+899>:   lea    0x3510e6(%rip),%rsi        # 0x38e218fc00 <quit>
   0x00000038e1e3eb1a <+906>:   xor    %edx,%edx
   0x00000038e1e3eb1c <+908>:   mov    $0x3,%edi
   0x00000038e1e3eb21 <+913>:   mov    %fs:(%rbx),%ebp
   0x00000038e1e3eb24 <+916>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eb29 <+921>:   jmp    0x38e1e3eab0 <do_system+800>
   0x00000038e1e3eb2b <+923>:   lea    0x117256(%rip),%rax        # 0x38e1f55d88
   0x00000038e1e3eb32 <+930>:   lea    0x351027(%rip),%rsi        # 0x38e218fb60 <intr>
   0x00000038e1e3eb39 <+937>:   xor    %edx,%edx
   0x00000038e1e3eb3b <+939>:   mov    $0x2,%dil
   0x00000038e1e3eb3e <+942>:   mov    %r12,0x130(%rsp)
   0x00000038e1e3eb46 <+950>:   movq   $0x0,0x138(%rsp)
   0x00000038e1e3eb52 <+962>:   mov    %rax,0x120(%rsp)
   0x00000038e1e3eb5a <+970>:   lea    0x11721f(%rip),%rax        # 0x38e1f55d80
   0x00000038e1e3eb61 <+977>:   mov    %rax,0x128(%rsp)
   0x00000038e1e3eb69 <+985>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eb6e <+990>:   lea    0x35108b(%rip),%rsi        # 0x38e218fc00 <quit>
   0x00000038e1e3eb75 <+997>:   xor    %edx,%edx
   0x00000038e1e3eb77 <+999>:   mov    $0x3,%edi
   0x00000038e1e3eb7c <+1004>:  callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eb81 <+1009>:  xor    %edx,%edx
   0x00000038e1e3eb83 <+1011>:  mov    %rbp,%rsi
   0x00000038e1e3eb86 <+1014>:  mov    $0x2,%edi
   0x00000038e1e3eb8b <+1019>:  callq  0x38e1e32bd0 <sigprocmask>
   0x00000038e1e3eb90 <+1024>:  mov    0x34f3b1(%rip),%rax        # 0x38e218df48
   0x00000038e1e3eb97 <+1031>:  lea    0x1171e5(%rip),%rdi        # 0x38e1f55d83
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3eb9e <+1038>:  lea    0x120(%rsp),%rsi
   0x00000038e1e3eba6 <+1046>:  movl   $0x0,0x3510ec(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3ebb0 <+1056>:  movl   $0x0,0x3510de(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3ebba <+1066>:  mov    (%rax),%rdx
   0x00000038e1e3ebbd <+1069>:  callq  0x38e1ead040 <execve>
   0x00000038e1e3ebc2 <+1074>:  mov    $0x7f,%edi
   0x00000038e1e3ebc7 <+1079>:  callq  0x38e1eacff0 <_exit>
   0x00000038e1e3ebcc <+1084>:  mov    0x34f3d5(%rip),%rdx        # 0x38e218dfa8
   0x00000038e1e3ebd3 <+1091>:  neg    %eax
   0x00000038e1e3ebd5 <+1093>:  movl   $0xffffffff,0x168(%rsp)
   0x00000038e1e3ebe0 <+1104>:  mov    %eax,%fs:(%rdx)
   0x00000038e1e3ebe3 <+1107>:  jmpq   0x38e1e3e92b <do_system+411>
End of assembler dump.

interestingly, the hang state only comes when I set the environment variable 
CPUPROFILE to a text output file. If I do not set this varaibale the runs 
completes without any error.

Original comment by rajdeep....@gmail.com on 25 Aug 2014 at 9:38

GoogleCodeExporter commented 9 years ago
I think my above assembly dump is not correct, because the memory address seems 
not correct. SO I am just giving you the diasm of the present state

gdb> disassemble do_system
Dump of assembler code for function do_system:
   0x00000038e1e3e790 <+0>:     push   %r12
   0x00000038e1e3e792 <+2>:     xor    %eax,%eax
   0x00000038e1e3e794 <+4>:     mov    %rdi,%r12
   0x00000038e1e3e797 <+7>:     mov    $0x10,%ecx
   0x00000038e1e3e79c <+12>:    mov    $0x1,%esi
   0x00000038e1e3e7a1 <+17>:    push   %rbp
   0x00000038e1e3e7a2 <+18>:    push   %rbx
   0x00000038e1e3e7a3 <+19>:    sub    $0x170,%rsp
   0x00000038e1e3e7aa <+26>:    lea    0x8(%rsp),%rdx
   0x00000038e1e3e7af <+31>:    movq   $0x1,(%rsp)
   0x00000038e1e3e7b7 <+39>:    movl   $0x0,0x88(%rsp)
   0x00000038e1e3e7c2 <+50>:    mov    %rsp,%rbx
   0x00000038e1e3e7c5 <+53>:    mov    %rdx,%rdi
   0x00000038e1e3e7c8 <+56>:    rep stos %rax,%es:(%rdi)
   0x00000038e1e3e7cb <+59>:    cmpl   $0x0,0x354ede(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e7d2 <+66>:    je     0x38e1e3e7e4 <do_system+84>
   0x00000038e1e3e7d4 <+68>:    lock cmpxchg %esi,0x3514c0(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e7dc <+76>:    jne    0x38e1e3ed21 <_L_lock_24>
   0x00000038e1e3e7e2 <+82>:    jmp    0x38e1e3e7f1 <do_system+97>
   0x00000038e1e3e7e4 <+84>:    cmpxchg %esi,0x3514b1(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e7eb <+91>:    jne    0x38e1e3ed21 <_L_lock_24>
   0x00000038e1e3e7f1 <+97>:    mov    0x3514a1(%rip),%eax        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e7f7 <+103>:   lea    0x1(%rax),%edx
   0x00000038e1e3e7fa <+106>:   test   %eax,%eax
   0x00000038e1e3e7fc <+108>:   mov    %edx,0x351496(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e802 <+114>:   jne    0x38e1e3e83c <do_system+172>
   0x00000038e1e3e804 <+116>:   lea    0x351355(%rip),%rdx        # 0x38e218fb60 <intr>
   0x00000038e1e3e80b <+123>:   mov    %rsp,%rsi
   0x00000038e1e3e80e <+126>:   mov    $0x2,%edi
   0x00000038e1e3e813 <+131>:   callq  0x38e1e32bb0 <sigaction>
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3e818 <+136>:   test   %eax,%eax
   0x00000038e1e3e81a <+138>:   js     0x38e1e3ea6a <do_system+730>
   0x00000038e1e3e820 <+144>:   lea    0x3513d9(%rip),%rdx        # 0x38e218fc00 <quit>
   0x00000038e1e3e827 <+151>:   mov    %rsp,%rsi
   0x00000038e1e3e82a <+154>:   mov    $0x3,%edi
   0x00000038e1e3e82f <+159>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3e834 <+164>:   test   %eax,%eax
   0x00000038e1e3e836 <+166>:   js     0x38e1e3ea9f <do_system+783>
   0x00000038e1e3e83c <+172>:   cmpl   $0x0,0x354e6d(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e843 <+179>:   je     0x38e1e3e854 <do_system+196>
   0x00000038e1e3e845 <+181>:   lock decl 0x351450(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e84c <+188>:   jne    0x38e1e3ed40 <_L_unlock_49>
   0x00000038e1e3e852 <+194>:   jmp    0x38e1e3e860 <do_system+208>
   0x00000038e1e3e854 <+196>:   decl   0x351442(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e85a <+202>:   jne    0x38e1e3ed40 <_L_unlock_49>
   0x00000038e1e3e860 <+208>:   lea    0xa0(%rsp),%rbp
   0x00000038e1e3e868 <+216>:   lea    0x8(%rbx),%rsi
   0x00000038e1e3e86c <+220>:   xor    %edi,%edi
   0x00000038e1e3e86e <+222>:   orq    $0x10000,0x8(%rsp)
   0x00000038e1e3e877 <+231>:   mov    %rbp,%rdx
   0x00000038e1e3e87a <+234>:   callq  0x38e1e32bd0 <sigprocmask>
   0x00000038e1e3e87f <+239>:   test   %eax,%eax
   0x00000038e1e3e881 <+241>:   js     0x38e1e3eac8 <do_system+824>
   0x00000038e1e3e887 <+247>:   mov    0x354c53(%rip),%ebx        # 0x38e21934e0 <__libc_pthread_functions_init>
   0x00000038e1e3e88d <+253>:   test   %ebx,%ebx
   0x00000038e1e3e88f <+255>:   jne    0x38e1e3ea38 <do_system+680>
   0x00000038e1e3e895 <+261>:   lea    0x3c4(%rip),%rax        # 0x38e1e3ec60 <cancel_handler>
   0x00000038e1e3e89c <+268>:   mov    %rax,0x140(%rsp)
   0x00000038e1e3e8a4 <+276>:   lea    0x168(%rsp),%rax
   0x00000038e1e3e8ac <+284>:   mov    %rax,0x148(%rsp)
   0x00000038e1e3e8b4 <+292>:   lea    0x168(%rsp),%rdx
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3e8bc <+300>:   xor    %esi,%esi
   0x00000038e1e3e8be <+302>:   mov    $0x100011,%edi
   0x00000038e1e3e8c3 <+307>:   mov    $0x38,%eax
   0x00000038e1e3e8c8 <+312>:   syscall
=> 0x00000038e1e3e8ca <+314>:   cmp    $0xfffffffffffff000,%rax
   0x00000038e1e3e8d0 <+320>:   ja     0x38e1e3ebcc <do_system+1084>
   0x00000038e1e3e8d6 <+326>:   cmp    $0x0,%eax
   0x00000038e1e3e8d9 <+329>:   mov    %eax,%edi
   0x00000038e1e3e8db <+331>:   mov    %eax,0x168(%rsp)
   0x00000038e1e3e8e2 <+338>:   je     0x38e1e3eb2b <do_system+923>
   0x00000038e1e3e8e8 <+344>:   lea    0x16c(%rsp),%r12
   0x00000038e1e3e8f0 <+352>:   jge    0x38e1e3e8ff <do_system+367>
   0x00000038e1e3e8f2 <+354>:   jmp    0x38e1e3e92b <do_system+411>
   0x00000038e1e3e8f4 <+356>:   nopl   0x0(%rax)
   0x00000038e1e3e8f8 <+360>:   mov    0x168(%rsp),%edi
   0x00000038e1e3e8ff <+367>:   xor    %edx,%edx
   0x00000038e1e3e901 <+369>:   mov    %r12,%rsi
   0x00000038e1e3e904 <+372>:   callq  0x38e1eac8a0 <waitpid>
   0x00000038e1e3e909 <+377>:   cltq
   0x00000038e1e3e90b <+379>:   cmp    $0xffffffffffffffff,%rax
   0x00000038e1e3e90f <+383>:   jne    0x38e1e3e91e <do_system+398>
   0x00000038e1e3e911 <+385>:   mov    0x34f690(%rip),%rdx        # 0x38e218dfa8
   0x00000038e1e3e918 <+392>:   cmpl   $0x4,%fs:(%rdx)
   0x00000038e1e3e91c <+396>:   je     0x38e1e3e8f8 <do_system+360>
   0x00000038e1e3e91e <+398>:   movslq 0x168(%rsp),%rdx
   0x00000038e1e3e926 <+406>:   cmp    %rdx,%rax
   0x00000038e1e3e929 <+409>:   je     0x38e1e3e936 <do_system+422>
   0x00000038e1e3e92b <+411>:   movl   $0xffffffff,0x16c(%rsp)
   0x00000038e1e3e936 <+422>:   test   %ebx,%ebx
   0x00000038e1e3e938 <+424>:   jne    0x38e1e3ea10 <do_system+640>
   0x00000038e1e3e93e <+430>:   mov    $0x1,%esi
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3e943 <+435>:   xor    %eax,%eax
   0x00000038e1e3e945 <+437>:   cmpl   $0x0,0x354d64(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e94c <+444>:   je     0x38e1e3e95e <do_system+462>
   0x00000038e1e3e94e <+446>:   lock cmpxchg %esi,0x351346(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e956 <+454>:   jne    0x38e1e3ed5f <_L_lock_179>
   0x00000038e1e3e95c <+460>:   jmp    0x38e1e3e96b <do_system+475>
   0x00000038e1e3e95e <+462>:   cmpxchg %esi,0x351337(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e965 <+469>:   jne    0x38e1e3ed5f <_L_lock_179>
   0x00000038e1e3e96b <+475>:   mov    0x351327(%rip),%eax        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e971 <+481>:   sub    $0x1,%eax
   0x00000038e1e3e974 <+484>:   test   %eax,%eax
   0x00000038e1e3e976 <+486>:   mov    %eax,0x35131c(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e97c <+492>:   jne    0x38e1e3e9f0 <do_system+608>
   0x00000038e1e3e97e <+494>:   lea    0x3511db(%rip),%rsi        # 0x38e218fb60 <intr>
   0x00000038e1e3e985 <+501>:   xor    %edx,%edx
   0x00000038e1e3e987 <+503>:   mov    $0x2,%edi
   0x00000038e1e3e98c <+508>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3e991 <+513>:   lea    0x351268(%rip),%rsi        # 0x38e218fc00 <quit>
   0x00000038e1e3e998 <+520>:   xor    %edx,%edx
   0x00000038e1e3e99a <+522>:   mov    $0x3,%edi
   0x00000038e1e3e99f <+527>:   mov    %eax,%ebx
   0x00000038e1e3e9a1 <+529>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3e9a6 <+534>:   test   %eax,%eax
   0x00000038e1e3e9a8 <+536>:   jne    0x38e1e3e9ae <do_system+542>
   0x00000038e1e3e9aa <+538>:   test   %ebx,%ebx
   0x00000038e1e3e9ac <+540>:   je     0x38e1e3e9f0 <do_system+608>
   0x00000038e1e3e9ae <+542>:   movl   $0xffffffff,0x16c(%rsp)
   0x00000038e1e3e9b9 <+553>:   cmpl   $0x0,0x354cf0(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e9c0 <+560>:   je     0x38e1e3e9d1 <do_system+577>
   0x00000038e1e3e9c2 <+562>:   lock decl 0x3512d3(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e9c9 <+569>:   jne    0x38e1e3ed7e <_L_unlock_213>
   0x00000038e1e3e9cf <+575>:   jmp    0x38e1e3e9dd <do_system+589>
   0x00000038e1e3e9d1 <+577>:   decl   0x3512c5(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e9d7 <+583>:   jne    0x38e1e3ed7e <_L_unlock_213>
   0x00000038e1e3e9dd <+589>:   mov    0x16c(%rsp),%eax
   0x00000038e1e3e9e4 <+596>:   add    $0x170,%rsp
   0x00000038e1e3e9eb <+603>:   pop    %rbx
   0x00000038e1e3e9ec <+604>:   pop    %rbp
   0x00000038e1e3e9ed <+605>:   pop    %r12
   0x00000038e1e3e9ef <+607>:   retq
   0x00000038e1e3e9f0 <+608>:   xor    %edx,%edx
   0x00000038e1e3e9f2 <+610>:   mov    %rbp,%rsi
   0x00000038e1e3e9f5 <+613>:   mov    $0x2,%edi
   0x00000038e1e3e9fa <+618>:   callq  0x38e1e32bd0 <sigprocmask>
   0x00000038e1e3e9ff <+623>:   test   %eax,%eax
   0x00000038e1e3ea01 <+625>:   je     0x38e1e3e9b9 <do_system+553>
   0x00000038e1e3ea03 <+627>:   movl   $0xffffffff,0x16c(%rsp)
   0x00000038e1e3ea0e <+638>:   jmp    0x38e1e3e9b9 <do_system+553>
   0x00000038e1e3ea10 <+640>:   mov    0x354c61(%rip),%rax        # 0x38e2193678 <__libc_pthread_functions+376>
   0x00000038e1e3ea17 <+647>:   lea    0x140(%rsp),%rdi
   0x00000038e1e3ea1f <+655>:   xor    %esi,%esi
   0x00000038e1e3ea21 <+657>:   ror    $0x11,%rax
   0x00000038e1e3ea25 <+661>:   xor    %fs:0x30,%rax
   0x00000038e1e3ea2e <+670>:   callq  *%rax
   0x00000038e1e3ea30 <+672>:   jmpq   0x38e1e3e93e <do_system+430>
   0x00000038e1e3ea35 <+677>:   nopl   (%rax)
   0x00000038e1e3ea38 <+680>:   mov    0x354c31(%rip),%rax        # 0x38e2193670 <__libc_pthread_functions+368>
   0x00000038e1e3ea3f <+687>:   lea    0x168(%rsp),%rdx
   0x00000038e1e3ea47 <+695>:   lea    0x140(%rsp),%rdi
   0x00000038e1e3ea4f <+703>:   ror    $0x11,%rax
   0x00000038e1e3ea53 <+707>:   xor    %fs:0x30,%rax
   0x00000038e1e3ea5c <+716>:   lea    0x1fd(%rip),%rsi        # 0x38e1e3ec60 <cancel_handler>
   0x00000038e1e3ea63 <+723>:   callq  *%rax
   0x00000038e1e3ea65 <+725>:   jmpq   0x38e1e3e8b4 <do_system+292>
   0x00000038e1e3ea6a <+730>:   subl   $0x1,0x351227(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3ea71 <+737>:   cmpl   $0x0,0x354c38(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3ea78 <+744>:   je     0x38e1e3ea89 <do_system+761>
   0x00000038e1e3ea7a <+746>:   lock decl 0x35121b(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3ea81 <+753>:   jne    0x38e1e3ed9d <_L_unlock_320>
   0x00000038e1e3ea87 <+759>:   jmp    0x38e1e3ea95 <do_system+773>
   0x00000038e1e3ea89 <+761>:   decl   0x35120d(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3ea8f <+767>:   jne    0x38e1e3ed9d <_L_unlock_320>
   0x00000038e1e3ea95 <+773>:   mov    $0xffffffff,%eax
   0x00000038e1e3ea9a <+778>:   jmpq   0x38e1e3e9e4 <do_system+596>
   0x00000038e1e3ea9f <+783>:   mov    0x34f502(%rip),%rbx        # 0x38e218dfa8
   0x00000038e1e3eaa6 <+790>:   subl   $0x1,0x3511eb(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3eaad <+797>:   mov    %fs:(%rbx),%ebp
   0x00000038e1e3eab0 <+800>:   lea    0x3510a9(%rip),%rsi        # 0x38e218fb60 <intr>
   0x00000038e1e3eab7 <+807>:   xor    %edx,%edx
   0x00000038e1e3eab9 <+809>:   mov    $0x2,%edi
   0x00000038e1e3eabe <+814>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eac3 <+819>:   mov    %ebp,%fs:(%rbx)
   0x00000038e1e3eac6 <+822>:   jmp    0x38e1e3ea71 <do_system+737>
   0x00000038e1e3eac8 <+824>:   mov    $0x1,%esi
   0x00000038e1e3eacd <+829>:   xor    %eax,%eax
   0x00000038e1e3eacf <+831>:   cmpl   $0x0,0x354bda(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3ead6 <+838>:   je     0x38e1e3eae8 <do_system+856>
   0x00000038e1e3ead8 <+840>:   lock cmpxchg %esi,0x3511bc(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3eae0 <+848>:   jne    0x38e1e3edbc <_L_lock_354>
   0x00000038e1e3eae6 <+854>:   jmp    0x38e1e3eaf5 <do_system+869>
   0x00000038e1e3eae8 <+856>:   cmpxchg %esi,0x3511ad(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3eaef <+863>:   jne    0x38e1e3edbc <_L_lock_354>
   0x00000038e1e3eaf5 <+869>:   mov    0x35119d(%rip),%eax        # 0x38e218fc98 <sa_refcntr>

   0x00000038e1e3eafb <+875>:   sub    $0x1,%eax
   0x00000038e1e3eafe <+878>:   test   %eax,%eax
   0x00000038e1e3eb00 <+880>:   mov    %eax,0x351192(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3eb06 <+886>:   jne    0x38e1e3ea71 <do_system+737>
   0x00000038e1e3eb0c <+892>:   mov    0x34f495(%rip),%rbx        # 0x38e218dfa8
   0x00000038e1e3eb13 <+899>:   lea    0x3510e6(%rip),%rsi        # 0x38e218fc00 <quit>
   0x00000038e1e3eb1a <+906>:   xor    %edx,%edx
   0x00000038e1e3eb1c <+908>:   mov    $0x3,%edi
   0x00000038e1e3eb21 <+913>:   mov    %fs:(%rbx),%ebp
   0x00000038e1e3eb24 <+916>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eb29 <+921>:   jmp    0x38e1e3eab0 <do_system+800>
   0x00000038e1e3eb2b <+923>:   lea    0x117256(%rip),%rax        # 0x38e1f55d88
   0x00000038e1e3eb32 <+930>:   lea    0x351027(%rip),%rsi        # 0x38e218fb60 <intr>
   0x00000038e1e3eb39 <+937>:   xor    %edx,%edx
   0x00000038e1e3eb3b <+939>:   mov    $0x2,%dil
   0x00000038e1e3eb3e <+942>:   mov    %r12,0x130(%rsp)
   0x00000038e1e3eb46 <+950>:   movq   $0x0,0x138(%rsp)
   0x00000038e1e3eb52 <+962>:   mov    %rax,0x120(%rsp)
   0x00000038e1e3eb5a <+970>:   lea    0x11721f(%rip),%rax        # 0x38e1f55d80
   0x00000038e1e3eb61 <+977>:   mov    %rax,0x128(%rsp)
   0x00000038e1e3eb69 <+985>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eb6e <+990>:   lea    0x35108b(%rip),%rsi        # 0x38e218fc00 <quit>
   0x00000038e1e3eb75 <+997>:   xor    %edx,%edx
   0x00000038e1e3eb77 <+999>:   mov    $0x3,%edi
   0x00000038e1e3eb7c <+1004>:  callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eb81 <+1009>:  xor    %edx,%edx
   0x00000038e1e3eb83 <+1011>:  mov    %rbp,%rsi
   0x00000038e1e3eb86 <+1014>:  mov    $0x2,%edi
   0x00000038e1e3eb8b <+1019>:  callq  0x38e1e32bd0 <sigprocmask>
   0x00000038e1e3eb90 <+1024>:  mov    0x34f3b1(%rip),%rax        # 0x38e218df48
   0x00000038e1e3eb97 <+1031>:  lea    0x1171e5(%rip),%rdi        # 0x38e1f55d83
   0x00000038e1e3eb9e <+1038>:  lea    0x120(%rsp),%rsi
   0x00000038e1e3eba6 <+1046>:  movl   $0x0,0x3510ec(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3ebb0 <+1056>:  movl   $0x0,0x3510de(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3ebba <+1066>:  mov    (%rax),%rdx
   0x00000038e1e3ebbd <+1069>:  callq  0x38e1ead040 <execve>
   0x00000038e1e3ebc2 <+1074>:  mov    $0x7f,%edi
   0x00000038e1e3ebc7 <+1079>:  callq  0x38e1eacff0 <_exit>
   0x00000038e1e3ebcc <+1084>:  mov    0x34f3d5(%rip),%rdx        # 0x38e218dfa8
   0x00000038e1e3ebd3 <+1091>:  neg    %eax
   0x00000038e1e3ebd5 <+1093>:  movl   $0xffffffff,0x168(%rsp)
   0x00000038e1e3ebe0 <+1104>:  mov    %eax,%fs:(%rdx)
   0x00000038e1e3ebe3 <+1107>:  jmpq   0x38e1e3e92b <do_system+411>
End of assembler dump.

Original comment by rajdeep....@gmail.com on 25 Aug 2014 at 10:45

GoogleCodeExporter commented 9 years ago
So system is not blocked anywhere but indeed calls clone. You can see above 
that rip is next instruction after syscall instructions. And syscall number is 
"clone".

Here is my recommendation. Try sending "finish" command to gdb after switching 
to main thread that does system. I expect it to return. It would be interesting 
to know what it returns.

My guess is that it returns failure and your program might be retrying it.

Original comment by alkondratenko on 25 Aug 2014 at 4:46

GoogleCodeExporter commented 9 years ago
I have tried to finish it from one or two steps above but no results. Not sure 
if I am doing the right way.
gdb) up
#1  0x00000038e1e3ec40 in system () from /lib64/libc.so.6
(gdb) up
#2  0x000000000213a666 in VeMacroUsageDump::setDumpDirPath(char const*) ()
(gdb) down
#1  0x00000038e1e3ec40 in system () from /lib64/libc.so.6
(gdb) down
#0  0x00000038e1e3e8ca in do_system () from /lib64/libc.so.6
(gdb) up
#1  0x00000038e1e3ec40 in system () from /lib64/libc.so.6
(gdb) finish
Run till exit from #1  0x00000038e1e3ec40 in system () from /lib64/libc.so.6

Not sure if I am in the main thread or not. Appreciate if you correct me.

Thanks
-Rajdeep

Original comment by rajdeep....@gmail.com on 26 Aug 2014 at 4:29

GoogleCodeExporter commented 9 years ago
There are 3 threads and I tried to finish all of them but everytime it switches 
to thread 1 and hangs

(gdb) info threads
  3 Thread 0x2b5bd614a700 (LWP 36400)  0x00000038e260b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x2b5bd634b700 (LWP 36401)  0x00000038e260b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x2b5bd4d24ee0 (LWP 36399)  0x00000038e1e3e8ca in do_system () from 
/lib64/libc.so.6

finish
Run till exit from #0  0x00000038e1e3e8ca in do_system () from /lib64/libc.so.6

It hangs

Original comment by rajdeep....@gmail.com on 26 Aug 2014 at 4:37

GoogleCodeExporter commented 9 years ago
One interesting thing I noticed.
over the code base we have used system() call in numerous places, but it only 
hangs with profiler when we use "\rm -rf " with system(). All other system call 
are not creating this hang issue.

Original comment by rajdeep....@gmail.com on 27 Aug 2014 at 10:08

GoogleCodeExporter commented 9 years ago
Another query..
While runninf pprof, how do I specify my .so to load.

I have a binary a.out, which needs to load .so which gets created in the /tmp 
directory while ruuning a.out
if I just do 
pprof -gv a.out gprof.txt

Then it does not load the .so in /tmp as a result there are many unresolved 
symbols.
I checked an option called --add-libs, but how can add multiple .so 

Original comment by rajdeep....@gmail.com on 28 Aug 2014 at 10:29

GoogleCodeExporter commented 9 years ago
Do you have any updates on this?

Thanks

Original comment by rajdeep....@gmail.com on 1 Sep 2014 at 12:52

GoogleCodeExporter commented 9 years ago
Most likely we're missing something. Here's what we can do to find out what is 
really going on.

* reproduce this again.

* attach gdb and switch to thread that is apparently stuck in system

* do disas to see where it's stuck

* do fin to check again if call to system completes

* if it's stuck stop it again (Ctrl-C)

* switch to do_system thread again and do disas to check where it is stuck

* do fin again

* check by running ps aux --forest if there's child process (there should be 
one).

Post results of _all_ steps.

Original comment by alkondratenko on 6 Sep 2014 at 11:03

GoogleCodeExporter commented 9 years ago
In last week I commented all the systems calls with \rm -rf command in it, and 
now there is no hang. For reproducing I need to create another build. Will get 
back once I have it.

Original comment by rajdeep....@gmail.com on 8 Sep 2014 at 8:04

GoogleCodeExporter commented 9 years ago
I have my own profiler and happen to have same issue. My OS is RHEL6. 
Eventually I found When I stop sampling by SIGPROF, then the hang will 
disappear. And as you mentioned, my hang occur when I call system to "mv" a 
file.

It's more like an OS bug.

Original comment by lim...@gmail.com on 5 Nov 2014 at 8:41

GoogleCodeExporter commented 9 years ago
>> It's more like an OS bug.

This is certainly not excluded yet. But so far I saw no evidence at all that 
it's OS bug.

Original comment by alkondratenko on 5 Nov 2014 at 8:09

GoogleCodeExporter commented 9 years ago
Closing due to inactivity. Please reopen once requested information is 
available.

Original comment by alkondratenko on 29 Nov 2014 at 7:20