AlexiaChen / AlexiaChen.github.io

My Blog https://github.com/AlexiaChen/AlexiaChen.github.io/issues
87 stars 11 forks source link

golang调试分析之解决cosmos sdk在WSL2环境中被长时间block住的问题 #170

Open AlexiaChen opened 1 year ago

AlexiaChen commented 1 year ago

前言

其实这个是我花周末的时间,解决工作上的一个问题。这个问题有点意义,所以就写在博客里面。issue分析在这里 https://github.com/FiiLabs/irita/issues/8 相当于是照搬我写的issue了。irita是文昌链的binary文件,其实基于cosmos sdk开发的,问题也出现在cosmos sdk里面,这里不是cosmos sdk的BUG,而是环境导致的一些问题。

背景描述

据我了解,我本机的WSL2中,和同事的虚拟机中的ubuntu,运行irita的相关命令都很慢,甚至连irita version 都慢,更别说irita keys xxx的相关操作了。因为工作时间,有这些小问题,为了更快地让irita跑起来,有同事就更换了虚拟机,直接用原生Linux,不在虚拟机里面跑了,就绕过了这个问题。而我之前用gdb看了下这个程序的调用堆栈是在等待一把Linux下futex的锁。然后用strace 也查看了Linux的系统函数调用,好像发现了一些关于函数返回权限相关的值,所以我直接用sudo -E irita xxxx 来做操作,甚至运行bash脚本也是用sudo -E 来做的。这样irita就没有被block住的现象了。但是并没有深入源码搞懂程序是在哪里被block住了,以及什么会这样。

所以不想再这样不求甚解了,一怒之下,上了Debugger,下面就来分析为什么会这样。

分析

因为之前用了gdb,但是gdb看不出golang程序的完整的函数调用堆栈(要加入一些编译参数,和用特定的后端生成的才可以用gdb)。 gdb看出是这样:

mathxh@MathxH:~/metablock-projects/irita$ gdb --args ./build/irita keys list
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./build/irita...
warning: File "/usr/local/go/src/runtime/runtime-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /usr/local/go/src/runtime/runtime-gdb.py
line to your configuration file "/home/mathxh/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/mathxh/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
(gdb) run
Starting program: /home/mathxh/metablock-projects/irita/build/irita keys list
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffd0258700 (LWP 31202)]
[New Thread 0x7fffcf8b7700 (LWP 31203)]
[New Thread 0x7fffcf076700 (LWP 31204)]
[New Thread 0x7fffce875700 (LWP 31205)]
[New Thread 0x7fffce074700 (LWP 31206)]
[New Thread 0x7fffcd833700 (LWP 31207)]
[New Thread 0x7fffcd022700 (LWP 31208)]
[New Thread 0x7fffaffff700 (LWP 31209)]
[Detaching after vfork from child process 31210]
[New Thread 0x7fffaf7fe700 (LWP 31217)]
[New Thread 0x7fffaeffd700 (LWP 31218)]
^C
Thread 1 "irita" received signal SIGINT, Interrupt.
runtime.futex () at /usr/local/go/src/runtime/sys_linux_amd64.s:560
560             MOVL    AX, ret+40(FP)
(gdb) bt
#0  runtime.futex () at /usr/local/go/src/runtime/sys_linux_amd64.s:560
#1  0x0000000000454976 in runtime.futexsleep (addr=0xfffffffffffffe00, val=0, ns=4780227) at /usr/local/go/src/runtime/os_linux.go:69
#2  0x000000000042a747 in runtime.notesleep (n=0x3ba11c8 <runtime.m0+328>) at /usr/local/go/src/runtime/lock_futex.go:160
#3  0x000000000045f1ac in runtime.mPark () at /usr/local/go/src/runtime/proc.go:2247
#4  runtime.stopm () at /usr/local/go/src/runtime/proc.go:2247
#5  0x0000000000460848 in runtime.findRunnable (gp=<optimized out>, inheritTime=<optimized out>, tryWakeP=<optimized out>) at /usr/local/go/src/runtime/proc.go:2874
#6  0x000000000046167e in runtime.schedule () at /usr/local/go/src/runtime/proc.go:3214
#7  0x0000000000461bad in runtime.park_m (gp=0xc000f99380) at /usr/local/go/src/runtime/proc.go:3363
#8  0x000000000048af83 in runtime.mcall () at /usr/local/go/src/runtime/asm_amd64.s:448
#9  0x000000000048f865 in runtime.newproc (fn=0x0) at <autogenerated>:1
#10 0x0000000003b9f8a0 in crypto/tls.emptyConfig ()
#11 0x0000000000000000 in ?? ()
(gdb)

其中对我还有误导,我还以为是golang crypto里面的TLS 是不是建立TSL连接被block住了,因为之前我在strace里面有看到epoll_wait。但是这样看,对我的帮助太少了,所以我今天还是换了一种方式,直接上golang的debugger来调试irita version irita keys xxx。最终找到问题所在了。 请看下面的断点和堆栈:

171979b0ebdbbe361161ac66b570a00

Type 'help' for list of commands.
(dlv) break /home/mathxh/go/pkg/mod/github.com/gsterjov/go-libsecret@v0.0.0-20161001094733-a6f4afe4910c/service.go:45
Breakpoint 1 set at 0xd653ad for github.com/gsterjov/go-libsecret.(*Service).Open() /home/mathxh/go/pkg/mod/github.com/gsterjov/go-libsecret@v0.0.0-20161001094733-a6f4afe4910c/service.go:45
(dlv) c
> github.com/gsterjov/go-libsecret.(*Service).Open() /home/mathxh/go/pkg/mod/github.com/gsterjov/go-libsecret@v0.0.0-20161001094733-a6f4afe4910c/service.go:45 (hits goroutine(1):1 total:1) (PC: 0xd653ad)
Warning: debugging optimized function
    40: // OpenSession (IN String algorithm, IN Variant input, OUT Variant output, OUT ObjectPath result);
    41: func (service *Service) Open() (*Session, error) {
    42:   var output dbus.Variant
    43:   var path   dbus.ObjectPath
    44:
=>  45:   err := service.dbus.Call("org.freedesktop.Secret.Service.OpenSession", 0, "plain", dbus.MakeVariant("")).Store(&output, &path)
    46:   if err != nil {
    47:     return &Session{}, err
    48:   }
    49:
    50:   return NewSession(service.conn, path), nil
(dlv) so
received SIGINT, stopping process (will not forward signal)
> runtime.futex() /usr/local/go/src/runtime/sys_linux_amd64.s:560 (PC: 0x48f0c3)
Warning: debugging optimized function
   555:         MOVQ    ts+16(FP), R10
   556:         MOVQ    addr2+24(FP), R8
   557:         MOVL    val3+32(FP), R9
   558:         MOVL    $SYS_futex, AX
   559:         SYSCALL
=> 560:         MOVL    AX, ret+40(FP)
   561:         RET
   562:
   563: // int32 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
   564: TEXT runtime·clone(SB),NOSPLIT,$0
   565:         MOVL    flags+0(FP), DI
(dlv) restart
Process restarted with PID 31161
(dlv) c
> github.com/gsterjov/go-libsecret.(*Service).Open() /home/mathxh/go/pkg/mod/github.com/gsterjov/go-libsecret@v0.0.0-20161001094733-a6f4afe4910c/service.go:45 (hits goroutine(1):1 total:1) (PC: 0xd653ad)
Warning: debugging optimized function
    40: // OpenSession (IN String algorithm, IN Variant input, OUT Variant output, OUT ObjectPath result);
    41: func (service *Service) Open() (*Session, error) {
    42:   var output dbus.Variant
    43:   var path   dbus.ObjectPath
    44:
=>  45:   err := service.dbus.Call("org.freedesktop.Secret.Service.OpenSession", 0, "plain", dbus.MakeVariant("")).Store(&output, &path)
    46:   if err != nil {
    47:     return &Session{}, err
    48:   }
    49:
    50:   return NewSession(service.conn, path), nil
(dlv) bt
 0  0x0000000000d653ad in github.com/gsterjov/go-libsecret.(*Service).Open
    at /home/mathxh/go/pkg/mod/github.com/gsterjov/go-libsecret@v0.0.0-20161001094733-a6f4afe4910c/service.go:45
 1  0x0000000000d69929 in github.com/99designs/keyring.(*secretsKeyring).openSecrets
    at /home/mathxh/go/pkg/mod/github.com/99designs/keyring@v1.1.6/libsecret.go:60
 2  0x0000000000d697e9 in github.com/99designs/keyring.init.2.func1
    at /home/mathxh/go/pkg/mod/github.com/99designs/keyring@v1.1.6/libsecret.go:38
 3  0x0000000000d68075 in github.com/99designs/keyring.Open
    at /home/mathxh/go/pkg/mod/github.com/99designs/keyring@v1.1.6/keyring.go:65
 4  0x0000000000d7dfd8 in github.com/cosmos/cosmos-sdk/crypto/keyring.New
    at /home/mathxh/go/pkg/mod/github.com/bianjieai/cosmos-sdk@v0.45.1-irita-20220816/crypto/keyring/keyring.go:182
 5  0x0000000000e8c714 in github.com/cosmos/cosmos-sdk/client.NewKeyringFromBackend
    at /home/mathxh/go/pkg/mod/github.com/bianjieai/cosmos-sdk@v0.45.1-irita-20220816/client/context.go:372
 6  0x0000000000e89265 in github.com/cosmos/cosmos-sdk/client.ReadPersistentCommandFlags
    at /home/mathxh/go/pkg/mod/github.com/bianjieai/cosmos-sdk@v0.45.1-irita-20220816/client/cmd.go:127
 7  0x0000000001dd4ab9 in github.com/bianjieai/irita/cmd/irita/cmd.NewRootCmd.func1
    at ./cmd/irita/cmd/root.go:79
 8  0x0000000000b76a22 in github.com/tendermint/tendermint/libs/cli.concatCobraCmdFuncs.func1
    at /home/mathxh/go/pkg/mod/github.com/bianjieai/tendermint@v0.34.13-irita-221114/libs/cli/setup.go:118
 9  0x00000000005ded6b in github.com/spf13/cobra.(*Command).execute
    at /home/mathxh/go/pkg/mod/github.com/spf13/cobra@v1.3.0/command.go:835
10  0x00000000005df4dd in github.com/spf13/cobra.(*Command).ExecuteC
    at /home/mathxh/go/pkg/mod/github.com/spf13/cobra@v1.3.0/command.go:974
11  0x00000000011589cb in github.com/spf13/cobra.(*Command).Execute
    at /home/mathxh/go/pkg/mod/github.com/spf13/cobra@v1.3.0/command.go:902
12  0x00000000011589cb in github.com/spf13/cobra.(*Command).ExecuteContext
    at /home/mathxh/go/pkg/mod/github.com/spf13/cobra@v1.3.0/command.go:895
13  0x00000000011589cb in github.com/cosmos/cosmos-sdk/server/cmd.Execute
    at /home/mathxh/go/pkg/mod/github.com/bianjieai/cosmos-sdk@v0.45.1-irita-20220816/server/cmd/execute.go:36
14  0x0000000001de0b0c in main.main
    at ./cmd/irita/main.go:21
Sending output to pager...
 0  0x0000000000d653ad in github.com/gsterjov/go-libsecret.(*Service).Open
    at /home/mathxh/go/pkg/mod/github.com/gsterjov/go-libsecret@v0.0.0-20161001094733-a6f4afe4910c/service.go:45
 1  0x0000000000d69929 in github.com/99designs/keyring.(*secretsKeyring).openSecrets
    at /home/mathxh/go/pkg/mod/github.com/99designs/keyring@v1.1.6/libsecret.go:60
 2  0x0000000000d697e9 in github.com/99designs/keyring.init.2.func1
    at /home/mathxh/go/pkg/mod/github.com/99designs/keyring@v1.1.6/libsecret.go:38
 3  0x0000000000d68075 in github.com/99designs/keyring.Open
    at /home/mathxh/go/pkg/mod/github.com/99designs/keyring@v1.1.6/keyring.go:65
 4  0x0000000000d7dfd8 in github.com/cosmos/cosmos-sdk/crypto/keyring.New
    at /home/mathxh/go/pkg/mod/github.com/bianjieai/cosmos-sdk@v0.45.1-irita-20220816/crypto/keyring/keyring.go:182
 5  0x0000000000e8c714 in github.com/cosmos/cosmos-sdk/client.NewKeyringFromBackend
    at /home/mathxh/go/pkg/mod/github.com/bianjieai/cosmos-sdk@v0.45.1-irita-20220816/client/context.go:372
 6  0x0000000000e89265 in github.com/cosmos/cosmos-sdk/client.ReadPersistentCommandFlags
    at /home/mathxh/go/pkg/mod/github.com/bianjieai/cosmos-sdk@v0.45.1-irita-20220816/client/cmd.go:127
 7  0x0000000001dd4ab9 in github.com/bianjieai/irita/cmd/irita/cmd.NewRootCmd.func1
    at ./cmd/irita/cmd/root.go:79
 8  0x0000000000b76a22 in github.com/tendermint/tendermint/libs/cli.concatCobraCmdFuncs.func1
    at /home/mathxh/go/pkg/mod/github.com/bianjieai/tendermint@v0.34.13-irita-221114/libs/cli/setup.go:118
 9  0x00000000005ded6b in github.com/spf13/cobra.(*Command).execute
    at /home/mathxh/go/pkg/mod/github.com/spf13/cobra@v1.3.0/command.go:835
10  0x00000000005df4dd in github.com/spf13/cobra.(*Command).ExecuteC
    at /home/mathxh/go/pkg/mod/github.com/spf13/cobra@v1.3.0/command.go:974
11  0x00000000011589cb in github.com/spf13/cobra.(*Command).Execute
    at /home/mathxh/go/pkg/mod/github.com/spf13/cobra@v1.3.0/command.go:902
12  0x00000000011589cb in github.com/spf13/cobra.(*Command).ExecuteContext
    at /home/mathxh/go/pkg/mod/github.com/spf13/cobra@v1.3.0/command.go:895
13  0x00000000011589cb in github.com/cosmos/cosmos-sdk/server/cmd.Execute
    at /home/mathxh/go/pkg/mod/github.com/bianjieai/cosmos-sdk@v0.45.1-irita-20220816/server/cmd/execute.go:36
14  0x0000000001de0b0c in main.main
    at ./cmd/irita/main.go:21
15  0x000000000045afd2 in runtime.main
15  0x000000000045afd2 in runtime.main
    at /usr/local/go/src/runtime/proc.go:250
16  0x000000000048d1c1 in runtime.goexit
    at /usr/local/go/src/runtime/asm_amd64.s:1594
(dlv) list
> github.com/gsterjov/go-libsecret.(*Service).Open() /home/mathxh/go/pkg/mod/github.com/gsterjov/go-libsecret@v0.0.0-20161001094733-a6f4afe4910c/service.go:45 (hits goroutine(1):1 total:1) (PC: 0xd653ad)
Warning: debugging optimized function
    40: // OpenSession (IN String algorithm, IN Variant input, OUT Variant output, OUT ObjectPath result);
    41: func (service *Service) Open() (*Session, error) {
    42:   var output dbus.Variant
    43:   var path   dbus.ObjectPath
    44:
=>  45:   err := service.dbus.Call("org.freedesktop.Secret.Service.OpenSession", 0, "plain", dbus.MakeVariant("")).Store(&output, &path)
    46:   if err != nil {
    47:     return &Session{}, err
    48:   }
    49:
    50:   return NewSession(service.conn, path), nil
(dlv)

经过我用命令行的调试器不停尝试和分析。可以从上面看到,是在调用service.dbus.Call("org.freedesktop.Secret.Service.OpenSession", 0, "plain", dbus.MakeVariant("")).Store(&output, &path) 出现长时间阻塞的,而究其原因就是因为cosmos sdk默认的Keyring的backend是os。而os支持的后端有很多,比如libsecret kwallet,这个可以看文档。而这个函数就是在github.com/gsterjov/go-libsecret.(*Service).Open()里面,这里就是要通过打开的这个服务,来创建一个新的KeyRing。而这个DBUS通信及其慢,可能就是权限相关的原因。所以最终就是因为--keyring-backend os这个默认设置导致的。但是我的bash脚本里面的最开始已经设置了一个全局的irita config keyring-backend file 了,但是不起作用,这个设置居然只对irita version 有作用(顺便在这里吐槽一下,cosmos 运行version子命令的时候居然还会自动配置Keyring,所以就调用到阻塞函数的那里了)。所以最终我发现了在用keys相关的子命令是,必须强制加上backend,比如irita keys list --keyring-backend file 这样就好了,就不会被block住了。所以难怪我的bash脚本不起作用。

其实cosmos sdk的文档有描述了,但是真的没有太令人在意,而且这里面也有一些坑(就是irita config keyring-backend file不能全局起作用)。文档对于os的backend是这样说的:

The recommended backends for headless environments are file and pass.

就是说对于无桌面,无显示器的这样云服务器,还有WSL2这样的系统,最好就不要用默认的os了,要么用file, 要么用pass

总结

对所有的irita 的keys相关的操作 最好都强制加上--keyring-backend file的flag参数。这样就解决了。当然,irita config keyring-backend file 还是要设置的。所以可以知道了,默认的os backend,涉及到与DBUS的通信,普通权限会阻塞住很长时间,看来futex 的锁在等DBUS的response。而sudo权限,应该正好大开便利之门,“加快”了DBUS的通信。再细节的代码不深究了。到这里应该是差不多了。