1265578519 / OpenTracker

An open-source, free BitTorrent tracker for Linux
http://bbs.itzmx.com/thread-18214-1-1.html

kernel messages debug #4

Closed 1265578519 closed 11 months ago

1265578519 commented 11 months ago

After ten years of running, this is the first crash. The cause is unknown; could it be a hardware problem?
opentracker[13006]: segfault at 7f6031449430 ip 0000000000405f3b sp 00007f60320b00f0 error 4 in opentracker[400000+14000]

1265578519 commented 11 months ago

opentracker[23302]: segfault at 7f08c34dc430 ip 0000000000405f3b sp 00007f08c89b40f0 error 4 in opentracker[400000+14000]

1265578519 commented 8 months ago

https://bbs.itzmx.com/forum.php?mod=redirect&goto=findpost&ptid=18214&pid=303135&fromuid=1
I remember now: I changed this value a while ago so that replies run right up against the 8 KB limit, and it crashes when a UDP reply gets too large. If it keeps crashing I will just lower the value. (Update: the same thing happens with the value set to 50, so this may be a bug in the core code.)
[3086101.125185] opentracker[18869]: segfault at 7f25fd080010 ip 0000000000405ef3 sp 00007f2604ac30f0 error 4 in opentracker[400000+15000]
[3107613.076988] opentracker[22339]: segfault at 7f3ce3a6e210 ip 0000000000405ef3 sp 00007f3cf0c580f0 error 4 in opentracker[400000+15000]
[3145712.497406] opentracker[1404]: segfault at 7f59f971e8d0 ip 0000000000405f8b sp 00007f59fc4680f0 error 4 in opentracker[400000+15000]
[3188615.385459] opentracker[10673]: segfault at 7f8e1a2b0330 ip 0000000000405f8b sp 00007f8e1c74f0f0 error 4 in opentracker[400000+15000]
[3275337.146264] opentracker[26974]: segfault at 7fb568819230 ip 0000000000405ef3 sp 00007fb569e6e0f0 error 4 in opentracker[400000+15000]
[3436286.589852] opentracker[6507]: segfault at 7f2bcb7b1ef0 ip 0000000000405ef3 sp 00007f2bd0d550f0 error 4 in opentracker[400000+15000]
[3596379.775706] opentracker[9849]: segfault at 7f610cd9ad90 ip 0000000000405ef3 sp 00007f611e8200f0 error 4 in opentracker[400000+15000]
[3602918.272569] opentracker[12778]: segfault at 7f7222367450 ip 0000000000405f8b sp 00007f722389e0f0 error 4 in opentracker[400000+15000]

1265578519 commented 6 months ago

[root@tracker ~]# dmesg | grep 'opentracker'
[4476559.117264] opentracker[24773]: segfault at 7fd0a21a8090 ip 0000000000405e4b sp 00007fd0a0db30f0 error 4 in opentracker[400000+15000]
[4498139.662202] opentracker[13312]: segfault at 7f93116fda50 ip 0000000000405db3 sp 00007f931253b0f0 error 4 in opentracker[400000+15000]
[4512390.632152] opentracker[27419]: segfault at 7f7be1eede10 ip 0000000000405db3 sp 00007f7be0b620f0 error 4 in opentracker[400000+15000]
[4518591.029451] opentracker[2603]: segfault at 7f8a6b1048b0 ip 0000000000405db3 sp 00007f8a69d660f0 error 4 in opentracker[400000+15000]
[4554770.392193] opentracker[9359]: segfault at 7fcf72d98f90 ip 0000000000405db3 sp 00007fcf719bf0f0 error 4 in opentracker[400000+15000]

Basically only the software's author can make sense of these error messages, and maybe not even the author. It is a low-level problem in the C code. I asked the author, but he did not reply; I guess he is gone.

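Results from a group member who used IDA:

After decompiling with IDA (F5), the picture is roughly this: when eax = 0, execution does not jump to the end but enters the array loop. Entering the loop with 0, dec eax takes eax negative and execution carries on. At int idx = amount - 1, the idx = 0 case jumps to the tail end of the array loop. idx should start at 0, and the loop should end when idx is less than 0, not when idx equals 0. The array loop decrements idx by 1 on each pass; the check for idx < 0 that ends the loop and jumps to the end should sit above idx = amount - 1. That is roughly it.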

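http://tracker1.itzmx.com:8080/stats?mode=top100 This looks like the torrent ranking page. How about disabling this page as a test?

Compiled binary: opentracker.zip

There is a symptom before each crash: the stats page hangs without responding. Even requests from 127.0.0.1 hang and get no data back, while peer list queries from torrent users still return normally. After a while the process crashes and disappears. I suspect the top-list code is the problem.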

1265578519 commented 6 months ago

gdb /home/OpenTracker-master/opentracker/opentracker 15914

(gdb) gcore
warning: target file /proc/15914/cmdline contained unexpected null characters
Saved corefile core.15914
(gdb) bt
#0  0x00007f876237dd23 in sendto () from /lib64/libpthread.so.0
#1  0x000000000040bed5 in ?? ()
#2  0x000000000040bf44 in ?? ()
#3  0x00000000004086a8 in ?? ()
#4  0x00000000004024a7 in ?? ()
#5  0x00007f8761dad555 in __libc_start_main () from /lib64/libc.so.6
#6  0x0000000000402a75 in ?? ()
(gdb) t 0
Thread ID 0 not known.
(gdb) bt
#0  0x00007f876237dd23 in sendto () from /lib64/libpthread.so.0
#1  0x000000000040bed5 in ?? ()
#2  0x000000000040bf44 in ?? ()
#3  0x00000000004086a8 in ?? ()
#4  0x00000000004024a7 in ?? ()
#5  0x00007f8761dad555 in __libc_start_main () from /lib64/libc.so.6
#6  0x0000000000402a75 in ?? ()
(gdb) t 1
[Switching to thread 1 (Thread 0x7f87627a2740 (LWP 15914))]
#0  0x00007f876237dd23 in sendto () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f876237dd23 in sendto () from /lib64/libpthread.so.0
#1  0x000000000040bed5 in ?? ()
#2  0x000000000040bf44 in ?? ()
#3  0x00000000004086a8 in ?? ()
#4  0x00000000004024a7 in ?? ()
#5  0x00007f8761dad555 in __libc_start_main () from /lib64/libc.so.6
#6  0x0000000000402a75 in ?? ()
(gdb) t 2
[Switching to thread 2 (Thread 0x7f8761376700 (LWP 15916))]
#0  0x0000000000405c22 in ?? ()
(gdb) bt
#0  0x0000000000405c22 in ?? ()
#1  0x0000000000405fd7 in ?? ()
#2  0x00007f8762376ea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f8761e89b0d in clone () from /lib64/libc.so.6
(gdb) t 3
[Switching to thread 3 (Thread 0x7f8761b77700 (LWP 15915))]
#0  0x00007f8761e509fd in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f8761e509fd in nanosleep () from /lib64/libc.so.6
#1  0x00007f8761e812d4 in usleep () from /lib64/libc.so.6
#2  0x0000000000407eaf in ?? ()
#3  0x00007f8762376ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f8761e89b0d in clone () from /lib64/libc.so.6
(gdb)

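The core file captured with gdb. Because of the 25 MB upload limit it is split into 5 volumes; rename the files to remove the trailing .zip extension before extracting: core_2.zip.001.zip core_2.zip.002.zip core_2.zip.003.zip core_2.zip.004.zip core_2.zip.005.zip

/lib64/libc.so.6: as far as I can tell, the source code does not contain this file.

In the gdb session, the frames between 00405c22 and 00405fd7 have no symbol information. The addresses 00405e4b and 00405db3 from the dmesg | grep 'opentracker' output above lie within this range.

How about replacing these sprintf calls with NOPs? Then again, the error seems to occur before this function is executed.

The AI says the error occurs before the printing, at idx 0.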

1265578519 commented 6 months ago

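I asked an AI for a fix. I do not know whether it is correct; after all, AI often talks nonsense.

Your code snippet is statistics code from Opentracker, written in C. To fix a "null pointer error", we usually need to determine where the code might encounter a pointer that is unexpectedly NULL, and then decide how to handle it.

First, fix the loop condition problem you raised earlier in the stats_top_txt function and add appropriate NULL checks:

  1. ot_vector *torrents_list may be NULL and needs to be checked.
  2. ot_torrent and its peer_list should be checked for NULL.

Fixing the idx loop in the stats_top_txt function: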

for( bucket=0; bucket<OT_BUCKET_COUNT; ++bucket ) {
  ot_vector *torrents_list = mutex_bucket_lock( bucket );
  if (!torrents_list) {
    // Locking the bucket failed: handle the error or skip this bucket
    continue;  
  }
  for( j=0; j<torrents_list->size; ++j ) {
    ot_peerlist *peer_list = ( ((ot_torrent*)(torrents_list->data))[j] ).peer_list;
    if (!peer_list) {
      // peer_list is NULL: handle the error or skip this record
      continue;
    }
    // The original code follows, building the top lists for peer_count and seed_count
    // ...
  }
  mutex_bucket_unlock( bucket, 0 );
  if( !g_opentracker_running )
    return 0;
}

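Note that the implementation of mutex_bucket_lock and the definitions of the external symbols OT_BUCKET_COUNT and g_opentracker_running are not given in the code snippet. You need to make sure these operations handle NULL values correctly.

One more point to note is how the code handles amount. If amount is 0, the following code:

int idx = amount - 1;
while( (idx >= 0) && ( /* condition */ ) ) --idx;
if ( idx++ != amount - 1 ) { /* ... */ }

may leave idx at -1, in which case the accesses to top100c[idx + 1] and top100s[idx + 1] go out of bounds. Make sure the appropriate checks are done before undefined behavior can occur.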
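The loop in question can be sketched in isolation. The version below is a minimal, self-contained illustration of inserting into a descending top-N array; top_record and top_insert are made-up names, not the ot_stats.c code. It suggests that idx can indeed reach -1 after the loop, but the increment brings it back to 0 before any array element is touched, so with amount of at least 1 every access stays within top[0..amount-1].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the tracker's per-torrent record. */
typedef struct { size_t val; int id; } top_record;

/* Insert (val, id) into top[0..amount-1], kept sorted in descending order.
   Returns the slot used, or -1 if val does not make the list. */
static int top_insert(top_record *top, int amount, size_t val, int id) {
    assert(top != NULL && amount > 0);
    int idx = amount - 1;
    /* Walk left past every entry smaller than val; idx may end at -1. */
    while (idx >= 0 && val > top[idx].val)
        --idx;
    if (idx == amount - 1)
        return -1;              /* not larger than the last entry */
    ++idx;                      /* now within [0, amount-1] */
    /* Shift the tail down by one slot, dropping the last entry. */
    memmove(top + idx + 1, top + idx, (size_t)(amount - 1 - idx) * sizeof *top);
    top[idx].val = val;
    top[idx].id = id;
    return idx;
}
```

Calling top_insert with amount set to 0 is rejected by the assertion, which is the one case where the index arithmetic would not be safe.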

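Finally, a reminder that code review and testing are an important part of code maintenance and should be carried out at several levels before actual deployment. If you need to check for NULL pointers in other functions or have other questions, feel free to ask.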

1265578519 commented 6 months ago

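A group member sent a fix. He says: only this part was changed, because the crash is caused by iovec_increase returning NULL; also, in stats_shift_down_network_count the pointer is not reset to NULL after free, which would cause a memory leak.

/* 
Original code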
static void stats_make( int *iovec_entries, struct iovec **iovector, ot_tasktype mode ) {
  char *r;

  *iovec_entries = 0;
  *iovector      = NULL;
  if( !( r = iovec_increase( iovec_entries, iovector, OT_STATS_TMPSIZE ) ) )
    return;

  switch( mode & TASK_TASK_MASK ) {
    case TASK_STATS_TORRENTS:    r += stats_torrents_mrtg( r );             break;
    case TASK_STATS_PEERS:       r += stats_peers_mrtg( r );                break;
    case TASK_STATS_SLASH24S:    r += stats_slash24s_txt( r, 128 );         break;
    case TASK_STATS_TOP10:       r += stats_top_txt( r, 10 );               break;
    case TASK_STATS_TOP100:
                                 r = iovec_fix_increase_or_free( iovec_entries, iovector, r, 4 * OT_STATS_TMPSIZE );
                                 if( !r ) return;
                                 r += stats_top_txt( r, 100 );              break;
    case TASK_STATS_EVERYTHING:  r += stats_return_everything( r );         break;
#ifdef WANT_SPOT_WOODPECKER
    case TASK_STATS_WOODPECKERS: r += stats_return_woodpeckers( r, 128 );   break;
#endif
#ifdef WANT_FULLLOG_NETWORKS
    case TASK_STATS_FULLLOG:      stats_return_fulllog( iovec_entries, iovector, r );
                                                                            return;
#endif
    default:
      iovec_free(iovec_entries, iovector);
      return;
  }
  iovec_fixlast( iovec_entries, iovector, r );
}
*/ 

/* New code: fixes the NULL pointer in the stats page */
static void stats_make(int *iovec_entries, struct iovec **iovector, ot_tasktype mode) {
    char *r;

    *iovec_entries = 0;
    *iovector = NULL;
    if (!(r = iovec_increase(iovec_entries, iovector, OT_STATS_TMPSIZE))) {
        // Allocation failed: free any iovector memory that may already be allocated and return
        iovec_free(iovec_entries, iovector);
        return;
    }

    switch (mode & TASK_TASK_MASK) {
        case TASK_STATS_TORRENTS:
            r += stats_torrents_mrtg(r);
            break;
        case TASK_STATS_PEERS:
            r += stats_peers_mrtg(r);
            break;
        case TASK_STATS_SLASH24S:
            r += stats_slash24s_txt(r, 128);
            break;
        case TASK_STATS_TOP10:
            r += stats_top_txt(r, 10);
            break;
        case TASK_STATS_TOP100:
            r = iovec_fix_increase_or_free(iovec_entries, iovector, r, 4 * OT_STATS_TMPSIZE);
            if (!r) {
                // If reallocation fails, free any iovector memory that may already be allocated and return
                iovec_free(iovec_entries, iovector);
                return;
            }
            r += stats_top_txt(r, 100);
            break;
        case TASK_STATS_EVERYTHING:
            r += stats_return_everything(r);
            break;
#ifdef WANT_SPOT_WOODPECKER
        case TASK_STATS_WOODPECKERS:
            r += stats_return_woodpeckers(r, 128);
            break;
#endif
#ifdef WANT_FULLLOG_NETWORKS
        case TASK_STATS_FULLLOG:
            stats_return_fulllog(iovec_entries, iovector, r);
            // Nothing more to do here; stats_return_fulllog already takes care of freeing the memory
            return;
#endif
        default:
            // Unknown type: free the memory and return
            iovec_free(iovec_entries, iovector);
            return;
    }
    iovec_fixlast(iovec_entries, iovector, r);
}
/* End of new code */

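In testing, the fix from this group member did not help; the crash still occurs. I think the AI is right and the problem is with idx. I have no idea what this group member actually fixed.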

erdgeist commented 6 months ago

Hey,

I am the author of opentracker and am looking into the bug.

This fix here is not necessary:

            if (!r) {
                // If reallocation fails, free any iovector memory that may already be allocated and return
                iovec_free(iovec_entries, iovector);
                return;
            }

because iovec_fix_increase_or_free already frees the memory if reallocation fails.
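The "increase or free" contract described here can be sketched as follows. grow_or_free is a hypothetical helper written for this illustration, not the real iovec_fix_increase_or_free; it only shows why a caller needs no extra cleanup after a failed reallocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: grow *buf to new_size bytes.
   On failure the old block is freed and *buf is set to NULL,
   so the caller must not free it again. Returns the new block or NULL. */
static void *grow_or_free(void **buf, size_t new_size) {
    assert(buf != NULL && new_size > 0);
    void *grown = realloc(*buf, new_size);
    if (grown == NULL)
        free(*buf);   /* realloc left the old block untouched on failure */
    *buf = grown;     /* NULL on failure, new address on success */
    return grown;
}
```

With a helper of this shape, the caller's failure path is just `if (!r) return;`, as in the original stats_make.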

Also idx in https://github.com/1265578519/OpenTracker/issues/4#issuecomment-1980813111 can never be 0, because the function stats_top_txt will only be called from stats_make with amount either 10 or 100.

In order to see where the problem is, could you please execute opentracker.debug inside a gdb or lldb?

If the crash occurs, please attach a backtrace of the functions.

1265578519 commented 6 months ago

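@erdgeist Am I using gdb correctly? I captured a core file with gcore just as the stats page was about to hang (the process had not crashed yet; should I wait until the crash happens?)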

[root@tracker ~]# cd /home/OpenTracker-master/opentracker
[root@tracker opentracker]# gdb --args ./opentracker.debug -f opentracker.conf.sample -p 81 -P 8080 -p 6961 -P 6961 -p 2710 -P 2710
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/OpenTracker-master/opentracker/opentracker.debug...done.
(gdb) run
Starting program: /home/OpenTracker-master/opentracker/./opentracker.debug -f opentracker.conf.sample -p 81 -P 8080 -p 6961 -P 6961 -p 2710 -P 2710
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Binding socket type TCP to address [::]:81... success.
Binding socket type UDP to address [::]:8080... success.
Binding socket type TCP to address [::]:6961... success.
Binding socket type UDP to address [::]:6961... success.
Binding socket type TCP to address [::]:2710... success.
Binding socket type UDP to address [::]:2710... success.
Dropping to user nobody.
[New Thread 0x7ffff73c7700 (LWP 13267)]
[New Thread 0x7ffff6bc6700 (LWP 13268)]
 installing 0 workers on udp socket -1

bt
^C
Program received signal SIGINT, Interrupt.
0x00007ffff76da0e3 in epoll_wait () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64 zlib-1.2.7-21.el7_9.x86_64
(gdb) bt
#0  0x00007ffff76da0e3 in epoll_wait () from /lib64/libc.so.6
#1  0x000000000040e368 in io_waituntil2 ()
#2  0x000000000040287a in server_mainloop (args=0x0) at opentracker.c:294
#3  0x0000000000403c42 in main (argc=15, argv=0x7fffffffe4a8) at opentracker.c:693
(gdb) t 0
Thread ID 0 not known.
(gdb) bt
#0  0x00007ffff76da0e3 in epoll_wait () from /lib64/libc.so.6
#1  0x000000000040e368 in io_waituntil2 ()
#2  0x000000000040287a in server_mainloop (args=0x0) at opentracker.c:294
#3  0x0000000000403c42 in main (argc=15, argv=0x7fffffffe4a8) at opentracker.c:693
(gdb) t 1
[Switching to thread 1 (Thread 0x7ffff7fed740 (LWP 13263))]
#0  0x00007ffff76da0e3 in epoll_wait () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff76da0e3 in epoll_wait () from /lib64/libc.so.6
#1  0x000000000040e368 in io_waituntil2 ()
#2  0x000000000040287a in server_mainloop (args=0x0) at opentracker.c:294
#3  0x0000000000403c42 in main (argc=15, argv=0x7fffffffe4a8) at opentracker.c:693
(gdb) t 2
[Switching to thread 2 (Thread 0x7ffff73c7700 (LWP 13267))]
#0  0x00007ffff76a09fd in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff76a09fd in nanosleep () from /lib64/libc.so.6
#1  0x00007ffff76d12d4 in usleep () from /lib64/libc.so.6
#2  0x0000000000408ce6 in clean_worker (args=0x0) at ot_clean.c:123
#3  0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff76d9b0d in clone () from /lib64/libc.so.6
(gdb) t 3
[Switching to thread 3 (Thread 0x7ffff6bc6700 (LWP 13268))]
#0  0x00000000004064c3 in stats_top_txt (reply=0x7ffff0005e30 "\330\a", amount=100) at ot_stats.c:315
315       int idx = amount - 1; while( (idx >= 0) && ( peer_list->peer_count > top100c[idx].val ) ) --idx;
(gdb) bt
#0  0x00000000004064c3 in stats_top_txt (reply=0x7ffff0005e30 "\330\a", amount=100) at ot_stats.c:315
#1  0x0000000000407705 in stats_make (iovec_entries=0x7ffff6bc5f04, iovector=0x7ffff6bc5ef8, 
    mode=TASK_STATS_TOP100) at ot_stats.c:620
#2  0x0000000000407b2b in stats_worker (args=0x0) at ot_stats.c:752
#3  0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff76d9b0d in clone () from /lib64/libc.so.6
(gdb) t 4
Thread ID 4 not known.
(gdb) gcore
warning: target file /proc/13263/cmdline contained unexpected null characters
Saved corefile core.13263
(gdb) quit
A debugging session is active.

    Inferior 1 [process 13263] will be killed.

Quit anyway? (y or n) y
[root@tracker opentracker]#

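The opentracker.debug binary and the gcore file. Because of the 25 MB limit they are split into 3 volumes; rename the files to remove the trailing .zip extension before extracting.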

core.13263.zip.001.zip core.13263.zip.002.zip core.13263.zip.003.zip

1265578519 commented 6 months ago

Waiting for the process to crash. Tomorrow I will upload the post-crash gcore file to this issue.

1265578519 commented 6 months ago

The crash came sooner than expected. I will edit this comment and upload the zip files shortly.

[root@tracker opentracker]# gdb --args ./opentracker.debug -f opentracker.conf.sample -p 81 -P 8080 -p 6961 -P 6961 -p 2710 -P 2710
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/OpenTracker-master/opentracker/opentracker.debug...done.
(gdb) run
Starting program: /home/OpenTracker-master/opentracker/./opentracker.debug -f opentracker.conf.sample -p 81 -P 8080 -p 6961 -P 6961 -p 2710 -P 2710
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Binding socket type TCP to address [::]:81... success.
Binding socket type UDP to address [::]:8080... success.
Binding socket type TCP to address [::]:6961... success.
Binding socket type UDP to address [::]:6961... success.
Binding socket type TCP to address [::]:2710... success.
Binding socket type UDP to address [::]:2710... success.
Dropping to user nobody.
[New Thread 0x7ffff73c7700 (LWP 22282)]
[New Thread 0x7ffff6bc6700 (LWP 22283)]
 installing 0 workers on udp socket -1

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff6bc6700 (LWP 22283)]
0x00000000004063a8 in to_hex (
    d=0x7ffff6bc51b1 "FB704F7612DB2F172433969A1CA054C95C61C84", 
    s=0x7ffff7ef6410 <Address 0x7ffff7ef6410 out of bounds>) at ot_stats.c:294
294 static char*to_hex(char*d,uint8_t*s){char*m="0123456789ABCDEF";char *t=d;char*e=d+40;while(d<e){*d++=m[*s>>4];*d++=m[*s++&15];}*d=0;return t;}
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64 zlib-1.2.7-21.el7_9.x86_64
(gdb) gcore
warning: target file /proc/22278/cmdline contained unexpected null characters
Saved corefile core.22278
(gdb) 
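For readability, here is an expanded sketch of what the one-line to_hex in the trace does. to_hex_expanded is a name made up for this illustration; its behaviour is meant to match line 294 as quoted above. The function reads 20 bytes from its source pointer unconditionally, so if that pointer refers to memory that is no longer mapped (gdb reports s as out of bounds), the read itself faults. How the pointer became stale is not something this sketch explains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expanded form of the one-line to_hex shown in the trace.
   Reads exactly 20 bytes from src and writes 40 hex digits plus a NUL to dst,
   so dst must hold at least 41 bytes and src must point to 20 readable bytes. */
static char *to_hex_expanded(char *dst, const uint8_t *src) {
    static const char digits[] = "0123456789ABCDEF";
    assert(dst != NULL && src != NULL);
    char *start = dst;
    for (size_t i = 0; i < 20; ++i) {
        *dst++ = digits[src[i] >> 4];   /* high nibble */
        *dst++ = digits[src[i] & 15];   /* low nibble */
    }
    *dst = '\0';
    return start;
}
```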

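Apart from the crash, I think two more stats problems need to be solved:
  1. Add a timeout. I do not understand why the server hangs for five minutes before it responds; after 60 seconds without a response it should return 504 and drop the user's connection.
  2. Why does a request to stats without any parameters return the top page?

The opentracker.debug binary and the gcore file. Because of the 25 MB limit they are split into 5 volumes; rename the files to remove the trailing .zip extension before extracting.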

core.22278.zip.001.zip core.22278.zip.002.zip core.22278.zip.003.zip core.22278.zip.004.zip core.22278.zip.005.zip

I was so excited about catching the crash that I forgot to run the bt command. Here it is:

[root@tracker ~]# gdb /home/OpenTracker-master/opentracker/opentracker.debug core*
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/OpenTracker-master/opentracker/opentracker.debug...done.

warning: core file may not match specified executable file.
[New LWP 22283]
[New LWP 22282]
[New LWP 22278]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/home/OpenTracker-master/opentracker/./opentracker.debug -f opentracker.conf.sam'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000004063a8 in to_hex (d=0x7ffff6bc51b1 "FB704F7612DB2F172433969A1CA054C95C61C84", s=0x7ffff7ef6410 <Address 0x7ffff7ef6410 out of bounds>) at ot_stats.c:294
294 static char*to_hex(char*d,uint8_t*s){char*m="0123456789ABCDEF";char *t=d;char*e=d+40;while(d<e){*d++=m[*s>>4];*d++=m[*s++&15];}*d=0;return t;}
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64 zlib-1.2.7-21.el7_9.x86_64
(gdb) bt
#0  0x00000000004063a8 in to_hex (d=0x7ffff6bc51b1 "FB704F7612DB2F172433969A1CA054C95C61C84", s=0x7ffff7ef6410 <Address 0x7ffff7ef6410 out of bounds>) at ot_stats.c:294
#1  0x00000000004067e7 in stats_top_txt (
    reply=0x7ffff0021e00 "Top 100 torrents by peers:\n\t1243\tDE0066558293DC30E80C77AF657431A0C81F27BA\n\t1093\t0E10E2430B1888C20408BB6C4A9E577A227BFFD5\n\t1088\t7CDBAA3CF35B4A6EC194220B09D4373AC02CAF1B\n\t1027\tF14D81DC3C78642CF760534AD9"..., amount=100) at ot_stats.c:340
#2  0x0000000000407705 in stats_make (iovec_entries=0x7ffff6bc5f04, iovector=0x7ffff6bc5ef8, mode=TASK_STATS_TOP100) at ot_stats.c:620
#3  0x0000000000407b2b in stats_worker (args=0x0) at ot_stats.c:752
#4  0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ffff76d9b0d in clone () from /lib64/libc.so.6
(gdb) t 2
[Switching to thread 2 (Thread 0x7ffff73c7700 (LWP 22282))]
#0  0x00007ffff76a09fd in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff76a09fd in nanosleep () from /lib64/libc.so.6
#1  0x00007ffff76d12d4 in usleep () from /lib64/libc.so.6
#2  0x0000000000408ce6 in clean_worker (args=0x0) at ot_clean.c:123
#3  0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff76d9b0d in clone () from /lib64/libc.so.6
(gdb) t 3
[Switching to thread 3 (Thread 0x7ffff7fed740 (LWP 22278))]
#0  0x0000000000405830 in mutex_workqueue_popresult (iovec_entries=0x7fffffffc2c4, iovec=0x7fffffffc2c8) at ot_mutex.c:241
241   while( *task && ( (*task)->tasktype != TASK_DONE ) )
(gdb) bt
#0  0x0000000000405830 in mutex_workqueue_popresult (iovec_entries=0x7fffffffc2c4, iovec=0x7fffffffc2c8) at ot_mutex.c:241
#1  0x0000000000402959 in server_mainloop (args=0x0) at opentracker.c:308
#2  0x0000000000403c42 in main (argc=15, argv=0x7fffffffe4a8) at opentracker.c:693
(gdb) 

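Why does a request to stats without any parameters return the top page?

This may be the very problem that causes the crash: other memory contents are being output.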

erdgeist commented 6 months ago

I identified a potential issue and fixed that in https://erdgeist.org/gitweb/opentracker/commit/?id=9c98e1e775c48684442fe97ca93bfa71b295d81e

I do not know how to merge from upstream master branch into this repository. But maybe you can check it out.

1265578519 commented 6 months ago

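@erdgeist I updated the code, but the bug is still there. Let's see later whether it crashes. A request without parameters can still trigger the bug and return the top content. Pressing F5 several times reproduces it: http://49.12.76.8:2710/stats I have a hunch about this, because the stats page hanging is the precursor to the process crashing.

If it crashes I will add gdb output. So far it has not crashed since the update, but CPU usage is extremely high and the stats page hangs; the code is probably stuck in an infinite loop. I suspect a bug in stats, and the high CPU usage also affects announce somewhat.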

1265578519 commented 6 months ago

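The good news is that there was no crash today. The bad news is that the stats page hangs. The hang drives up the process's CPU usage, which makes announce responses slow as well. Here is gdb output taken during the hang: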

Program received signal SIGINT, Interrupt.
0x0000000000405825 in mutex_workqueue_popresult (iovec_entries=0x7fffffffc2c4, iovec=0x7fffffffc2c8) at ot_mutex.c:241
241    while( *task && ( (*task)->tasktype != TASK_DONE ) )
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64 zlib-1.2.7-21.el7_9.x86_64
(gdb) bt
#0  0x0000000000405825 in mutex_workqueue_popresult (iovec_entries=0x7fffffffc2c4, iovec=0x7fffffffc2c8) at ot_mutex.c:241
#1  0x0000000000402967 in server_mainloop (args=0x0) at opentracker.c:311
#2  0x0000000000403c4b in main (argc=15, argv=0x7fffffffe4a8) at opentracker.c:702
(gdb) t 2
[Switching to thread 2 (Thread 0x7ffff73c7700 (LWP 31958))]
#0  0x00007ffff76a09fd in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff76a09fd in nanosleep () from /lib64/libc.so.6
#1  0x00007ffff76d12d4 in usleep () from /lib64/libc.so.6
#2  0x0000000000408cef in clean_worker (args=0x0) at ot_clean.c:124
#3  0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff76d9b0d in clone () from /lib64/libc.so.6
(gdb) t 3
[Switching to thread 3 (Thread 0x7ffff6bc6700 (LWP 31959))]
#0  0x00007ffff7bcd54d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff7bcd54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffff7bc8e9b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007ffff7bc8d68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000405744 in mutex_workqueue_pushresult (taskid=89113, iovec_entries=2, iovec=0x7ffff00d4d20) at ot_mutex.c:207
#4  0x0000000000407b49 in stats_worker (args=0x0) at ot_stats.c:757
#5  0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007ffff76d9b0d in clone () from /lib64/libc.so.6
(gdb)

Is this where the problem is? The output seems to be different every time.

#4  0x0000000000407b49 in stats_worker (args=0x0) at ot_stats.c:757

It looks like a more specialized profiler is needed to analyze the CPU usage.

I asked an AI; this is its reply:

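This code processes tasks in an endless loop with no visible delay or sleep call, which is probably the main cause of the high CPU usage. When there are no tasks to process, the thread keeps polling: on every iteration it tries to fetch a task, and that operation is CPU-intensive.

It is normal for the thread to be woken up to handle a new task; when there are no tasks, however, it should wait instead of consuming CPU in a tight loop.

To solve this, consider the following options:

  1. Use a condition variable: modify mutex_workqueue_poptask() so that it puts the thread to sleep when the task queue is empty and wakes it when a new task arrives. This is usually implemented with a condition variable and a mutex.

  2. Add a delay or sleep: if condition variables are unavailable or you do not want to change the queue implementation, add a delay (sleep, usleep, nanosleep) to the loop. This is not ideal, because it introduces an artificial delay that may hurt the responsiveness of task processing.

  3. Use an event-driven design: use an event loop library (such as libevent or libuv) to handle the task queue, waking the thread only when there is actually a task in the queue.

  4. Use a message queue: use a system-level message queue (such as POSIX message queues), which already provides a mechanism for blocking a thread until a message arrives.

  5. Improve the mutual exclusion and synchronization primitives: if mutex_workqueue_poptask synchronizes with a spinlock or some other busy-waiting mechanism, switching to blocking primitives can reduce CPU usage.

  6. Use an existing concurrency library: many programming environments have mature concurrency libraries for this kind of problem, such as the C++ thread library or Java's ExecutorService.

Which option to choose depends on the rest of the existing code and on the specific requirements the program has to meet. In general, letting a worker thread sleep when it has nothing to do is the best way to solve this kind of problem.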
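Option 1 from the list above can be sketched as follows. This is a generic bounded queue built on a POSIX mutex and condition variable; task_queue, queue_push and queue_pop are hypothetical names, and nothing here is taken from ot_mutex.c, so it says nothing about whether opentracker's own queue actually busy-waits.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 16

/* Hypothetical fixed-size task queue; not opentracker's ot_mutex.c. */
typedef struct {
    int             items[QUEUE_CAP];
    size_t          head, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
} task_queue;

#define TASK_QUEUE_INIT { {0}, 0, 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

/* Producer side: returns 0 on success, -1 if the queue is full. */
static int queue_push(task_queue *q, int task) {
    assert(q != NULL);
    int rc = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAP) {
        q->items[(q->head + q->count) % QUEUE_CAP] = task;
        q->count++;
        rc = 0;
        pthread_cond_signal(&q->not_empty);  /* wake one sleeping worker */
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Worker side: sleeps inside pthread_cond_wait while the queue is empty. */
static int queue_pop(task_queue *q) {
    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                    /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->not_empty, &q->lock);
    int task = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return task;
}
```

A worker thread would call queue_pop in its loop and consume no CPU while the queue is empty.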

IDA shows that an infinite loop can occur. So the infinite loop still hangs stats, but the crash is fixed? I suspect this infinite loop is also why stats without parameters outputs the top content.

erdgeist commented 6 months ago

With the latest commit

https://erdgeist.org/gitweb/opentracker/commit/?id=9f080415851246df48d9e906b4c7d7ee3b0dbefc

on upstream, the task list iterators should have been fixed.

Please try.

1265578519 commented 6 months ago

The same problem is still there. IDA no longer shows an infinite loop, but the code still loops 100 times, with high CPU usage, top100 content output by mistake, and the stats page hanging.

The hang makes stats return the top content for a request without the top parameter. This is clearly an out-of-bounds access bug, since stats should not output top content. A group member patched the binary directly, changing the jne loop to nop. I am checking whether his binary fixes the problem. With the binary he provided, CPU usage is now completely normal; I will keep watching for a while.

1265578519 commented 6 months ago

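The binary patched by the group member has the same problem. I am learning how to use the gprof CPU profiler, but I do not know how to obtain the gmon.out file. Switching to perf now and waiting for the perf.data file to be generated.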

erdgeist commented 6 months ago

I do not think you need to measure performance. An infinite loop can not be optimized and should be fixed 😃

From your screenshots, one of the results for 49.12.76.8:2710/stats without parameters looks like it was with mode=top100. This is clearly wrong. Did you just reload the website?

Also did you call make clean after checking out the newer versions?

Maybe older .o object files were still in the directory?

1265578519 commented 6 months ago

@erdgeist http://49.12.76.8:2710/announce?info_hash=%11%11%11%11%11%11%11%11%11%11%11%11%11%11%11%11%11%11%11%11

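HTTP works normally; only stats misbehaves. I deleted the files with rm -rf /home, killed the process with killall -9 opentracker, and then downloaded the source code again and compiled and installed it, to make sure the source is the latest.

The stats error showing top content happens when I simply press F5 in the browser.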

erdgeist commented 6 months ago

Your output from /stats does not look like the default. Normally it should show something like:

opentracker serving 128757 torrents

So there are some changes that are not in my repository from

git clone git://erdgeist.org/opentracker

Can you please show me the diff between these directories?

1265578519 commented 6 months ago

Six files were changed in total; the changes are shown in the six attached screenshots.

I am exporting the perf.data file captured with perf on the server.

1265578519 commented 6 months ago

Install:

yum -y install perf

Run the program to be profiled:

perf record -g ./opentracker.debug -f opentracker.conf.sample -p 81 -P 8080 -p 6961 -P 6961 -p 2710 -P 2710

Read the report, loading the perf.data file that was written to the current directory when the run finished:

perf report

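Because of the 25 MB limit the file is split into 4 volumes; rename the files to remove the trailing .zip extension before extracting: perf_2.zip.001.zip perf_2.zip.002.zip perf_2.zip.003.zip perf_2.zip.004.zip

I am not sure whether this report file is useful. I am loading perf.data now; I have not used perf before and am still learning how it works.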

erdgeist commented 6 months ago

I would not recommend increasing the amount variable without also ensuring that there is enough space in the buffer for the reply to also hold 400 peers in tcp and udp replies.

If there is not enough space in the reply buffer, some memory will be overwritten.

Torrent clients usually do not connect to more than 50 peers, why did you increase the value to 400?

Also

numwant = ntohl( inpacket[92/4] );
if (numwant > 0) numwant = 400;

is bad. If the client only wants 10, you force the value to be 400. This is not good behaviour of a tracker.
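A sketch of the alternative: treat the configured number as an upper bound rather than a forced value. clamp_numwant and both constants are assumptions made for this example (50 and 400 are the figures mentioned in this thread), not opentracker code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limits: 50 as the default and 400 as the raised cap, as discussed in this thread. */
enum { NUMWANT_DEFAULT = 50, NUMWANT_MAX = 400 };

/* Respect the client's request; only fill in a default and cap the upper end. */
static uint32_t clamp_numwant(int32_t requested) {
    assert(NUMWANT_DEFAULT <= NUMWANT_MAX);
    if (requested <= 0)               /* the UDP protocol uses -1 for "use the default" */
        return NUMWANT_DEFAULT;
    if (requested > NUMWANT_MAX)
        return NUMWANT_MAX;
    return (uint32_t)requested;
}
```

A client asking for 10 peers then gets at most 10, while a client asking for 1000 is capped at 400.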

1265578519 commented 6 months ago

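I checked: the buffer is 8 KB, and I believe that is enough. In IPv4 mode each peer takes 6 bytes; in combined IPv4 and IPv6 mode each peer takes 18 bytes. I enabled #FEATURES+=-DWANT_V6 in the build file, so 400 * 18 = 7200 bytes, which is < 8000.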
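The arithmetic above can be checked mechanically. The sketch below is only an illustration: reply_fits is a made-up helper, the 20-byte header is the fixed part of a UDP announce response, and 8192 is an assumed reading of "8 KB"; HTTP replies are bencoded and carry extra overhead that is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per peer as stated in this thread: 6 for IPv4-only, 18 with WANT_V6. */
enum { PEER_BYTES_V4 = 6, PEER_BYTES_V6 = 18 };

/* Returns 1 if header_bytes plus peers * bytes_per_peer fits in buffer_bytes, else 0. */
static int reply_fits(size_t peers, size_t bytes_per_peer,
                      size_t header_bytes, size_t buffer_bytes) {
    assert(bytes_per_peer > 0);
    if (header_bytes > buffer_bytes)
        return 0;
    /* Divide instead of multiplying so the check cannot overflow. */
    return peers <= (buffer_bytes - header_bytes) / bytes_per_peer;
}
```

Under these assumptions reply_fits(400, PEER_BYTES_V6, 20, 8192) returns 1, while the same request against a 7200-byte buffer would not fit once the header is counted.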

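I also used the default value of 50 before, and stats crashed in the same way: https://github.com/1265578519/OpenTracker/issues/4#issuecomment-1879234458

(The same thing happens with the value set to 50, so this may be a bug in the core code.)

Raising the value to 400 is a "network optimization". In mainland China, fewer than 10% of home broadband users have a public IP address and have set up port forwarding to open a TCP port, so the only option is to return more peers to improve transfer rates in users' BT clients.

Your question also touches on two new features and fixes that I wanted to discuss with you once the stats problem is solved.

Feature 1: full scrape can act as a DDoS and cause high memory usage on the server. For example, if 10 people request a full scrape at the same time, the server uses 10 copies' worth of process memory. Two possible solutions:
  1. Do not allocate and copy memory for the full scrape at all; return the full scrape result directly, so no extra memory is consumed.
  2. Make only one copy and let the other 9 of the 10 requesters share it. This avoids the high memory usage; memory use is at most twice the original.

Feature 2: when a BT client announces to opentracker over UDP, opentracker takes the remote UDP source port of the request and uses it to override the listening port reported by the client. This is for NAT1 UDP hole punching: the port the UDP request came from is open at that moment and can be reached by others. Add a #FEATURES+=-DWANT_NAT1UDP switch to turn this on and off. The BT client that currently implements this is BitComet, which tells other downloaders its NAT1 UDP port via PEX. I would like this implemented in opentracker, so that uTorrent, qBittorrent and other clients are supported as well.

This is my first time using perf, and it is very slow at opening the report file.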

1265578519 commented 6 months ago

This is the performance report from perf. While stats is hung and unresponsive, the function with extremely high CPU usage on the server is stats_top_txt.

1265578519 commented 6 months ago

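git clone git://erdgeist.org/opentracker
FEATURES+=-DWANT_V6

I am using the default code with only this one feature enabled and no other changes. I am going to sleep now; later you will find that stats hangs without responding in the same way and outputs top100 content by mistake.

There is also a new feature 3: I need FEATURES+=-DWANT_IP_FROM_PROXY to apply to any IP address. Currently -f opentracker.conf.sample can only load a single IP address. A CDN is needed to improve latency and to hide the server's IP address so the tracker server is not hit by DMCA notices. https://www.cloudflare.com/ips-v4 https://www.cloudflare.com/ips-v6

Other CDN providers besides Cloudflare may be used too, so DWANT_IP_FROM_PROXY has to work for all IP addresses.

Because the configuration file can only load 127.0.0.1, I have to run an HTTP server that reverse-proxies port 81 to port 8080. This extra layer is unnecessary overhead; it amounts to running two high-performance HTTP servers. Many opentracker instances around the world that sit behind a CDN only ever see 127.0.0.1 in BT clients, because they do not run two HTTP servers on the opentracker host and therefore cannot pass on the real IP address.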

1265578519 commented 6 months ago

Mar 9 07:49:08 tracker opentracker: Error in `./opentracker': realloc(): invalid pointer: 0x00000000068d1820
A process crash may have just happened. I am now running opentracker.debug under gdb to capture the crash information.

erdgeist commented 6 months ago

I would recommend not using full scrapes at all and disable it in Makefile.

Better is to use /stats?mode=fscr&format=txt

There are multiple formats: "txt", "bin", "ben" (like the original), "url" and "txtp".

To avoid DoS, you can set access.stats_path stats in opentracker.conf to another value so only you can retrieve it. You can also set -DWANT_RESTRICT_STATS in Makefile and then set access.stats <YOUR IP>.

I do not like the idea of keeping a copy of fullscrape in memory, as it is huge and usually doubles the data. Also if I take a copy I must decide, when to make a new copy so that the fullscrape is not old.

If you have many interested clients in your fullscrape, you should use wget or curl to get a copy and then use a web server to distribute it.

I do not understand how NAT punching is supposed to work. opentracker already does what is in http://xbtt.sourceforge.net/udp_tracker_protocol.html

      event    = ntohl( inpacket[80/4] );
      port     = *(uint16_t*)( ((char*)inpacket) + 96 );
      ws->hash = (ot_hash*)( ((char*)inpacket) + 16 );

but during the udp connect packet, opentracker does not know the client's listening port. Client only sends listening port with announce packet, not with connect packet.
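The offsets in erdgeist's snippet follow the UDP announce layout from the linked protocol description (info_hash at byte 16, event at 80, num_want at 92, port at 96, 98 bytes in total). A standalone sketch of reading those fields is shown below; udp_announce and parse_udp_announce are hypothetical names, and unlike the snippet above this version copies with memcpy and converts the port to host byte order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UDP_ANNOUNCE_LEN 98   /* fixed size of a UDP announce request */

/* Hypothetical container for the fields discussed in this thread. */
typedef struct {
    uint8_t  info_hash[20];
    uint32_t event;
    int32_t  numwant;
    uint16_t port;            /* host byte order in this sketch */
} udp_announce;

/* Returns 0 on success, -1 if the packet is too short to be an announce. */
static int parse_udp_announce(const uint8_t *pkt, size_t len, udp_announce *out) {
    uint32_t u32;
    uint16_t u16;
    assert(pkt != NULL && out != NULL);
    if (len < UDP_ANNOUNCE_LEN)
        return -1;
    memcpy(out->info_hash, pkt + 16, 20);
    /* memcpy avoids unaligned loads from the packet buffer. */
    memcpy(&u32, pkt + 80, 4);  out->event   = ntohl(u32);
    memcpy(&u32, pkt + 92, 4);  out->numwant = (int32_t)ntohl(u32);
    memcpy(&u16, pkt + 96, 2);  out->port    = ntohs(u16);
    return 0;
}
```

The listening port only appears at offset 96 of this announce packet, which matches the point that the tracker cannot know it during the connect exchange.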

1265578519 commented 6 months ago

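Now I know why it crashed just now. I used git clone git://erdgeist.org/opentracker and forgot to edit the Makefile to disable full scrape; the crash reported by gdb is exactly there. I have 10 million active online peers in China, and judging from the error message the crash was caused by running out of memory.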

[root@tracker opentracker]# gdb --args ./opentracker.debug -f opentracker.conf.sample -p 81 -P 8080 -p 6961 -P 6961 -p 2710 -P 2710
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/OpenTracker-master/opentracker/opentracker.debug...done.
(gdb) run
Starting program: /home/OpenTracker-master/opentracker/./opentracker.debug -f opentracker.conf.sample -p 81 -P 8080 -p 6961 -P 6961 -p 2710 -P 2710
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Binding socket type TCP to address [::]:81... success.
Binding socket type UDP to address [::]:8080... success.
Binding socket type TCP to address [::]:6961... success.
Binding socket type UDP to address [::]:6961... success.
Binding socket type TCP to address [::]:2710... success.
Binding socket type UDP to address [::]:2710... success.
Dropping to user nobody.
[New Thread 0x7ffff73c7700 (LWP 31843)]
[New Thread 0x7ffff6bc6700 (LWP 31844)]
[New Thread 0x7ffff63c5700 (LWP 31845)]
 installing 0 workers on udp socket -1
[00000029] scrp:  127.0.0.1 - FULL SCRAPE

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7660c84 in realloc () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64 zlib-1.2.7-21.el7_9.x86_64
(gdb) bt
#0  0x00007ffff7660c84 in realloc () from /lib64/libc.so.6
#1  0x000000000040dacf in array_allocate ()
#2  0x000000000040fefb in iob_addbuf_internal ()
#3  0x000000000040f552 in iob_addbuf_free ()
#4  0x000000000040a8d5 in http_sendiovecdata (sock=398, ws=0x7fffffffc2d0, iovec_entries=95, iovector=0x7fffe80008c0) at ot_http.c:183
#5  0x000000000040294e in server_mainloop (args=0x0) at opentracker.c:312
#6  0x0000000000403bc8 in main (argc=15, argv=0x7fffffffe4a8) at opentracker.c:702
(gdb) t 2
[Switching to thread 2 (Thread 0x7ffff73c7700 (LWP 31843))]
#0  0x00007ffff76a09fd in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff76a09fd in nanosleep () from /lib64/libc.so.6
#1  0x00007ffff76d12d4 in usleep () from /lib64/libc.so.6
#2  0x0000000000408c10 in clean_worker (args=0x0) at ot_clean.c:124
#3  0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff76d9b0d in clone () from /lib64/libc.so.6
(gdb) t 3
[Switching to thread 3 (Thread 0x7ffff6bc6700 (LWP 31844))]
#0  0x00007ffff7bcaa35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff7bcaa35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004055d5 in mutex_workqueue_poptask (tasktype=0x7ffff6bc5ef4) at ot_mutex.c:146
#2  0x000000000040982f in fullscrape_worker (args=0x0) at ot_fullscrape.c:59
#3  0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff76d9b0d in clone () from /lib64/libc.so.6
(gdb) 

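Back to the original topic first; the other features can be discussed later. We now know that the high CPU usage comes from stats_top_txt, which makes stats hang and return top content by mistake. Is there a way to optimise this? Alternatively, keep stats accessible but disable stats?mode=top100, so that stats_top_txt is never executed.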

1265578519 commented 6 months ago
static void stats_make( int *iovec_entries, struct iovec **iovector, ot_tasktype mode ) {
  char *r;

  *iovec_entries = 0;
  *iovector      = NULL;
  if( !( r = iovec_increase( iovec_entries, iovector, OT_STATS_TMPSIZE ) ) )
    return;

  switch( mode & TASK_TASK_MASK ) {
    case TASK_STATS_TORRENTS:    r += stats_torrents_mrtg( r );             break;
    case TASK_STATS_PEERS:       r += stats_peers_mrtg( r );                break;
    case TASK_STATS_SLASH24S:    r += stats_slash24s_txt( r, 128 );         break;
    case TASK_STATS_TOP10:       r += stats_top_txt( r, 10 );               break;
    case TASK_STATS_TOP100:
                                 r = iovec_fix_increase_or_free( iovec_entries, iovector, r, 4 * OT_STATS_TMPSIZE );
                                 if( !r ) return;
                                 r += stats_top_txt( r, 100 );              break;
    case TASK_STATS_EVERYTHING:  r += stats_return_everything( r );         break;

I added one line:

fprintf( stderr, "stats mode: %d\n", mode );
static void stats_make( int *iovec_entries, struct iovec **iovector, ot_tasktype mode ) {
  char *r;

  *iovec_entries = 0;
  *iovector      = NULL;
  if( !( r = iovec_increase( iovec_entries, iovector, OT_STATS_TMPSIZE ) ) )
    return;
fprintf( stderr, "stats mode: %d\n", mode );
  switch( mode & TASK_TASK_MASK ) {
    case TASK_STATS_TORRENTS:    r += stats_torrents_mrtg( r );             break;
    case TASK_STATS_PEERS:       r += stats_peers_mrtg( r );                break;
    case TASK_STATS_SLASH24S:    r += stats_slash24s_txt( r, 128 );         break;
    case TASK_STATS_TOP10:       r += stats_top_txt( r, 10 );               break;
    case TASK_STATS_TOP100:
                                 r = iovec_fix_increase_or_free( iovec_entries, iovector, r, 4 * OT_STATS_TMPSIZE );
                                 if( !r ) return;
                                 r += stats_top_txt( r, 100 );              break;
    case TASK_STATS_EVERYTHING:  r += stats_return_everything( r );         break;

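top100 is requested quite often. I don't know whether it can be optimised; if it cannot, please add an option to turn top off, and I will stop offering top to other people. In the log output, top100 is mode 261 and stats is mode 258.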

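I guess stats and top run in the same thread, which is why stats also fails to load or returns top content by mistake. With CPU usage this high, how about adding a switch to turn top off?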

1265578519 commented 6 months ago

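One optimisation I can think of: keep the top output in an in-memory cache and refresh it once an hour. When a user requests top, serve the cached copy directly, without running stats_top_txt.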
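The caching idea above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the names, the buffer size, and the demo_render function that stands in for the expensive report generator. It is not opentracker code, and a real version would need locking, because opentracker uses worker threads.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define TOP_CACHE_TTL  3600 /* refresh interval in seconds (one hour) */
#define TOP_CACHE_SIZE 8192 /* hypothetical upper bound for the rendered text */

/* Hypothetical generator callback: writes the report into buf, returns its length. */
typedef size_t (*top_render_fn)( char *buf, size_t cap );

struct top_cache {
  char   text[TOP_CACHE_SIZE];
  size_t len;
  time_t rendered_at;
  int    valid;
};

/* Returns the cached report, re-rendering only if no copy exists yet or the
   copy is at least TOP_CACHE_TTL seconds old. `now` is a parameter so the
   function can be exercised without waiting for the clock. */
static const char *top_cache_get( struct top_cache *c, top_render_fn render,
                                  time_t now, size_t *len ) {
  if( !c->valid || now - c->rendered_at >= TOP_CACHE_TTL ) {
    c->len         = render( c->text, sizeof c->text );
    c->rendered_at = now;
    c->valid       = 1;
  }
  *len = c->len;
  return c->text;
}

/* Stand-in for the expensive generator; counts how often it is called. */
static int demo_render_calls = 0;
static size_t demo_render( char *buf, size_t cap ) {
  static const char text[] = "top100 placeholder\n";
  size_t n = sizeof text - 1;
  if( n > cap ) n = cap;
  memcpy( buf, text, n );
  ++demo_render_calls;
  return n;
}
```

With this shape, any number of requests within the hour costs one render. The trade-off is that the list can be up to an hour old.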

1265578519 commented 6 months ago

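I found out why there have been so many top requests lately: someone is using the top link as a tracker URL. The top page should check its parameters and refuse access if there are extra ones.

171.212.255.123 175.35.64.134 183.238.28.186 A quick look through the log already shows three people using the top list as a tracker. This is not a complete capture, just a few picked at random. 171.212.255.123 is the heaviest user and probably has several hundred torrents loaded.

Links like the following should be refused with Not Found to avoid wasting CPU: http://49.12.76.8:2710/stats?mode=top100&info_hash=%3E6%FFS%7C%9D.%D4l%EC%26%B9%FDPI%3F%13%0B%A7%FF&peer_id=-BC0206-%BA%A5%7D%E0%E6y%40%17%FAO%C5%B0&port=22223&natmapped=1&localip=192.168.0.161&port_type=lan&uploaded=0&downloaded=0&left=0&numwant=200&compact=1&no_peer_id=1&key=56251&event=started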

erdgeist commented 6 months ago

Upstream master now refuses to calculate expensive stats when an info_hash parameter is present

https://erdgeist.org/gitweb/opentracker/commit/?id=6604d65779796f2df6bd52840bc2b2e3f9f765b3
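For readers who do not want to open the commit, the idea of refusing expensive stats for announce-like requests can be sketched as below. This is not the code from the linked commit. The function names are hypothetical and the query string is assumed to be the raw, still URL-encoded part after the `?`.

```c
#include <string.h>

/* Returns 1 if the query string contains a parameter called `name`
   (as a whole key, not as a substring of another key), 0 otherwise. */
static int query_has_param( const char *query, const char *name ) {
  size_t      nlen = strlen( name );
  const char *p    = query;
  while( *p ) {
    size_t klen = strcspn( p, "=&" ); /* length of the current key */
    if( klen == nlen && !memcmp( p, name, nlen ) ) return 1;
    p += strcspn( p, "&" );           /* skip to the end of this key=value pair */
    if( *p == '&' ) ++p;
  }
  return 0;
}

/* Hypothetical policy: expensive stats modes are refused for requests that
   look like announces, i.e. that carry an info_hash parameter. */
static int stats_request_allowed( const char *query, int expensive ) {
  return !( expensive && query_has_param( query, "info_hash" ) );
}
```

Applied to the URL quoted earlier in the thread, the check rejects the request before any sorting work is done, while a plain `mode=top100` request still passes.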

1265578519 commented 6 months ago

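Great fix! There are now 8 million active peers online.

CPU usage went from 100% down to 40%, so 60 percentage points of unnecessary CPU load are gone.

The stats page no longer hangs or becomes unresponsive, and it no longer returns top100 content by mistake. I did not expect that a tracker handling tens of millions of requests could previously be brought to a process crash by such a small amount of traffic, only a few dozen requests per minute.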

1265578519 commented 6 months ago

https://github.com/1265578519/OpenTracker/issues/4#issuecomment-1986568068

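One more new feature request (no. 3): I need FEATURES+=-DWANT_IP_FROM_PROXY to take effect for arbitrary IP addresses. At the moment -f opentracker.conf.sample can only load a single IP address. I need this because I use a CDN to improve latency and to hide the server's IP address, so that the tracker server does not receive DMCA complaints. https://www.cloudflare.com/ips-v4 https://www.cloudflare.com/ips-v6

I might also use a CDN provider other than cloudflare, so -DWANT_IP_FROM_PROXY has to take effect for all IP addresses.

Because the config file can only load 127.0.0.1, I have to run an HTTP server that reverse-proxies port 81 to port 8080. This extra usage is unnecessary; it amounts to running two high-performance HTTP servers. Many opentracker instances behind a CDN around the world currently show only 127.0.0.1 in BT clients, because their servers do not run two HTTP servers and therefore cannot pass on the real IP address.

This could be done next: loading a config file with -f should not be necessary. Once the feature is compiled in, it should take effect for all IP addresses and pass on the real IP.

As for implementing NAT1 UDP hole punching on the opentracker server, I am contacting the BitComet developer to ask whether he can help write some code.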

erdgeist commented 6 months ago

I would be very interested in the results of another run of perf to see if modern compilers have changed the hot spots.

Changing the Makefile like this should add debug symbols to the optimised objects and not calling strip should leave them in the binary:

diff --git a/Makefile b/Makefile
index e3301a5..f801aaa 100644
--- a/Makefile
+++ b/Makefile
@@ -48,7 +48,7 @@ FEATURES+=-DWANT_FULLSCRAPE
 #FEATURES+=-D_DEBUG_HTTPERROR

 OPTS_debug=-D_DEBUG -g -ggdb # -pg -fprofile-arcs -ftest-coverage
-OPTS_production=-O3
+OPTS_production=-O3 -g -ggdb

 CFLAGS+=-I$(LIBOWFAT_HEADERS) -Wall -pipe -Wextra #-ansi -pedantic
 LDFLAGS+=-L$(LIBOWFAT_LIBRARY) -lowfat -pthread -lpthread -lz
@@ -73,7 +73,6 @@ CFLAGS_debug = $(CFLAGS) $(OPTS_debug) $(FEATURES)

 $(BINARY): $(OBJECTS) $(HEADERS)
        $(CC) -o $@ $(OBJECTS) $(LDFLAGS)
-       $(STRIP) $@
 $(BINARY).debug: $(OBJECTS_debug) $(HEADERS)
        $(CC) -o $@ $(OBJECTS_debug) $(LDFLAGS)
 proxy: $(OBJECTS_proxy) $(HEADERS)

Could you please

1) apply this patch
2) do a make clean
3) do a make
4) start perf record -g ./opentracker -p 81 -P 8080 -p 6961 -P 6961 -p 2710 -P 2710

and then after 45 minutes send me perf.data and the opentracker binary with the symbols?

Then I can see if there is more room for optimising.

Thank you.

erdgeist commented 6 months ago

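One more new feature request (no. 3): I need FEATURES+=-DWANT_IP_FROM_PROXY to take effect for arbitrary IP addresses. At the moment -f opentracker.conf.sample can only load a single IP address. I need this because I use a CDN to improve latency and to hide the server's IP address, so that the tracker server does not receive DMCA complaints. https://www.cloudflare.com/ips-v4 https://www.cloudflare.com/ips-v6

I might also use a CDN provider other than cloudflare, so -DWANT_IP_FROM_PROXY has to take effect for all IP addresses.

Because the config file can only load 127.0.0.1, I have to run an HTTP server that reverse-proxies port 81 to port 8080. This extra usage is unnecessary; it amounts to running two high-performance HTTP servers. Many opentracker instances behind a CDN around the world currently show only 127.0.0.1 in BT clients, because their servers do not run two HTTP servers and therefore cannot pass on the real IP address.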

This was lost in translation. (I am German, I don't speak Chinese and need a translation service to read your comments ;).

In order to help me to better understand your idea:

1) Are you running the tracker on your own server or do you rent one?
2) Does this server only have one IP address or multiple?
3) Is there a webserver (nginx or apache) running on the same IP address as the tracker?
4) Does the webserver act as a reverse proxy to serve both your website and the tracker?
5) Do you use cloudflare as DoS protection or to disguise the tracker's IP address?
6) Do your reverse proxy addresses change so often that you cannot put them in the config file?

It is not a good idea to allow every peer to set their IP address from the URL or from HTTP x-forwarded-for headers. This is because an attacker can cause a DDoS attack on any IP address by doing an /announce?ip=VICTIM&port=RANDOM on every popular torrent. This will result in thousands or even millions of peers making a connection to the victim.

The current way to set a proxy server is using access.proxy in your opentracker.conf file. You can add multiple proxies.
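The reasoning behind a proxy list can be shown with a small sketch: the forwarded address is honoured only when the TCP connection itself comes from a configured proxy, so an arbitrary peer cannot announce a victim's address. The function names and the hard-coded list are hypothetical stand-ins for the access.proxy entries. The sketch is IPv4 only and is not opentracker's implementation.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical list standing in for the access.proxy entries. */
static const char *trusted_proxies[] = { "10.0.1.23", "10.0.1.24" };

static int is_trusted_proxy( const struct in_addr *remote ) {
  size_t i;
  for( i = 0; i < sizeof trusted_proxies / sizeof *trusted_proxies; ++i ) {
    struct in_addr proxy;
    if( inet_pton( AF_INET, trusted_proxies[i], &proxy ) == 1 &&
        proxy.s_addr == remote->s_addr )
      return 1;
  }
  return 0;
}

/* Chooses the address to store for a peer. The forwarded address is used
   only if the connection came from a trusted proxy and the header value
   parses; otherwise the socket address is kept.
   Returns 1 if the forwarded address was used, 0 if the socket address was
   used, -1 if the socket address itself does not parse. */
static int effective_peer_ip( const char *remote_ip, const char *forwarded_for,
                              struct in_addr *out ) {
  struct in_addr remote, fwd;
  if( inet_pton( AF_INET, remote_ip, &remote ) != 1 ) return -1;
  *out = remote;
  if( forwarded_for && is_trusted_proxy( &remote ) &&
      inet_pton( AF_INET, forwarded_for, &fwd ) == 1 ) {
    *out = fwd;
    return 1;
  }
  return 0;
}
```

A request from 198.51.100.9 that claims to be forwarded for 203.0.113.7 is therefore recorded as 198.51.100.9, while the same header arriving via 10.0.1.23 is accepted.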

It might be better to allow for something like /config?mode=add-reverse-proxy&ip=23.42.5.1 on opentracker so you can add more reverse proxy addresses if they change?

1265578519 commented 6 months ago

This is the Makefile I am using now. I will send perf.data and the binary after 45 minutes. perf record -g ./opentracker -f opentracker.conf.sample -p 81 -P 8080 -p 6961 -P 6961 -p 2710 -P 2710

# $Id$

CC?=gcc

# Linux flavour
# PREFIX?=/opt/diet
# LIBOWFAT_HEADERS=$(PREFIX)/include
# LIBOWFAT_LIBRARY=$(PREFIX)/lib

# BSD flavour
# PREFIX?=/usr/local
# LIBOWFAT_HEADERS=$(PREFIX)/include/libowfat
# LIBOWFAT_LIBRARY=$(PREFIX)/lib

# Debug flavour
PREFIX?=..
LIBOWFAT_HEADERS=$(PREFIX)/libowfat
LIBOWFAT_LIBRARY=$(PREFIX)/libowfat

BINDIR?=$(PREFIX)/bin
STRIP?=strip

FEATURES+=-DWANT_V6

#FEATURES+=-DWANT_ACCESSLIST_BLACK
#FEATURES+=-DWANT_ACCESSLIST_WHITE
#FEATURES+=-DWANT_DYNAMIC_ACCESSLIST

#FEATURES+=-DWANT_SYNC_LIVE
#FEATURES+=-DWANT_IP_FROM_QUERY_STRING
#FEATURES+=-DWANT_COMPRESSION_GZIP
#FEATURES+=-DWANT_COMPRESSION_GZIP_ALWAYS
#FEATURES+=-DWANT_LOG_NETWORKS
#FEATURES+=-DWANT_RESTRICT_STATS
FEATURES+=-DWANT_IP_FROM_PROXY
#FEATURES+=-DWANT_FULLLOG_NETWORKS
#FEATURES+=-DWANT_LOG_NUMWANT
#FEATURES+=-DWANT_MODEST_FULLSCRAPES
#FEATURES+=-DWANT_SPOT_WOODPECKER
#FEATURES+=-DWANT_SYSLOGS
#FEATURES+=-DWANT_DEV_RANDOM
#FEATURES+=-DWANT_FULLSCRAPE

# Is enabled on BSD systems by default in trackerlogic.h
# on Linux systems you will need -lbsd
#FEATURES+=-DWANT_ARC4RANDOM

#FEATURES+=-D_DEBUG_HTTPERROR

OPTS_debug=-D_DEBUG -g -ggdb # -pg -fprofile-arcs -ftest-coverage
OPTS_production=-O3 -g -ggdb

CFLAGS+=-I$(LIBOWFAT_HEADERS) -Wall -pipe -Wextra #-ansi -pedantic
LDFLAGS+=-L$(LIBOWFAT_LIBRARY) -lowfat -pthread -lpthread -lz
#LDFLAGS+=-lbsd

BINARY =opentracker
HEADERS=trackerlogic.h scan_urlencoded_query.h ot_mutex.h ot_stats.h ot_vector.h ot_clean.h ot_udp.h ot_iovec.h ot_fullscrape.h ot_accesslist.h ot_http.h ot_livesync.h ot_rijndael.h
SOURCES=opentracker.c trackerlogic.c scan_urlencoded_query.c ot_mutex.c ot_stats.c ot_vector.c ot_clean.c ot_udp.c ot_iovec.c ot_fullscrape.c ot_accesslist.c ot_http.c ot_livesync.c ot_rijndael.c
SOURCES_proxy=proxy.c ot_vector.c ot_mutex.c

OBJECTS = $(SOURCES:%.c=%.o)
OBJECTS_debug = $(SOURCES:%.c=%.debug.o)
OBJECTS_proxy = $(SOURCES_proxy:%.c=%.o)
OBJECTS_proxy_debug = $(SOURCES_proxy:%.c=%.debug.o)

.SUFFIXES: .debug.o .o .c

all: $(BINARY) $(BINARY).debug

CFLAGS_production = $(CFLAGS) $(OPTS_production) $(FEATURES)
CFLAGS_debug = $(CFLAGS) $(OPTS_debug) $(FEATURES)

$(BINARY): $(OBJECTS) $(HEADERS)
    $(CC) -o $@ $(OBJECTS) $(LDFLAGS)
$(BINARY).debug: $(OBJECTS_debug) $(HEADERS)
    $(CC) -o $@ $(OBJECTS_debug) $(LDFLAGS)
proxy: $(OBJECTS_proxy) $(HEADERS)
    $(CC) -o $@ $(OBJECTS_proxy) $(CFLAGS_production) $(LDFLAGS)
proxy.debug: $(OBJECTS_proxy_debug) $(HEADERS)
    $(CC) -o $@ $(OBJECTS_proxy_debug) $(LDFLAGS)

.c.debug.o : $(HEADERS)
    $(CC) -c -o $@ $(CFLAGS_debug) $(<:.debug.o=.c)

.c.o : $(HEADERS)
    $(CC) -c -o $@ $(CFLAGS_production) $<

clean:
    rm -rf opentracker opentracker.debug *.o *~

install:
    install -m 755 opentracker $(DESTDIR)$(BINDIR)
1265578519 commented 6 months ago

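I just found a bug that occurs when IPv6 is enabled on the server with FEATURES+=-DWANT_V6.

First send a UDP request to the IPv6 address, then remove that tracker in the BT client and add a new IPv4 UDP one. The IPv4 UDP tracker then responds with wrong peers. udp://[2a01:4f8:c012:8025::]:2710/announce udp://49.12.76.8:2710/announce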

1265578519 commented 6 months ago

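I think this is already the performance limit and there is not much room left for optimisation; 40% CPU with 8 million users online is ideal. Here are the opentracker binary and the perf.data file. Because of the 25 MB limit they are split into two volumes; remember to rename them, removing the extra suffix, before extracting.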

Desktop_2.zip.001.zip Desktop_2.zip.002.zip

1265578519 commented 6 months ago

This was lost in translation. (I am German, I don't speak Chinese and need a translation service to read your comments ;).

Just as I do not know English. If I communicated with you in machine-translated English, there would be even more vocabulary errors.

  • Are you running the tracker on your own server or do you rent one?

I have 4 VPS servers of my own running opentracker, at hetzner, vultr, contabo and aliyun.

  • Does this server only have one IP address or multiple?

Each server has one IPv4 address and one IPv6 address.

  • Is there a webserver (nginx or apache) running on the same IP address as the tracker?

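This is what I described before: running an HTTP server (nginx) doubles the CPU consumption on the server.

  • Does the webserver act as a reverse proxy to serve both your website and the tracker?

The HTTP server acts as a reverse proxy only to pass the user's real IP address to the tracker. The website runs on a different server, not on the same one as the tracker.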

  • Do you use cloudflare as DoS protection or to disguise the tracker's IP address?

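Yes, my first server uses cloudflare. Of course I may also choose another CDN, which is why I said the real IP should be taken for all IPs, not only for the single 127.0.0.1 in opentracker.conf.sample.

  • Do your reverse proxy addresses change so often that you cannot put them in the config file?

This is cloudflare's IP address change log; it changes about once a year: https://www.cloudflare.com/ips/ Of course I might not use cloudflare and use aliyun CDN instead, so the reverse proxy IP addresses supplied by a CDN are not fixed. A CDN provider may change its IPs frequently, and unlike cloudflare it may not publish a complete list of its reverse proxy IP addresses.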

It is not a good idea to allow every peer to set their IP address from the URL or from HTTP x-forwarded-for headers.

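I think your concern is unnecessary. Everyone knows X-Forwarded-For is not safe, but other HTTP headers can be used to obtain the real IP, headers that can never be forged, such as X-Real-Ip and CF-Connecting-IP. cloudflare supplies the correct IP information to opentracker, and cloudflare also repairs X-Forwarded-For so that it becomes safe. https://support.cloudflare.com/hc/zh-cn/articles/200170786 Below is the IP detection code from another program. I think it can serve as a reference: try the headers in order, preferring the trusted X-Real-Ip, which cannot be forged.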

    private function _get_client_ip() {
        $ip = $_SERVER['REMOTE_ADDR'];
        if (!array_key_exists('security', $this->config) || !$this->config['security']['onlyremoteaddr']) {
            if (array_key_exists('ipgetter', $this->config) && !empty($this->config['ipgetter']['setting'])) {
                $s = empty($this->config['ipgetter'][$this->config['ipgetter']['setting']]) ? array() : $this->config['ipgetter'][$this->config['ipgetter']['setting']];
                $c = 'ip_getter_'.$this->config['ipgetter']['setting'];
                $r = $c::get($s);
                $ip = ip::validate_ip($r) ? $r : $ip;
            } elseif (isset($_SERVER['HTTP_CLIENT_IP']) && ip::validate_ip($_SERVER['HTTP_CLIENT_IP'])) {
                $ip = $_SERVER['HTTP_CLIENT_IP'];
            } elseif(isset($_SERVER['HTTP_X_FORWARDED_FOR'])) {
                if (strpos($_SERVER['HTTP_X_FORWARDED_FOR'], ",") > 0) {
                    $exp = explode(",", $_SERVER['HTTP_X_FORWARDED_FOR']);
                    $ip = ip::validate_ip(trim($exp[0])) ? $exp[0] : $ip;
                } else {
                    $ip = ip::validate_ip($_SERVER['HTTP_X_FORWARDED_FOR']) ? $_SERVER['HTTP_X_FORWARDED_FOR'] : $ip;
                }
            }
        }
        return $ip;
    }

It might be better to allow for something like /config?mode=add-reverse-proxy&ip=23.42.5.1 on opentracker so you can add more reverse proxy addresses if they change?

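Only cloudflare publishes its complete IP address list, so that would work only if I used cloudflare exclusively. Adding addresses via http://ip/config seems no better to me than adding multiple IP addresses in the config file. What I want is for this to take effect for all IP addresses, not a whitelist.

The point is not having to run a reverse proxy server locally, which doubles CPU consumption. The problem at the moment is that the real IP address cannot be obtained without the reverse proxy.
user → CDN → reverse proxy → opentracker
user → CDN → opentracker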

1265578519 commented 6 months ago

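@erdgeist If you are not willing to make this take effect for all IP addresses, could you support CIDR notation? 104.16.0.0/13 = 524288 IP addresses. I can't be expected to add millions of IP addresses as access.proxy lines in the config file, can I?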
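The figure quoted above follows from the prefix length: a /13 leaves 32 - 13 = 19 host bits, and 2^19 = 524288. A one-function sketch of that arithmetic, with a hypothetical name:

```c
#include <stdint.h>

/* Number of addresses covered by an IPv4 prefix of the given length.
   Returns 0 for an invalid length above 32. */
static uint64_t ipv4_prefix_size( unsigned prefix_len ) {
  return prefix_len > 32 ? 0 : (uint64_t)1 << ( 32 - prefix_len );
}
```

By the same arithmetic, 104.24.0.0/14 alone covers 262144 addresses, which is why listing single addresses does not scale.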

There is also the problem of UDP responding with the wrong number of peers. Will that be fixed? https://github.com/1265578519/OpenTracker/issues/4#issuecomment-1987243416

erdgeist commented 5 months ago

I just merged https://erdgeist.org/gitweb/opentracker/log/?h=blessed-networks to master. This should allow you to bless networks using CIDR notation on v4 and v6 addresses.
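To illustrate what matching an address against a network in CIDR notation involves, here is a minimal sketch that works for both v4 and v6. It is not the code from the blessed-networks branch; the function names are hypothetical, and it parses text on every call, which a real tracker would avoid.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Compare the first `bits` bits of two byte strings. */
static int prefix_match( const uint8_t *a, const uint8_t *b, unsigned bits ) {
  unsigned full = bits / 8, rest = bits % 8;
  uint8_t  mask;
  if( memcmp( a, b, full ) ) return 0;
  if( !rest ) return 1;
  mask = (uint8_t)( 0xff << ( 8 - rest ) );
  return ( a[full] & mask ) == ( b[full] & mask );
}

/* Returns 1 if `ip` lies inside `cidr` (for example "104.16.0.0/13" or
   "2606:4700::/32"), 0 if it does not or if either string fails to parse. */
static int ip_in_cidr( const char *ip, const char *cidr ) {
  char          net[INET6_ADDRSTRLEN];
  uint8_t       a[16], n[16];
  const char   *slash   = strchr( cidr, '/' );
  int           family  = strchr( cidr, ':' ) ? AF_INET6 : AF_INET;
  unsigned long maxbits = family == AF_INET6 ? 128 : 32;
  unsigned long bits;
  char         *end;

  if( !slash || (size_t)( slash - cidr ) >= sizeof net ) return 0;
  memcpy( net, cidr, (size_t)( slash - cidr ) );
  net[slash - cidr] = 0;

  bits = strtoul( slash + 1, &end, 10 );
  if( end == slash + 1 || *end || bits > maxbits ) return 0;

  if( inet_pton( family, net, n ) != 1 || inet_pton( family, ip, a ) != 1 ) return 0;
  return prefix_match( a, n, (unsigned)bits );
}
```

For example, 104.23.255.255 falls inside 104.16.0.0/13 while 104.24.0.1 does not, because only the first 13 bits are compared.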

About the second point: What is wrong with peer count in udp?

1265578519 commented 5 months ago

I just merged https://erdgeist.org/gitweb/opentracker/log/?h=blessed-networks to master. This should allow you to bless networks using CIDR notation on v4 and v6 addresses.

Great! I tested both v4 and v6 and they work perfectly; the real IPs of users behind cloudflare are obtained successfully.

# IIc)
#      If opentracker lives behind one or multiple reverse proxies,
#      every http connection appears to come from these proxies. In order to
#      take the X-Forwarded-For address instead, compile opentracker with the
#      WANT_IP_FROM_PROXY option and set your proxy addresses here.
#
# access.proxy 10.0.1.23
# access.proxy 10.0.1.24
#
access.proxy 173.245.48.0/20
access.proxy 103.21.244.0/22
access.proxy 103.22.200.0/22
access.proxy 103.31.4.0/22
access.proxy 141.101.64.0/18
access.proxy 108.162.192.0/18
access.proxy 190.93.240.0/20
access.proxy 188.114.96.0/20
access.proxy 197.234.240.0/22
access.proxy 198.41.128.0/17
access.proxy 162.158.0.0/15
access.proxy 104.16.0.0/13
access.proxy 104.24.0.0/14
access.proxy 172.64.0.0/13
access.proxy 131.0.72.0/22
access.proxy 2400:cb00::/32
access.proxy 2606:4700::/32
access.proxy 2803:f800::/32
access.proxy 2405:b500::/32
access.proxy 2405:8100::/32
access.proxy 2a06:98c0::/29
access.proxy 2c0f:f248::/32

About the second point: What is wrong with peer count in udp?

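I uploaded a video; I am not sure whether you can follow it. Over UDP, the users' IP addresses are not obtained correctly. https://github.com/1265578519/OpenTracker/assets/6442439/3e6c62b1-be18-4d7e-a014-d54cf097c7a8

With IPv6 enabled in the Makefile (FEATURES+=-DWANT_V6), first add the tracker's IPv6 announce URL, then add the IPv4 announce URL. The client then receives several wrong peers.

The problem only occurs with UDP announce. HTTP works correctly.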

erdgeist commented 5 months ago

Hmm. How do you run a single opentracker on both ipv6 and ipv4? This should not be possible. How did you invoke opentracker?

You need two opentracker processes: one running on an ipv4 address, serving ipv4 peers, and one running on an ipv6 address, serving ipv6 peers.

The protocol spec https://www.bittorrent.org/beps/bep_0015.html says that there are two formats, one with ipv4 replies and one with wide ones. I guess you are seeing ipv6 replies in your ipv4 request and your client thinks they're just many ipv4 addresses.
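The two reply formats differ only in the size of each peer entry: after the 20 byte announce response header, an ipv4 reply carries 6 bytes per peer and an ipv6 reply 18 bytes per peer. The sketch below, with hypothetical names, shows how a client that assumes the wrong entry size would miscount peers.

```c
#include <stddef.h>

enum {
  UDP_ANNOUNCE_HEADER = 20, /* action, transaction id, interval, leechers, seeders */
  PEER_LEN_V4         = 6,  /* 4 byte address + 2 byte port */
  PEER_LEN_V6         = 18  /* 16 byte address + 2 byte port */
};

/* Number of peer entries a client derives from a reply of `reply_len` bytes
   when it assumes `peer_len` bytes per entry. Returns 0 for replies shorter
   than the header or for a zero entry size. */
static size_t peers_in_reply( size_t reply_len, size_t peer_len ) {
  if( reply_len < UDP_ANNOUNCE_HEADER || peer_len == 0 ) return 0;
  return ( reply_len - UDP_ANNOUNCE_HEADER ) / peer_len;
}
```

A reply holding two ipv6 peers is 56 bytes long. Read with 6 byte entries it appears to contain six ipv4 peers, all of them bogus, which would be consistent with the wrong peers described in this thread.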

Maybe I should allow for opentracker to run on both ipv4 and ipv6 and manage two lists.

1265578519 commented 5 months ago

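What I need is for UDP announce to behave the same as HTTP announce. HTTP works perfectly: whether a client connects only over IPv4 or only over IPv6, it can find all of the other side's peers. In a test with two computers A and B, B directly obtains all of A's IPv4 and IPv6 addresses and can use them.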

https://github.com/1265578519/OpenTracker/assets/6442439/1568412c-3ca0-43f6-9554-555f930ed427

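I would like UDP announce to have the same effect as HTTP, rather than returning only IPv4 users to IPv4 requests. The lists must not be kept separate; all peers should be aggregated and returned together.

At least HTTP announce currently works perfectly with IPv4 and IPv6, as shown in the video. I am very satisfied with HTTP announce for IPv4 and IPv6.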

1265578519 commented 5 months ago

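Is this a BT client problem? If it definitely is, I can contact the BT client developers for an update, and opentracker would not need any code changes.

For example: a BT client making an IPv4 UDP announce should check the reply, and if the content is in IPv6 format it should parse it accordingly.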

erdgeist commented 5 months ago

I have an idea: If a client is using your cloud proxy, the connection lands on the wrong tracker: A peer connecting to cloudflare via ipv4 may be forwarded to opentracker over an ipv6 ip address, but the ipv6 tracker is not capable of storing ipv4 addresses.

I think I need to implement capabilities in opentracker to handle both ipv4 and ipv6 clients.

1265578519 commented 5 months ago

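This change should only affect UDP. Please do not change HTTP, which works correctly. Once you have finished, I will test whether the changed UDP code is satisfactory.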

1265578519 commented 5 months ago

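The addresses 104.28.211.192-104.28.211.194 shown in the video of HTTP working correctly above are normal; they are my local computer's IP addresses. During the test I used WARP VPN https://1.1.1.1 to get IPv6 access to the tracker, because the ISP of the local computer I was testing with has no IPv6.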

I have an idea: If a client is using your cloud proxy, the connection lands on the wrong tracker: A peer connecting to cloudflare via ipv4 may be forwarded to opentracker over an ipv6 ip address, but the ipv6 tracker is not capable of storing ipv4 addresses.

I think I may have misunderstood? It may also be a translation problem; I did not quite follow this paragraph.

erdgeist commented 5 months ago

I reworked opentracker to properly (hopefully) work in dual stack mode here https://erdgeist.org/gitweb/opentracker/commit/?id=2afc4893bf802700a1decfff57673cefc861c7e7

This is in master now. opentracker is now able to bind to ipv4 and ipv6 addresses and serve replies to ipv4 users and ipv6 users depending on their source address or, if they are behind a proxy, on the address that the proxy reports to opentracker, even if it is an ipv4 address proxied via ipv6 or vice versa.

Can you check if your problems with udp go away?
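As an illustration of choosing the reply format by the peer's effective address family, here is a small sketch. It is not the code from the commit above. The function names are hypothetical, and treating IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) as ipv4 is an assumption about how a dual stack socket might present ipv4 clients.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Returns AF_INET if `text` is an IPv4 address or an IPv4-mapped IPv6
   address, AF_INET6 for any other IPv6 address, -1 if it does not parse. */
static int peer_family( const char *text ) {
  struct in_addr  v4;
  struct in6_addr v6;
  if( inet_pton( AF_INET, text, &v4 ) == 1 ) return AF_INET;
  if( inet_pton( AF_INET6, text, &v6 ) != 1 ) return -1;
  return IN6_IS_ADDR_V4MAPPED( &v6 ) ? AF_INET : AF_INET6;
}

/* Bytes per peer entry in a compact reply for that family, 0 if unknown. */
static int peer_entry_len( int family ) {
  return family == AF_INET ? 6 : family == AF_INET6 ? 18 : 0;
}
```

The same classification applies whether the address comes from the socket or from a trusted proxy header, so an ipv4 user reaching the tracker through an ipv6 proxy connection would still be answered with 6 byte entries.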