alibaba / x-deeplearning

An industrial deep learning framework for high-dimension sparse data
Apache License 2.0

Core dump when seastar starts #227

Closed zhanglistar closed 5 years ago

zhanglistar commented 5 years ago

(gdb) bt
#0  std::_Head_base<0ul, io_event, false>::_Head_base<io_event&> (__h=..., this=0x8) at /usr/include/c++/5/tuple:115
#1  std::_Tuple_impl<0ul, io_event>::_Tuple_impl<io_event&> (__head=..., this=0x8) at /usr/include/c++/5/tuple:362
#2  std::tuple::tuple<io_event&, void> (this=0x8) at /usr/include/c++/5/tuple:480
#3  seastar::future_state::set<io_event&> (this=0x0) at core/future.hh:210
#4  seastar::promise::set_value<io_event&> (this=0x7f8bee9ee700) at core/future.hh:475
#5  seastar::reactor::process_io (this=0x7f8bee81b000) at core/reactor.cc:1037
#6  0x000000000082c42d in seastar::reactor::poll_once (this=<optimized out>) at core/reactor.cc:3351
#7  0x000000000082c44c in seastar::reactor::<lambda()>::operator() (__closure=0x7f8bf03f5b60) at core/reactor.cc:3184
#8  std::_Function_handler<bool(), seastar::reactor::run()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/5/functional:1857
#9  0x0000000000885533 in std::function<bool ()>::operator()() const (this=0x7f8bf03f5b60) at /usr/include/c++/5/functional:2267
#10 seastar::reactor::run (this=0x7f8bee81b000) at core/reactor.cc:3210
#11 0x000000000088af88 in seastar::smp::<lambda()>::operator()(void) const (__closure=0x7f8bf0c1a280) at core/reactor.cc:4245
#12 0x00000000008d9c1e in std::function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/5/functional:2267
#13 seastar::posix_thread::start_routine (arg=<optimized out>) at core/posix.cc:52
#14 0x00007f8c83c6f6ba in start_thread (arg=0x7f8bf03ff700) at pthread_create.c:333
#15 0x00007f8c8369c41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

2019-06-12 03:13:31,531:10(0x7f2fa8561740):ZOO_INFO@log_env@1080: Client environment:zookeeper.version=zookeeper C client 3.6.0
2019-06-12 03:13:31,531:10(0x7f2fa8561740):ZOO_INFO@log_env@1084: Client environment:host.name=85174fd2099c
2019-06-12 03:13:31,531:10(0x7f2fa8561740):ZOO_INFO@log_env@1091: Client environment:os.name=Linux
2019-06-12 03:13:31,531:10(0x7f2fa8561740):ZOO_INFO@log_env@1092: Client environment:os.arch=4.9.70-040970-generic
2019-06-12 03:13:31,531:10(0x7f2fa8561740):ZOO_INFO@log_env@1093: Client environment:os.version=#201712161132 SMP Sat Dec 16 16:33:52 UTC 2017
2019-06-12 03:13:31,531:10(0x7f2fa8561740):ZOO_INFO@log_env@1101: Client environment:user.name=(null)
2019-06-12 03:13:31,531:10(0x7f2fa8561740):ZOO_INFO@log_env@1109: Client environment:user.home=/root
2019-06-12 03:13:31,531:10(0x7f2fa8561740):ZOO_INFO@log_env@1121: Client environment:user.dir=/

shanshanpt commented 5 years ago

How are you using it? Normally, execution should not reach seastar::reactor::process_io(...).

zhanglistar commented 5 years ago

/usr/local/lib/python2.7/dist-packages/xdl-1.0-py2.7.egg/xdl/python/utils/../../bin/ps -smem "20000" -bc "False" -sp "zfs://dn0.jja.bigo:2181,nn0.jja.bigo:2181,nn1.jja.bigo:2181,dn1.jja.bigo:2181,dn2.jja.bigo:2181/psplus/application_1560234332391_0004" -sqps "31250" -snet "125" -r "scheduler" -sn "2" -cp "hdfs://hdfscluster/user/hadoop/deepctr/20190611-154242/output"

This is the command that launches the YARN container; I took it out and ran it on its own.

zhanglistar commented 5 years ago

After tracing it, the cause appears to be that /proc/sys/fs/aio-nr exceeded /proc/sys/fs/aio-max-nr.
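A minimal sketch of how the two counters named above could be compared, in case it helps anyone reproduce the diagnosis. The function name `aio_headroom` and its `proc_fs_dir` parameter are hypothetical helpers introduced here for illustration, not part of XDL or seastar. It assumes a Linux host where both files contain a single integer.

```python
from pathlib import Path


def aio_headroom(proc_fs_dir="/proc/sys/fs"):
    """Return (aio_nr, aio_max_nr, remaining) read from procfs.

    aio-nr is the system-wide total of AIO events currently reserved
    via io_setup(); aio-max-nr is the system-wide limit on that total.
    `proc_fs_dir` is a hypothetical parameter so the function can be
    pointed at a test directory instead of the live /proc tree.
    """
    base = Path(proc_fs_dir)
    aio_nr = int((base / "aio-nr").read_text().strip())
    aio_max_nr = int((base / "aio-max-nr").read_text().strip())
    # remaining <= 0 means no further AIO events can be reserved.
    return aio_nr, aio_max_nr, aio_max_nr - aio_nr
```

Calling `aio_headroom()` with no arguments reads the live values. If `remaining` is zero or negative, further `io_setup()` calls are expected to fail with `EAGAIN`, which would be consistent with the crash during reactor startup. When the limit is the cause, the usual remedy is to raise `fs.aio-max-nr` with `sysctl`; whether that resolves this particular crash is not confirmed in this thread.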

shanshanpt commented 5 years ago

> After tracing it, the cause appears to be that /proc/sys/fs/aio-nr exceeded /proc/sys/fs/aio-max-nr.

So AIO is in use; in that case execution does reach this code path.

zhanglistar commented 5 years ago

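> > After tracing it, the cause appears to be that /proc/sys/fs/aio-nr exceeded /proc/sys/fs/aio-max-nr.

> So AIO is in use; in that case execution does reach this code path.

AIO is used by default.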

zhanglistar commented 5 years ago

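> > After tracing it, the cause appears to be that /proc/sys/fs/aio-nr exceeded /proc/sys/fs/aio-max-nr.

> So AIO is in use; in that case execution does reach this code path.

There is still one problem, though: every seastar test I run also core dumps, and each time the cause is a null pointer.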