I don't build with -DWITH_RADOSGW_SELECT_PARQUET=ON, but I do have a feature branch open for v17.2.5 which is currently green for builds. I'm actively testing it in a test env at the moment, so you might want to take a look at what has been done there. You can take it and experiment with setting the select define if you like.
See: feature/v17.2.4
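If you want to try that, a minimal sketch of the experiment against this packaging (where exactly the define goes inside the PKGBUILD's cmake options may differ):

git clone -b feature/v17.2.4 https://github.com/bazaah/aur-ceph
cd aur-ceph
# add -DWITH_RADOSGW_SELECT_PARQUET=ON to the cmake options in the PKGBUILD,
# then build with dependencies resolved
makepkg -s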
I also had a compiler error with your feature branch:
[ 2%] Generating ceph-exporter_options.cc, ../../../include/ceph-exporter_legacy_options.h
cd /home/feedc0de/aur-ceph/src/ceph-17.2.5/build/src/common/options && /usr/bin/python3.10 /home/feedc0de/aur-ceph/src/ceph-17.2.5/src/common/options/y2c.py --input /home/feedc0de/aur-ceph/src/ceph-17.2.5/build/src/common/options/ceph-exporter.yaml --output ceph-exporter_options.cc --legacy /home/feedc0de/aur-ceph/src/ceph-17.2.5/build/include/ceph-exporter_legacy_options.h --name ceph-exporter
In file included from /home/feedc0de/aur-ceph/src/ceph-17.2.5/src/common/config_values.h:59,
from /home/feedc0de/aur-ceph/src/ceph-17.2.5/src/common/config.h:28,
from /home/feedc0de/aur-ceph/src/ceph-17.2.5/src/common/config_proxy.h:6,
from /home/feedc0de/aur-ceph/src/ceph-17.2.5/src/common/ceph_context.h:41,
from /home/feedc0de/aur-ceph/src/ceph-17.2.5/src/librados/snap_set_diff.cc:7:
/home/feedc0de/aur-ceph/src/ceph-17.2.5/src/common/options/legacy_config_opts.h:1:10: fatal error: global_legacy_options.h: No such file or directory
1 | #include "global_legacy_options.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [src/CMakeFiles/rados_snap_set_diff_obj.dir/build.make:79: src/CMakeFiles/rados_snap_set_diff_obj.dir/librados/snap_set_diff.cc.o] Error 1
make[2]: Leaving directory '/home/feedc0de/aur-ceph/src/ceph-17.2.5/build'
make[1]: *** [CMakeFiles/Makefile2:3657: src/CMakeFiles/rados_snap_set_diff_obj.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
On a second build attempt it worked, and I have some aarch64 packages ready to test :)
OSD crashes
~ sudo /usr/bin/ceph-osd -f --cluster ceph --id 12 --setuser ceph --setgroup ceph
2022-10-28T08:20:32.147+0000 ffff88601040 -1 Falling back to public interface
2022-10-28T08:23:49.958+0000 ffff88601040 -1 osd.12 27983 log_to_monitors true
2022-10-28T08:23:49.982+0000 ffff88601040 -1 osd.12 27983 mon_cmd_maybe_osd_create fail: 'osd.12 has already bound to class 'hdd', can not reset class to 'ssd'; use 'ceph osd crush rm-device-class <id>' to remove old class first': (16) Device or resource busy
2022-10-28T08:23:53.642+0000 ffff88601040 -1 bdev(0xaaaabc6ba400 /var/lib/ceph/osd/ceph-12/block) aio_submit retries 1
2022-10-28T08:24:30.476+0000 ffff7b8fc600 -1 osd.12 28243 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
/home/feedc0de/aur-ceph/src/ceph-17.2.5/src/osd/OSD.cc: In function 'void OSD::do_recovery(PG*, epoch_t, uint64_t, ThreadPool::TPHandle&)' thread ffff6765c600 time 2022-10-28T08:24:36.200440+0000
/home/feedc0de/aur-ceph/src/ceph-17.2.5/src/osd/OSD.cc: 9676: FAILED ceph_assert(started <= reserved_pushes)
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x134) [0xaaaab9e8bf28]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0xaaaab9e8c09c]
3: (OSD::do_recovery(PG*, unsigned int, unsigned long, ThreadPool::TPHandle&)+0x4ec) [0xaaaab9f3ed10]
4: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x28) [0xaaaaba1b4c08]
5: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x46c) [0xaaaab9f3f25c]
6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x308) [0xaaaaba591af8]
7: (ShardedThreadPool::WorkThreadSharded::entry()+0x18) [0xaaaaba594368]
8: /usr/lib/libc.so.6(+0x80aec) [0xffff87940aec]
9: /usr/lib/libc.so.6(+0xea5dc) [0xffff879aa5dc]
*** Caught signal (Aborted) **
in thread ffff6765c600 thread_name:tp_osd_tp
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
1: __kernel_rt_sigreturn()
2: /usr/lib/libc.so.6(+0x82790) [0xffff87942790]
3: raise()
4: abort()
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0xaaaab9e8bf7c]
6: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0xaaaab9e8c09c]
7: (OSD::do_recovery(PG*, unsigned int, unsigned long, ThreadPool::TPHandle&)+0x4ec) [0xaaaab9f3ed10]
8: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x28) [0xaaaaba1b4c08]
9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x46c) [0xaaaab9f3f25c]
10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x308) [0xaaaaba591af8]
11: (ShardedThreadPool::WorkThreadSharded::entry()+0x18) [0xaaaaba594368]
12: /usr/lib/libc.so.6(+0x80aec) [0xffff87940aec]
13: /usr/lib/libc.so.6(+0xea5dc) [0xffff879aa5dc]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
[... in-memory event dump repeating the same assert and abort backtraces, trimmed for brevity ...]
[1] 23143 IOT instruction sudo /usr/bin/ceph-osd -f --cluster ceph --id 12 --setuser ceph --setgroup
That's surprising; I have two test clusters I've been doing perf benchmarking on, using the most recent iteration of this branch.
Are you using the most recent HEAD? I tend to do a lot of rebase/force-pushing on my feature branches.
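(For anyone pulling the branch: after a force-push, a plain git pull will refuse or merge badly; a hard reset to the remote tip is the reliable way to land on the current HEAD.)

git fetch origin
git checkout feature/v17.2.4
git reset --hard origin/feature/v17.2.4   # discard the stale, pre-rebase local state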
I'll rebuild the packages today in a clean chroot just to make sure
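For reference, a clean-chroot rebuild with the stock Arch devtools workflow looks roughly like this (a sketch; paths are examples, and aarch64 may need the distro's port of these tools):

mkdir -p ~/chroot
mkarchroot ~/chroot/root base-devel          # one-time: bootstrap the build root
cd aur-ceph && makechrootpkg -c -r ~/chroot  # -c resets the chroot copy before building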
I also had a compiler error with your feature branch
[quoted build log trimmed; identical to the log above]
You probably hit another instance of the racy legacy-header generation that upstream has shipped since Quincy. See https://github.com/bazaah/aur-ceph/blob/feature/v17.2.4/ceph-17.2.4-compressor-common-depends.patch for another example. Fortunately, the fix is pretty simple.
I'll probably add a patch for that in a pkgrel=2 version.
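A hedged sketch of what such a patch could do, mirroring the compressor patch linked above; the legacy-option-headers target name comes from upstream's src/common/options CMake and should be verified against the tree before relying on it:

# give the failing object library an explicit dependency on the generated
# legacy option headers, so compiling it can no longer race the y2c.py generator
add_dependencies(rados_snap_set_diff_obj legacy-option-headers)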
rebuilt + tested, still no issues. Unless you can provide me with a reproducible test failure, I'm going to chalk this up to environment.
Well, I nuked the failing OSD and recreated it on the same disk; the rest seems to be working fine. Thank you so much for providing your work here.
My cluster consists entirely of aarch64 machines (ODROIDs, Raspberry Pis, and others), and the produced binaries work across all of them.
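For anyone landing here in the same state, a sketch of that nuke-and-recreate (OSD id 12 as in the log above; the device path is an example):

ceph osd out 12
systemctl stop ceph-osd@12
ceph osd purge 12 --yes-i-really-mean-it
# the device-class complaint in the startup log can instead be cleared in place:
#   ceph osd crush rm-device-class osd.12
ceph-volume lvm zap /dev/sdX --destroy   # wipe the old OSD on the same disk
ceph-volume lvm create --data /dev/sdX   # re-create it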
Just out of curiosity, how much hassle could we have saved by using the containerized approach for ceph?
cephadm (the binary) is a big ol' ball of python, so not much. I've experimented with it, and straight up refuse to have it between me and the storage layer for my whole org. Just cranky sysadmin behavior, I guess.
Besides, for my specific usecase, having an actual librbd.so is important because of how we use qemu.
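For context, a qemu linked against a real librbd.so can consume RBD images natively over the rbd: protocol; a sketch with example pool/image names:

qemu-img create -f raw rbd:vms/guest0 10G            # image is created inside the cluster
qemu -m 1024 -drive format=raw,file=rbd:vms/guest0   # guest block device served by librbd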
I'm closing this as I finally got a clean make check run.
Hi, my cluster is already running Ceph 17.2.4, and since some Arch upgrades some dependency libraries have been updated too, so I cannot rebuild Ceph anymore. I know this repo only builds Ceph 16, but maybe you have an idea why it doesn't build with the newer version anymore?
Would a C-style cast do the trick here?