cloudlinux / libcare

libcare -- Patch Userspace Code in Live Processes
GNU General Public License v2.0

In multithreaded programs, "Verifying safety for pid xxxxx" fails for some threads #46

Open chenzhbao629 opened 4 years ago

chenzhbao629 commented 4 years ago

Verifying safety for pid 88521...
Stacktrace to verify safety for pid 88521:
[0x7fa5c32aea3d] __poll_nocancel+0x24
[0x7fa5c9d3a11f] fdset_event_dispatch+0x6f
[0x7fa5c9d3b270] rte_vhost_driver_session_start+0x10
[0x559bb0d2860b] _init+0x176513
kpatch_patch.c(201): safety check failed for 7fa5c9d39fb0
kpatch_patch.c(497): Patching xxxx.so failed, unapplying partially applied patch
Finished ptrace detaching.
Failed to apply patch './libshared.kpatch'
kpatch_patch.c(588): Failed to apply patch './libshared.kpatch'

paboldin commented 4 years ago

Did libcare tell you which function failed the safety verification? If not, that alone is certainly a bug.

Some applications have a loop function that runs until the application exits. It is therefore always on the stack and can't be patched properly.

It would be much easier to tell whether this is your case by looking at the patch. If you can share it, please do so here.

At the very least, please provide the full log.

chenzhbao629 commented 4 years ago

Thanks, paboldin. It is just as you said: "For some applications there is always a loop function that loops forever until the application exits. It is thus always on the stack and can't be properly patched."

One thread is always in the loop, so we tried stopping it with kill -STOP pid and then patching again, but that failed too:

attached to 15 thread(s): 74232, 74233, 74234, 74235, 74236, 74237, 74238, 74239, 74253, 74257, 74258, 74259, 74260, 74261, 74262
Loading patch info 'librte_vhost.so.3'...nononoeeeeeeeeee-------
'librte_vhost.so.3'... nononoeeeeeeeeee-------
[the preceding line repeats many more times]
'librte_vhost.so.3'... sddddddheeeeeeeeee-------
'librte_vhost.so.3'...successfully, 4 entries
kpatch_patch.c(361): Patching 361, ------------------------------------
Counting undefined symbols:
Undefined symbol 'rte_mem_virt2phy'
Undefined symbol 'get_device'
Undefined symbol 'reset_device'
Undefined symbol 'rte_malloc'
Undefined symbol 'assert_fail@@GLIBC_2.2.5'
Undefined symbol 'cleanup_device'
Undefined symbol 'rte_zmalloc'
Undefined symbol 'close@@GLIBC_2.2.5'
Undefined symbol 'rte_mempool_ops_table'
Undefined symbol 'read@@GLIBC_2.2.5'
Undefined symbol 'rte_malloc_socket'
Undefined symbol '__tls_get_addr@@GLIBC_2.3'
Undefined symbol 'fxstat64@@GLIBC_2.2.5'
Undefined symbol 'alloc_vring_queue_pair'
Undefined symbol 'memcpy@@GLIBC_2.14'
Undefined symbol 'read_fd_message'
Undefined symbol 'rte_free'
Undefined symbol 'rte_log'
Undefined symbol 'eventfd_write@@GLIBC_2.7'
Undefined symbol 'VHOST_FEATURES'
Undefined symbol 'per_lcorelcore_id'
Undefined symbol 'mmap64@@GLIBC_2.2.5'
Undefined symbol 'vhost_devices'
Undefined symbol 'malloc@@GLIBC_2.2.5'
Undefined symbol 'get_mempolicy'
Undefined symbol 'munmap@@GLIBC_2.2.5'
Undefined symbol 'madvise@@GLIBC_2.2.5'
Undefined symbol 'memmove@@GLIBC_2.2.5'
Jump table 464 bytes for 28 syms at offset 0x11730
Looking for patch region for 'librte_vhost.so.3'...
Found patch region for 'librte_vhost.so.3' at 7f1d71559000
mmap_remote: 0x7f1d71559000+12000, 7, 32, -1, 0
Executing syscall 9 (pid 74232)...
wait_for_stop(pctx->pid=74232, pid=74232)
allocated 0x12000 bytes at 0x9 for 'librte_vhost.so.3' patch
Marking this space as busy
kpatch_patch.c(389): Patching 389, ------------------------------------
Resolving sections' addresses for 'librte_vhost.so.3'
section '.note.gnu.build-id' = 0x7f1d705ad1c8
section '.gnu.hash' = 0x7f1d705ad1f0
section '.dynsym' = 0x7f1d705ad2a0
section '.dynstr' = 0x7f1d705ad930
section '.gnu.version' = 0x7f1d705adde4
section '.gnu.version_r' = 0x7f1d705adb38
section '.rela.dyn' = 0x7f1d705adfd0
section '.rela.plt' = 0x7f1d705ae2d0
section '.init' = 0x7f1d705ae708
section '.rela.init' = 0x0
section '.plt' = 0x7f1d705ae730
section '.text' = 0x7f1d705aea10
section '.rela.text' = 0x0
section '.kpatch.text' = 0x449
section '.rela.kpatch.text' = 0xf6a9
section '.fini' = 0x7f1d705bdfe0
section '.rodata' = 0x7f1d705bdff0
section '.rela.rodata' = 0x0
section '.kpatch.strtab' = 0xd7e6
section '.kpatch.info' = 0xd864
section '.rela.kpatch.info' = 0x10ae9
section '.eh_frame_hdr' = 0x7f1d705bf3dc
section '.eh_frame' = 0x7f1d705bf570
section '.rela.eh_frame' = 0x0
section '.init_array' = 0x7f1d707c0c60
section '.rela.init_array' = 0x0
section '.fini_array' = 0x7f1d707c0c68
section '.rela.fini_array' = 0x0
section '.jcr' = 0x7f1d707c0c70
section '.data.rel.ro' = 0x7f1d707c0c80
section '.rela.data.rel.ro' = 0x0
section '.dynamic' = 0x7f1d707c0d48
section '.got' = 0x7f1d707c0fb8
section '.got.plt' = 0x7f1d707c1000
section '.data' = 0x7f1d707c1180
section '.tm_clone_table' = 0x7f1d707cb108
section '.kpatch.data' = 0xd941
section '.rela.kpatch.data' = 0x10c09
section '.bss' = 0x7f1d707cf200
section '.comment' = 0x0
section '.shstrtab' = 0xdba7
section '.symtab' = 0xe811
section '.strtab' = 0xf2f1
Resolving symbols for 'librte_vhost.so.3'
symbol 'rte_vhost_update_totalpkts.kpatch' is defined and global, we don't check for overrition
symbol 'rte_vhost_update_totalpkts.kpatch' = 0x449
symbol 'rte_vhost_dequeue_burst.kpatch' is defined and global, we don't check for overrition
symbol 'rte_vhost_dequeue_burst.kpatch' = 0x95c9
symbol 'rte_vhost_dequeue_burst' is defined and global, we don't check for overrition
symbol 'rte_vhost_dequeue_burst' = 0x7f1d705b82c0
symbol 'assert_fail' = 0x7f1d69a642d0
jmptable 'assert_fail' = 0x11749
symbol 'close' = 0x7f1d69b1fe80
jmptable 'close' = 0x11759
symbol 'rte_vhost_update_totalpkts' is defined and global, we don't check for overrition
symbol 'rte_vhost_update_totalpkts' = 0x7f1d705af100
symbol 'vhost_user_msg_handler.kpatch' is defined and global, we don't check for overrition
symbol 'vhost_user_msg_handler.kpatch' = 0xc469
symbol 'read' = 0x7f1d69b1f7d0
jmptable 'read' = 0x11769
symbol 'tls_get_addr' = 0x7f1d714a2400
jmptable 'tls_get_addr' = 0x11779
symbol 'rte_vhost_enqueue_burst.kpatch' is defined and global, we don't check for overrition
symbol 'rte_vhost_enqueue_burst.kpatch' = 0x4b9
symbol 'fxstat64' = 0x7f1d69b1f140
jmptable 'fxstat64' = 0x11789
symbol 'vhost_user_msg_handler' is defined and global, we don't check for overrition
symbol 'vhost_user_msg_handler' = 0x7f1d705bb270
Executing callrax 7f1d69ac4920 (pid 74232)
wait_for_stop(pctx->pid=74232, pid=74232)
symbol 'memcpy' = 0x7f1d69b80cd0
jmptable 'memcpy' = 0x11799
symbol 'notify_ops' is defined and global, we don't check for overrition
symbol 'notify_ops' = 0x7f1d707cf208
symbol 'eventfd_write' = 0x7f1d69b2e640
jmptable 'eventfd_write' = 0x117a9
symbol 'mmap64' = 0x7f1d69b28960
jmptable 'mmap64' = 0x117b9
symbol 'malloc' = 0x7f1d69ab60c0
jmptable 'malloc' = 0x117c9
symbol 'munmap' = 0x7f1d69b28a20
jmptable 'munmap' = 0x117d9
symbol 'madvise' = 0x7f1d69b28ae0
jmptable 'madvise' = 0x117e9
Executing callrax 7f1d69abf710 (pid 74232)
wait_for_stop(pctx->pid=74232, pid=74232)
symbol 'memmove' = 0x7f1d69b86270
jmptable 'memmove' = 0x117f9
symbol 'rte_vhost_enqueue_burst' is defined and global, we don't check for overrition
symbol 'rte_vhost_enqueue_burst' = 0x7f1d705af170
kpatch_patch.c(393): Patching 393, ------------------------------------
Applying relocations for 'librte_vhost.so.3'...
applying relocations to '.kpatch.text'
applying relocations to '.kpatch.info'
applying relocations to '.kpatch.data'
kpatch_patch.c(397): Patching 397, ------------------------------------
kpatch_patch.c(400): Patching 400, ------------------------------------
kpatch_patch.c(497): Patching librte_vhost.so.3 failed, unapplying partially applied patch
Verifying safety for pid 74232...
Stacktrace to verify safety for pid 74232:
[0x7f1d69b23a3d] poll_nocancel+0x24
[0x55cff5add826] _init+0x14672e
[0x55cff5ac1fda] _init+0x12aee2
[0x55cff599b5d9] _init+0x44e1
[0x7f1d69a57c05] libc_start_main+0xf5
[0x55cff599c32d] _init+0x5235
[0x0]
OK
Verifying safety for pid 74233...
Stacktrace to verify safety for pid 74233:
[0x7f1d6a55298d] accept_nocancel+0x24
[0x7f1d6ea3b618] rte_thread_setname+0xde8
[0x7f1d6a54be25] start_thread+0xc5
[0x7f1d69b2e34d] clone+0x6d
[0x0]
OK
Verifying safety for pid 74234...
Stacktrace to verify safety for pid 74234:
[0x7f1d69b2e923] epoll_wait_nocancel+0x2a
[0x7f1d6ea3ea14] rte_exit+0x5e4
[0x7f1d6a54be25] start_thread+0xc5
[0x7f1d69b2e34d] clone+0x6d
[0x0]
OK
Verifying safety for pid 74235...
Stacktrace to verify safety for pid 74235:
[0x7f1d6a552b03] recvfrom_nocancel+0x2a
[0x7f1d707d33fa]
[0x7f1d6a54be25] start_thread+0xc5
[0x7f1d69b2e34d] clone+0x6d
[0x0]
OK
Verifying safety for pid 74236...
Stacktrace to verify safety for pid 74236:
[0x7f1d69af51ad] nanosleep_nocancel+0x24
[0x7f1d69af5044] sleep+0xd4
[0x55cff5ae934f] _init+0x152257
[0x55cff5b11316] _init+0x17a21e
[0x55cff5aa6786] _init+0x10f68e
[0x7f1d6a54be25] start_thread+0xc5
[0x7f1d69b2e34d] clone+0x6d
[0x0]
OK
Verifying safety for pid 74237...
Stacktrace to verify safety for pid 74237:
[0x7f1d69b23a3d] poll_nocancel+0x24
[0x7f1d705af11f] fdset_event_dispatch+0x6f
[0x7f1d705b0270] rte_vhost_driver_session_start+0x10
[0x55cff5b0d60b] _init+0x176513
[0x55cff5aa6786] _init+0x10f68e
[0x7f1d6a54be25] start_thread+0xc5
[0x7f1d69b2e34d] clone+0x6d
[0x0]
OK
Verifying safety for pid 74238...
Stacktrace to verify safety for pid 74238:
[0x7f1d69b23a3d] poll_nocancel+0x24
[0x55cff5add826] _init+0x14672e
[0x55cff5ac1fda] _init+0x12aee2
[0x55cff5aa4eeb] _init+0x10ddf3
[0x55cff5aa6786] _init+0x10f68e
[0x7f1d6a54be25] start_thread+0xc5
[0x7f1d69b2e34d] clone+0x6d
[0x0]
OK
Verifying safety for pid 74239...
Stacktrace to verify safety for pid 74239:
[0x7f1d69b23a3d] poll_nocancel+0x24
[0x55cff5add826] _init+0x14672e
[0x55cff5ac1fda] _init+0x12aee2
[0x55cff5b343e4] _init+0x19d2ec
[0x55cff5aa6786] _init+0x10f68e
[0x7f1d6a54be25] start_thread+0xc5
[0x7f1d69b2e34d] clone+0x6d
[0x0]
OK
Verifying safety for pid 74253...
Stacktrace to verify safety for pid 74253:
[0x7f1d69af51ad] nanosleep_nocancel+0x24
[0x7f1d69af5044] sleep+0xd4
[0x7f1d705afa0e] vhost_user_client_reconnect+0x16e
[0x7f1d6a54be25] start_thread+0xc5
[0x7f1d69b2e34d] clone+0x6d
[0x0]
OK
Verifying safety for pid 74257...
Stacktrace to verify safety for pid 74257:
[0x7f1d69b23a3d] poll_nocancel+0x24
[0x55cff5add826] _init+0x14672e
[0x55cff5ac1fda] _init+0x12aee2
[0x55cff59d84e1] _init+0x413e9
[0x55cff5aa6786] _init+0x10f68e
[0x7f1d6a54be25] start_thread+0xc5
[0x7f1d69b2e34d] clone+0x6d
[0x0]
OK
Verifying safety for pid 74258...
Stacktrace to verify safety for pid 74258:
[0x7f1d69b23a3d] __poll_nocancel+0x24
[0x55cff5add826] _init+0x14672e
[0x55cff5ac1fda] _init+0x12aee2
[0x55cff59d8f90] _init+0x41e98
[0x55cff5aa6786] _init+0x10f68e
[0x7f1d6a54be25] start_thread+0xc5
[0x7f1d69b2e34d] clone+0x6d
[0x0]
OK
Verifying safety for pid 74259...
Stacktrace to verify safety for pid 74259:
[0x55cff5a0f54b] _init+0x78453
[0x55cff5a0f8da] _init+0x787e2
[0x55cff5aa6786] _init+0x10f68e
[0x7f1d6a54be25] start_thread+0xc5
[0x7f1d69b2e34d] clone+0x6d
[0x0]
OK
Verifying safety for pid 74260...
Stacktrace to verify safety for pid 74260:
[0x7f1d70e0e5f3]
[0x55cff5b11cc9] _init+0x17abd1
[0x55cff5a3eae1] _init+0xa79e9
[0x55cff5a0f566] _init+0x7846e
[0x55cff5a0f8da] _init+0x787e2
[0x55cff5aa6786] _init+0x10f68e
[0x7f1d6a54be25] start_thread+0xc5
[0x7f1d69b2e34d] clone+0x6d
[0x0]
OK
Verifying safety for pid 74261...
Stacktrace to verify safety for pid 74261:
[0x55cff5a0f578] _init+0x78480
[0x55cff5a0f8da] _init+0x787e2
[0x55cff5aa6786] _init+0x10f68e
[0x7f1d6a54be25] start_thread+0xc5
[0x7f1d69b2e34d] clone+0x6d
[0x0]
OK
Verifying safety for pid 74262...
Stacktrace to verify safety for pid 74262:
[0x55cff5a0f975] _init+0x7887d
[0x55cff5aa6786] _init+0x10f68e
[0x7f1d6a54be25] start_thread+0xc5
[0x7f1d69b2e34d] clone+0x6d
[0x0]
OK
kpatch_patch.c(246): Patching 246, -----------safety safety safety------------------------
munmap_remote: 0x9+11730
Executing syscall 11 (pid 74232)...
wait_for_stop(pctx->pid=74232, pid=74232)
kpatch_patch.c(504): Can't unapply patch for librte_vhost.so.3
Detaching from 74232...OK
Detaching from 74233...OK
Detaching from 74234...OK
Detaching from 74235...OK
Detaching from 74236...OK
Detaching from 74237...OK
Detaching from 74238...OK
Detaching from 74239...OK
Detaching from 74253...OK
Detaching from 74257...OK
Detaching from 74258...OK
Detaching from 74259...OK
Detaching from 74260...OK
Detaching from 74261...OK
Detaching from 74262...OK
Finished ptrace detaching.
Failed to apply patch './libshared.kpatch'
kpatch_patch.c(588): Failed to apply patch './libshared.kpatch'

paboldin commented 4 years ago

It is impossible to patch the function that runs the main loop: it is always on the stack, and there is (almost) no way to patch such a function correctly.

We apply a patch to a function by overwriting its first instructions with a jmp to the patched version of the function. If the function never exits, doing so is pointless: execution never leaves the loop's code, so the patched version never runs. Sending SIGSTOP won't help either, because it stops the application inside the event loop.
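To make the redirection concrete, here is a minimal sketch of how such a jump could be encoded. It assumes an x86-64 target and the 5-byte `jmp rel32` form (opcode 0xE9); the helper name `encode_jmp_rel32` and the addresses are invented for illustration and are not taken from libcare's source.

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86 "jmp rel32"
JMP_REL32_SIZE = 5       # 1 opcode byte + 4 displacement bytes


def encode_jmp_rel32(old_addr: int, new_addr: int) -> bytes:
    """Return the bytes of a jmp placed at old_addr that lands on new_addr.

    Hypothetical helper: both addresses are plain integers supplied by the caller.
    """
    # The displacement is measured from the instruction that follows the jmp.
    displacement = new_addr - (old_addr + JMP_REL32_SIZE)
    if not -(2**31) <= displacement < 2**31:
        raise ValueError("target is out of range for a rel32 jump")
    # "<Bi": little-endian, one unsigned byte followed by a signed 32-bit int.
    return struct.pack("<Bi", JMP_REL32_OPCODE, displacement)


# Made-up addresses: original function at 0x1000, patched copy at 0x2000.
print(encode_jmp_rel32(0x1000, 0x2000).hex())
```

Only a thread that enters the function through its first bytes ever reaches these five bytes. A thread already looping further down in the old body keeps executing the old instructions.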

The easiest solution here is to patch one of the functions the event loop calls, if that is possible, and to remove the patch for the loop function itself, so that there are no conflicts.
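As a loose Python analogy (not libcare code), the sketch below shows why replacing a function that the loop calls takes effect while the loop itself keeps running. The `functions` table and the handler names are hypothetical stand-ins for patchable function entry points.

```python
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical dispatch table standing in for patchable function entry points.
functions: Dict[str, Callable[[str], str]] = {
    "handle_event": lambda event: f"old:{event}",
}


def event_loop(events: Iterable[str]) -> Iterator[str]:
    """Stand-in for a long-running loop that is entered once and never re-entered."""
    for event in events:
        # The callee is entered anew on every iteration, so a replacement
        # installed between iterations is picked up by the next call.
        yield functions["handle_event"](event)


loop = event_loop(["a", "b", "c"])
before = next(loop)                                       # handled by the old code
functions["handle_event"] = lambda event: f"new:{event}"  # "apply the patch"
after = list(loop)                                        # handled by the new code
print(before, after)
```

The loop body itself is never replaced here; only the function it calls changes, which mirrors the suggestion of patching the callee and leaving the loop function out of the patch.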

If you can, simplify your patch to the point where it is only a single 'printf' statement in each block and share it here, through a git repo, or privately via email.

chenzhbao629 commented 4 years ago

Hi paboldin, thank you for your reply. I have a question, though: why is "Verifying safety for pid xxxx..." needed even for a thread that doesn't call the function being patched?

paboldin commented 4 years ago

I can't be sure that no thread calls the target function. From what I see in the original comment, the code thinks there is a target function on the stack.
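The kind of check being discussed can be pictured roughly as follows. This is a simplified sketch rather than libcare's implementation: it assumes each thread's stack trace is available as a list of code addresses and each function to be patched as a (start, end) address range. The name `unsafe_frames` and all values are hypothetical.

```python
from typing import Iterable, List, Tuple

AddressRange = Tuple[int, int]  # (start, end) of a function to patch, end exclusive


def unsafe_frames(frames: Iterable[int], patched: Iterable[AddressRange]) -> List[int]:
    """Return the stack addresses that fall inside any function being patched."""
    ranges = list(patched)
    return [
        addr
        for addr in frames
        if any(start <= addr < end for start, end in ranges)
    ]


# Made-up example: one function occupying 0x5000..0x50ff is about to be patched.
to_patch = [(0x5000, 0x5100)]
print(unsafe_frames([0x7000, 0x5010, 0x4000], to_patch))  # one frame is inside it
print(unsafe_frames([0x7000, 0x4000], to_patch))          # nothing on this stack
```

Under this model, patching would go ahead only when the result is empty for every thread, which is why each thread's stack is examined separately.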

If you can, please provide at least the following outputs, here or privately:

$ strings PATCHFILE
$ diff -u lib/librte_vhost/.kpatch_fd_manoriginal.s lib/librte_vhost/.kpatch_fd_manpatched.s

That will help me see whether the patch really contains the target function.

Preferably, just show me the patch.

chenzhbao629 commented 4 years ago

Hi Boldin, I'm sorry for not providing enough information, and thank you for your patience. Two files are attached: one is my original code, the other is my patch.

Also, I was mistaken in what I said earlier. I analyzed my code and found that the thread really does call the function that needs patching, indirectly: it calls one function, and that function calls the patched function.

A little while ago I tried adding 'sleep(100)' to the loop, and the patch then applied successfully. It seems the loop is too busy to exit. Do you have any ideas about this situation?

(Sent by email in reply to Pavel Boldin's comment of Thursday, October 24, 2019, 9:04 PM.)

paboldin commented 4 years ago

The files you sent are not displayed on GitHub; please email them to me at boldin.pavel@gmail.com.

The problem here may be inlined functions. When code declares a function as static, it becomes local to the file, and the compiler can simply insert its body in place of every call to save the function-call overhead. This usually happens when the function is called only once in the whole file.

This might be your case as well. But, again, I will need both the patch and the binary patch to be sure.