aerospike / aerospike-client-c

Aerospike C Client
Other
98 stars 103 forks source link

use after free in aerospike_query_async with multiple loop threads #99

Closed burlog closed 3 years ago

burlog commented 4 years ago

Prerequisites:

How to reproduce:

┣━ [master] git/github/aerospike-client-c/examples > git log | head
commit 5ab955fc75f2d5fe18fc08380424cd789d7dda74
Author: Brian Nichols <bnichols@aerospike.com>
Date:   Wed Oct 21 13:30:57 2020 -0700

    Update version 5.0.0

commit e0220ef00b2e2864cddfbb97c8f7243c8f8e10eb
Author: Brian Nichols <bnichols@aerospike.com>
Date:   Wed Oct 21 12:18:30 2020 -0700
┣━ [master] git/github/aerospike-client-c/examples > git status                                                                                                 
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   async_examples/async_query/src/main/example.c
        modified:   utils/src/main/example_utils.c
        modified:   ../src/main/aerospike/aerospike_query.c
        modified:   ../src/main/aerospike/as_event.c

no changes added to commit (use "git add" and/or "git commit -a")
┣━ [master] git/github/aerospike-client-c/examples > git diff
diff --git a/examples/async_examples/async_query/src/main/example.c b/examples/async_examples/async_query/src/main/example.c
index da08541e..ff97b9fc 100644
--- a/examples/async_examples/async_query/src/main/example.c
+++ b/examples/async_examples/async_query/src/main/example.c
@@ -191,7 +191,7 @@ run_query(as_event_loop* event_loop)

        // Execute the query.
        as_error err;
-       if (aerospike_query_async(&as, &err, NULL, &query, query_listener, NULL, event_loop) != AEROSPIKE_OK) {
+       if (aerospike_query_async(&as, &err, NULL, &query, query_listener, NULL, NULL) != AEROSPIKE_OK) {
                query_listener(&err, NULL, NULL, event_loop);
        }

diff --git a/examples/utils/src/main/example_utils.c b/examples/utils/src/main/example_utils.c
index e27c3ec8..3099c5bb 100644
--- a/examples/utils/src/main/example_utils.c
+++ b/examples/utils/src/main/example_utils.c
@@ -482,7 +482,7 @@ example_create_event_loop()
        // Initialize logging.
        as_log_set_callback(example_log_callback);

-       if (as_event_create_loops(1)) {
+       if (as_event_create_loops(2)) {
                return true;
        }
 #endif
diff --git a/src/main/aerospike/aerospike_query.c b/src/main/aerospike/aerospike_query.c
index ba2f442f..4d055a07 100644
--- a/src/main/aerospike/aerospike_query.c
+++ b/src/main/aerospike/aerospike_query.c
@@ -1251,6 +1251,9 @@ aerospike_query_async(
                        exec->queued++;
                        as_event_command* cmd = exec->commands[i];
                        as_status status = as_event_command_execute(cmd, err);
+            printf("sleep\n");
+            sleep(3);
+            printf("wake up, now is `exec` destroyed\n");

                        if (status != AEROSPIKE_OK) {
                                as_event_executor_cancel(exec, i);
diff --git a/src/main/aerospike/as_event.c b/src/main/aerospike/as_event.c
index b7aaab19..32770445 100644
--- a/src/main/aerospike/as_event.c
+++ b/src/main/aerospike/as_event.c
@@ -1046,6 +1046,7 @@ as_event_executor_cancel(as_event_executor* executor, uint32_t queued_count)
 void
 as_event_executor_complete(as_event_command* cmd)
 {
+    printf("as_event_executor_complete\n");
        as_event_executor* executor = cmd->udata;
        pthread_mutex_lock(&executor->lock);
        executor->count++;
@@ -1061,6 +1062,7 @@ as_event_executor_complete(as_event_command* cmd)
                if (executor->notify) {
                        executor->complete_fn(executor);
                }
+        printf("as_event_executor_destroy\n");
                as_event_executor_destroy(executor);
        }
        else {
┣━ [master] git/github/aerospike-client-c/examples > valgrind ./async_examples/async_query/target/example 
==14341== Memcheck, a memory error detector
==14341== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==14341== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==14341== Command: ./async_examples/async_query/target/example
==14341== 
host:           127.0.0.1
port:           3000
user:           
namespace:      test
set name:       eg-set
number of keys: 20
Add node BB9CD767085E050 127.0.0.1:3000
inserted 20 keys
executing query: where test-bin = 7
sleep
query callback returned record:
  generation 1, ttl 2592000, 1 bin:
  test-bin : 7
as_event_executor_complete
query is complete
as_event_executor_destroy
wake up, now is `exec` destroyed
==14341== Thread 2:
==14341== Invalid read of size 4
==14341==    at 0x11CEF8: aerospike_query_async (aerospike_query.c:1250)
==14341==    by 0x1134CC: run_query (example.c:194)
==14341==    by 0x12AD4C: as_event_command_parse_header (as_event.c:1227)
==14341==    by 0x12D8A4: as_ev_command_read (as_event_ev.c:578)
==14341==    by 0x12D8A4: as_ev_callback_common (as_event_ev.c:678)
==14341==    by 0x487BB12: ev_invoke_pending (in /usr/lib64/libev.so.4.0.0)
==14341==    by 0x487FCA1: ev_run (in /usr/lib64/libev.so.4.0.0)
==14341==    by 0x12C05A: ev_loop (ev.h:842)
==14341==    by 0x12C05A: as_ev_worker (as_event_ev.c:98)
==14341==    by 0x4C06EF6: start_thread (pthread_create.c:477)
==14341==    by 0x4E89CCE: clone (clone.S:95)
==14341==  Address 0x4f96ce0 is 96 bytes inside a block of size 128 free'd
==14341==    at 0x48389AB: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==14341==    by 0x1299E1: as_event_executor_complete (as_event.c:1066)
==14341==    by 0x12AA67: as_event_query_complete (as_event.c:1099)
==14341==    by 0x11C5D7: as_query_parse_records_async (aerospike_query.c:245)
==14341==    by 0x12D8A4: as_ev_command_read (as_event_ev.c:578)
==14341==    by 0x12D8A4: as_ev_callback_common (as_event_ev.c:678)
==14341==    by 0x487BB12: ev_invoke_pending (in /usr/lib64/libev.so.4.0.0)
==14341==    by 0x487FCA1: ev_run (in /usr/lib64/libev.so.4.0.0)
==14341==    by 0x12C05A: ev_loop (ev.h:842)
==14341==    by 0x12C05A: as_ev_worker (as_event_ev.c:98)
==14341==    by 0x4C06EF6: start_thread (pthread_create.c:477)
==14341==    by 0x4E89CCE: clone (clone.S:95)
==14341==  Block was alloc'd at
==14341==    at 0x483777F: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==14341==    by 0x11CBE4: aerospike_query_async (aerospike_query.c:1179)
==14341==    by 0x1134CC: run_query (example.c:194)
==14341==    by 0x12AD4C: as_event_command_parse_header (as_event.c:1227)
==14341==    by 0x12D8A4: as_ev_command_read (as_event_ev.c:578)
==14341==    by 0x12D8A4: as_ev_callback_common (as_event_ev.c:678)
==14341==    by 0x487BB12: ev_invoke_pending (in /usr/lib64/libev.so.4.0.0)
==14341==    by 0x487FCA1: ev_run (in /usr/lib64/libev.so.4.0.0)
==14341==    by 0x12C05A: ev_loop (ev.h:842)
==14341==    by 0x12C05A: as_ev_worker (as_event_ev.c:98)
==14341==    by 0x4C06EF6: start_thread (pthread_create.c:477)
==14341==    by 0x4E89CCE: clone (clone.S:95)
==14341== 
==14341== 
==14341== HEAP SUMMARY:
==14341==     in use at exit: 0 bytes in 0 blocks
==14341==   total heap usage: 174 allocs, 174 frees, 288,359 bytes allocated
==14341== 
==14341== All heap blocks were freed -- no leaks are possible
==14341== 
==14341== For lists of detected and suppressed errors, rerun with: -s
==14341== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
BrianNichols commented 4 years ago

I have duplicated the use-after-free when using the added sleep. The sleep allows the command to complete before making the final for loop condition check:

for (uint32_t i = 0; i < exec->max_concurrent; i++) {

Since the query completed, "exec" is no longer valid. The solution is to use a temporary variable, "max".

uint32_t max = exec->max_concurrent;
for (uint32_t i = 0; i < max; i++) {

There are a few more cases where this construct is used, so we will attempt to fix all of them in the next client release.

burlog commented 4 years ago

Hi Brian,

I have another uncertainty. I wonder if incrementing the queued attribute without any lock is valid.

           for (uint32_t i = 0, m = exec->max_concurrent; i < m; i++) {                                                                                                 
               exec->queued++;                                                                                                                                          
               as_event_command* cmd = exec->commands[i];                                                                                                               
               as_status status = as_event_command_execute(cmd, err);                                                                                                   

               if (status != AEROSPIKE_OK) {                                                                                                                            
                   as_event_executor_cancel(exec, i);                                                                                                                   
                   break;                                                                                                                                               
               }                                                                                                                                                        
           }                                                                                                                                                            
burlog commented 4 years ago

Sorry for closing, I mistakenly hit the Enter.

BrianNichols commented 4 years ago

I think the "queued" increment is okay for queries because "queued" is only used when the query executor is destroyed. If the executor is destroyed, all commands must have finished or be cancelled from same original thread. If there are additional commands to to queue, all commands could not possibly be finished.

There is, however, a slight problem with "queued" for scans. Scan allows a concurrent policy where queries do not. If concurrent is false (not default), scans are executed sequentially and "queued" is incremented in both original thread and eventloop thread. This is unlikely to cause problems because the eventloop thread will read "queued" (value 1) from original thread on first use and then all further increments happen in the eventloop thread. It does look a bit awkward, so we will also investigate a more consistent way to handle "queued" in the next client release.

burlog commented 4 years ago

I see, thanks you for clarify.

BrianNichols commented 4 years ago

C client 5.0.1 is released. The use-after-free condition is fixed, but the "queued" logic remains the same.