X-CASH-official / xcash-dpops

πŸ—³ Delegated-Proof-of-Private-Stake: First DPoS implementation on a Monero-based coin
https://xcash.foundation
MIT License
48 stars 17 forks source link

[bug] (valgrind findings) Uninitialized memory use #72

Open ghost opened 2 years ago

ghost commented 2 years ago

πŸ› BUG REPORT

Ran a quick valgrind on the xcash-dpops testnet branch. It popped up a few uninitialized memory findings I thought I'd share here.

Mostly just issues such as an attempt to delete a poll with this uninitialized epoll_ctl structure here, here, here, here: ==856572== Syscall param epoll_ctl(event) points to uninitialised byte(s) ==856572== at 0x4AD2B2E: epoll_ctl (syscall-template.S:78) ==856572== by 0x26733F: new_socket_thread (server_functions.c:211) ==856572== by 0x26A7A1: socket_receive_data_thread (server_functions.c:923) ==856572== by 0x497B608: start_thread (pthread_create.c:477) ==856572== by 0x4AD2162: clone (clone.S:95)

And this socket close call on an unitialized structure from here, here: ==856572== Syscall param close(fd) contains uninitialised byte(s) ==856572== at 0x49863FB: close (close.c:27) ==856572== by 0x25E9F4: get_delegates_online_status (block_verifiers_update_functions.c:1300) ==856572== by 0x224C2E: start_new_round (block_verifiers_functions.c:115) ==856572== by 0x24791E: current_block_height_timer_thread (block_verifiers_thread_server_functions.c:186) ==856572== by 0x497B608: start_thread (pthread_create.c:477) ==856572== by 0x4AD2162: clone (clone.S:95)

And a couple missed stack buffer initializations here (missed the 'data' initialization) and here (missed the 'data3' initialization): ==856572== Conditional jump or move depends on uninitialised value(s) ==856572== at 0x25E555: get_delegates_online_status (block_verifiers_update_functions.c:1238) ==856572== by 0x224C2E: start_new_round (block_verifiers_functions.c:115) ==856572== by 0x24791E: current_block_height_timer_thread (block_verifiers_thread_server_functions.c:186) ==856572== by 0x497B608: start_thread (pthread_create.c:477) ==856572== by 0x4AD2162: clone (clone.S:95) And here and here: ==900800== Conditional jump or move depends on uninitialised value(s) ==900800== at 0x483EFA9: strlen_sse2 (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==900800== by 0x1E6739: parse_json_data (string_functions.c:57) ==900800== by 0x22418E: server_receive_data_socket_node_to_node (block_verifiers_server_functions.c:415) ==900800== by 0x269E2E: socket_thread (server_functions.c:856) ==900800== by 0x26A923: socket_receive_data_thread (server_functions.c:932) ==900800== by 0x497B608: start_thread (pthread_create.c:477) ==900800== by 0x4AD2162: clone (clone.S:95) And here: ==901150== Conditional jump or move depends on uninitialised value(s) ==901150== at 0x483EFA9: strlen_sse2 (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==901150== by 0x1EA987: get_delegates_total_voters_count (shared_delegate_website_thread_server_functions.c:241) ==901150== by 0x26B5A6: block_verifiers_add_reserve_proof_check_if_data_is_valid (delegate_server_functions.c:146) ==901150== by 0x26C008: server_receive_data_socket_node_to_block_verifiers_add_reserve_proof (delegate_server_functions.c:353) ==901150== by 0x269A32: socket_thread (server_functions.c:753) ==901150== by 0x26A93C: socket_receive_data_thread (server_functions.c:932) ==901150== by 0x497B608: start_thread (pthread_create.c:477) ==901150== by 0x4AD2162: clone (clone.S:95)

Also valgrind doesn't seem to like the individual structure member approach taken here:

  // initialize the reserve_proof struct
  memset(reserve_proof.block_verifier_public_address,0,sizeof(reserve_proof.block_verifier_public_address));
  memset(reserve_proof.public_address_created_reserve_proof,0,sizeof(reserve_proof.public_address_created_reserve_proof));
  memset(reserve_proof.public_address_voted_for,0,sizeof(reserve_proof.public_address_voted_for));
  memset(reserve_proof.reserve_proof_amount,0,sizeof(reserve_proof.reserve_proof_amount));
  memset(reserve_proof.reserve_proof,0,sizeof(reserve_proof.reserve_proof));

I would probably just clear it in one fell swoop:

  memset(&reserve_proof,0,sizeof(reserve_proof));

Expected Behavior Memory should be initialized before referenced.

Current Behavior Some missed initializations could potentially result in wrong file handles being closed. Also that one stack buffer could wind up with random data appended to the string. Maybe...

Possible Solution valgrind seemed happy with a few simple memsets. i.e.:

memset(events,0,sizeof(events)); memset(block_verifiers_send_data_socket,0,sizeof(block_verifiers_send_data_socket)); memset(data,0,sizeof(data)); etc...

I also recommend wrapping the EPOLL_CTL_DEL and the close() calls in a check for a valid socket as such:

   for (count = 0; count < TOTAL_BLOCK_VERIFIERS; count++)
   {
-    epoll_ctl(epoll_fd_copy, EPOLL_CTL_DEL, block_verifiers_send_data_socket[count].socket, &events[count]);
-    close(block_verifiers_send_data_socket[count].socket);
+    if (block_verifiers_send_data_socket[count].socket > 0)
+    {
+      epoll_ctl(epoll_fd_copy, EPOLL_CTL_DEL, block_verifiers_send_data_socket[count].socket, &events[count]);
+      close(block_verifiers_send_data_socket[count].socket);
+    }
   }

Steps to Reproduce (for bugs) run xcash-dpops under valgrind. I used this in my testnet machine /usr/lib/systemd/system/xcash-dpops.service:

ExecStart=/usr/bin/valgrind --max-stackframe=2100800 --num-callers=20 --leak-check=full --show-leak-kinds=all --track-origins=yes --verbose --log-file=/tmp/valgrind-dpops-out-%%p-%%n.val /home/xcash_user/xcash-official/xcash-dpops/build/xcash-dpops --block-verifiers-secret-key ***secret***  --shared-delegates-website --fee 12.345678 --minimum-amount 10000 --start-time 122 0 8 16 56

Context

Your Environment

zachhildreth commented 2 years ago

Thanks a lot for this @CygnusMiner I will get to looking at it, and applying fixes or asking questions

do you mind running one on the remote-data branch as well and posting any new findings

this way we can leave this issue open while we switch on the testnet to rd and close it when done

Thanks!

ghost commented 2 years ago

Sure. I'm attaching a full valgrind file (prior to any memory leak logging since this is an uninitialized memory defect). It does the initialization checking prior to gathering memory leaks (which really clutter up the file) so I can catch it here.

The initialization warnings start at line 206: (edit: .zip files are scary when downloading. Re-uploading it as normal .txt file instead) valgrind-remote-data.txt

This is a valgrind run on a debug build of a clone of xcash-dpops with a git checkout remote-data using the following flags:

/usr/bin/valgrind --max-stackframe=2100800 --num-callers=20 --leak-check=full --show-leak-kinds=all --track-origins=yes --verbose

zachhildreth commented 2 years ago

I looked over and fixed all but the epoll struct initializes since it does not say that in the documentation is necessary.

I will look at the attached file next

still need to test on the internal testnet

ghost commented 2 years ago

Thanks Zach! The latest clone of the remote-data branch looks good. The only outstanding uninitialized errors are the Syscall param epoll_ctl(event) points to uninitialised byte(s) and Googling that seems to show a consensus that it's a valgrind false positive. I would say this issue has been addressed.

zachhildreth commented 2 years ago

oh does the txt file not show any more?

ghost commented 2 years ago

Not that I can tell. It seemed to show the same findings as what I got from the dpops-test branch which is what I created the first defect against. I believe you've addressed all of the outstanding uninitialized findings thus far.

I can't guarantee that all have been hit, though. It seems valgrind is slowing my dpops down enough that it cant make it past a simple reserve bytes check...

Checking reserve bytes
The database is not synced with the majority of block verifiers, syncing the database from a random block verifier that is in the majority
Syncing the reserve bytes database
Getting the database data from dpops-test-4.xcash.foundation
Could not receive data from dpops-test-4.xcash.foundation
Connecting to another block verifier
Getting the database data from franckiki2.ddns.net
Could not receive data from franckiki2.ddns.net
Connecting to another block verifier
Getting the database data from 5.255.102.225
Could not receive data from 5.255.102.225
Connecting to another block verifier
Getting the database data from testnode-nl.xcash.rocks
Could not receive data from testnode-nl.xcash.rocks
Connecting to another block verifier
Getting the database data from umxnet.ursamajor.eu
Could not receive data from umxnet.ursamajor.eu
Connecting to another block verifier
Getting the database data from 185.215.167.107
Could not receive data from 185.215.167.107
Connecting to another block verifier
etc...

I'm going to have to spend time trying to reduce the overhead. Maybe I'll turn memleak checking off completely...