Open ghost opened 2 years ago
Thanks a lot for this @CygnusMiner I will get to looking at it, and applying fixes or asking questions
do you mind running one on the remote-data
branch as well and posting any new findings
this way we can leave this issue open while we switch on the testnet to rd and close it when done
Thanks!
Sure. I'm attaching a full valgrind file (prior to any memory leak logging since this is an uninitialized memory defect). It does the initialization checking prior to gathering memory leaks (which really clutter up the file) so I can catch it here.
The initialization warnings start at line 206: (edit: .zip files are scary when downloading. Re-uploading it as normal .txt file instead) valgrind-remote-data.txt
This is a valgrind run on a debug build of a clone of xcash-dpops with a git checkout remote-data
using the following flags:
/usr/bin/valgrind --max-stackframe=2100800 --num-callers=20 --leak-check=full --show-leak-kinds=all --track-origins=yes --verbose
I looked over and fixed all but the epoll struct initializes since it does not say that in the documentation is necessary.
I will look at the attached file next
still need to test on the internal testnet
Thanks Zach! The latest clone of the remote-data
branch looks good. The only outstanding uninitialized errors are the Syscall param epoll_ctl(event) points to uninitialised byte(s)
and Googling that seems to show a consensus that it's a valgrind false positive. I would say this issue has been addressed.
oh does the txt file not show any more?
Not that I can tell. It seemed to show the same findings as what I got from the dpops-test
branch which is what I created the first defect against. I believe you've addressed all of the outstanding uninitialized findings thus far.
I can't guarantee that all have been hit, though. It seems valgrind is slowing my dpops down enough that it cant make it past a simple reserve bytes check...
Checking reserve bytes
The database is not synced with the majority of block verifiers, syncing the database from a random block verifier that is in the majority
Syncing the reserve bytes database
Getting the database data from dpops-test-4.xcash.foundation
Could not receive data from dpops-test-4.xcash.foundation
Connecting to another block verifier
Getting the database data from franckiki2.ddns.net
Could not receive data from franckiki2.ddns.net
Connecting to another block verifier
Getting the database data from 5.255.102.225
Could not receive data from 5.255.102.225
Connecting to another block verifier
Getting the database data from testnode-nl.xcash.rocks
Could not receive data from testnode-nl.xcash.rocks
Connecting to another block verifier
Getting the database data from umxnet.ursamajor.eu
Could not receive data from umxnet.ursamajor.eu
Connecting to another block verifier
Getting the database data from 185.215.167.107
Could not receive data from 185.215.167.107
Connecting to another block verifier
etc...
I'm going to have to spend time trying to reduce the overhead. Maybe I'll turn memleak checking off completely...
π BUG REPORT
Ran a quick valgrind on the xcash-dpops testnet branch. It popped up a few uninitialized memory findings I thought I'd share here.
Mostly just issues such as an attempt to delete a poll with this uninitialized epoll_ctl structure here, here, here, here: ==856572== Syscall param epoll_ctl(event) points to uninitialised byte(s) ==856572== at 0x4AD2B2E: epoll_ctl (syscall-template.S:78) ==856572== by 0x26733F: new_socket_thread (server_functions.c:211) ==856572== by 0x26A7A1: socket_receive_data_thread (server_functions.c:923) ==856572== by 0x497B608: start_thread (pthread_create.c:477) ==856572== by 0x4AD2162: clone (clone.S:95)
And this socket close call on an unitialized structure from here, here: ==856572== Syscall param close(fd) contains uninitialised byte(s) ==856572== at 0x49863FB: close (close.c:27) ==856572== by 0x25E9F4: get_delegates_online_status (block_verifiers_update_functions.c:1300) ==856572== by 0x224C2E: start_new_round (block_verifiers_functions.c:115) ==856572== by 0x24791E: current_block_height_timer_thread (block_verifiers_thread_server_functions.c:186) ==856572== by 0x497B608: start_thread (pthread_create.c:477) ==856572== by 0x4AD2162: clone (clone.S:95)
And a couple missed stack buffer initializations here (missed the 'data' initialization) and here (missed the 'data3' initialization): ==856572== Conditional jump or move depends on uninitialised value(s) ==856572== at 0x25E555: get_delegates_online_status (block_verifiers_update_functions.c:1238) ==856572== by 0x224C2E: start_new_round (block_verifiers_functions.c:115) ==856572== by 0x24791E: current_block_height_timer_thread (block_verifiers_thread_server_functions.c:186) ==856572== by 0x497B608: start_thread (pthread_create.c:477) ==856572== by 0x4AD2162: clone (clone.S:95) And here and here: ==900800== Conditional jump or move depends on uninitialised value(s) ==900800== at 0x483EFA9: strlen_sse2 (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==900800== by 0x1E6739: parse_json_data (string_functions.c:57) ==900800== by 0x22418E: server_receive_data_socket_node_to_node (block_verifiers_server_functions.c:415) ==900800== by 0x269E2E: socket_thread (server_functions.c:856) ==900800== by 0x26A923: socket_receive_data_thread (server_functions.c:932) ==900800== by 0x497B608: start_thread (pthread_create.c:477) ==900800== by 0x4AD2162: clone (clone.S:95) And here: ==901150== Conditional jump or move depends on uninitialised value(s) ==901150== at 0x483EFA9: strlen_sse2 (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==901150== by 0x1EA987: get_delegates_total_voters_count (shared_delegate_website_thread_server_functions.c:241) ==901150== by 0x26B5A6: block_verifiers_add_reserve_proof_check_if_data_is_valid (delegate_server_functions.c:146) ==901150== by 0x26C008: server_receive_data_socket_node_to_block_verifiers_add_reserve_proof (delegate_server_functions.c:353) ==901150== by 0x269A32: socket_thread (server_functions.c:753) ==901150== by 0x26A93C: socket_receive_data_thread (server_functions.c:932) ==901150== by 0x497B608: start_thread (pthread_create.c:477) ==901150== by 0x4AD2162: clone (clone.S:95)
Also valgrind doesn't seem to like the individual structure member approach taken here:
I would probably just clear it in one fell swoop:
Expected Behavior Memory should be initialized before referenced.
Current Behavior Some missed initializations could potentially result in wrong file handles being closed. Also that one stack buffer could wind up with random data appended to the string. Maybe...
Possible Solution valgrind seemed happy with a few simple memsets. i.e.:
memset(events,0,sizeof(events)); memset(block_verifiers_send_data_socket,0,sizeof(block_verifiers_send_data_socket)); memset(data,0,sizeof(data)); etc...
I also recommend wrapping the EPOLL_CTL_DEL and the close() calls in a check for a valid socket as such:
Steps to Reproduce (for bugs) run xcash-dpops under valgrind. I used this in my testnet machine /usr/lib/systemd/system/xcash-dpops.service:
Context
Your Environment