dsharlet / pthread_trace

Trace pthread events
MIT License

Store events in thread local buffer in simple format, convert to proto when flushing #3

Open dsharlet opened 1 week ago

dsharlet commented 1 week ago

Instead of encoding a protobuf directly into the thread local buffer, we could just record a simple struct of events, and generate the protobuf when flushing to the file.

This would reduce overhead in the tracing functions, but would cause flushes to be slower instead. There are pros and cons to this.
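To make the trade-off concrete, here is a minimal sketch of the idea, not the actual pthread_trace code. The names `raw_event`, `event_kind`, `event_buffer`, and `write_varint` are hypothetical, and the flush-time encoding is a placeholder (bare varints with no field tags, so it is not a valid protobuf message). The point is only that the hot path becomes a fixed-size struct store, and all of the encoding work moves into `flush`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical event kinds; not the names pthread_trace uses.
enum class event_kind : std::uint8_t { begin = 0, end = 1 };

// Plain fixed-size record: appending one is a few stores, no encoding.
struct raw_event {
  std::uint64_t timestamp_ns;
  std::uint32_t track_id;
  event_kind kind;
};

// Placeholder encoder: base-128 varint, least significant group first.
inline void write_varint(std::vector<std::uint8_t>& out, std::uint64_t v) {
  while (v >= 0x80) {
    out.push_back(static_cast<std::uint8_t>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(v));
}

class event_buffer {
  static constexpr std::size_t capacity = 1024;
  raw_event events_[capacity];
  std::size_t size_ = 0;

 public:
  // Hot path: returns false when the buffer is full so the caller can flush.
  bool record(const raw_event& e) {
    if (size_ == capacity) return false;
    events_[size_++] = e;
    return true;
  }

  // Slow path: encode every buffered event into `out`, then reset.
  // Returns the number of events that were converted.
  std::size_t flush(std::vector<std::uint8_t>& out) {
    const std::size_t n = size_;
    for (std::size_t i = 0; i < n; ++i) {
      write_varint(out, events_[i].timestamp_ns);
      write_varint(out, events_[i].track_id);
      write_varint(out, static_cast<std::uint64_t>(events_[i].kind));
    }
    size_ = 0;
    return n;
  }
};

// One buffer per thread, so recording needs no synchronization.
thread_local event_buffer tls_events;
```

With this shape, a tracing hook would call `tls_events.record(...)` and only call `flush` when `record` returns false or the thread exits, which is where the extra flush cost mentioned above would show up.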

dsharlet commented 1 week ago

I'm not sure this really makes sense. Here's a profile of a trivial loop of mutex lock/unlock on a single thread:

  56.53%  benchmark  [vdso]                [.] __vdso_clock_gettime                                              
  19.92%  benchmark  libc.so.6             [.] __memmove_avx_unaligned_erms                                      
   5.22%  benchmark  pthread_trace.so      [.] (anonymous namespace)::thread_state::write_end                    
   4.54%  benchmark  libstdc++.so.6.0.30   [.] std::chrono::_V2::system_clock::now                               
   2.80%  benchmark  pthread_trace.so      [.] (anonymous namespace)::thread_state::write_begin_with_delta<2ul, (
   2.17%  benchmark  pthread_trace.so      [.] (anonymous namespace)::thread_state::write_begin<(anonymous namesp
   1.96%  benchmark  ld-linux-x86-64.so.2  [.] __tls_get_addr                                                    
   1.41%  benchmark  libc.so.6             [.] pthread_mutex_lock@@GLIBC_2.2.5                                   
   0.69%  benchmark  libc.so.6             [.] clock_gettime@@GLIBC_2.17                                         
   0.67%  benchmark  libc.so.6             [.] pthread_mutex_unlock@@GLIBC_2.2.5                                 
   0.67%  benchmark  pthread_trace.so      [.] pthread_mutex_unlock                                              
   0.64%  benchmark  libstdc++.so.6.0.30   [.] 0x000000000009eb10                                                
   0.58%  benchmark  pthread_trace.so      [.] pthread_mutex_lock                                                

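So it seems like at most a ~30% improvement is on the table: that is roughly the combined share of the memmove and the write_* functions (19.92% + 5.22% + 2.80% + 2.17%), while most of the remaining time is in clock_gettime, which this change wouldn't affect. That's probably not worth a lot of added complexity...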
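For anyone who wants to reproduce this kind of measurement, a benchmark along these lines should be enough. This is a guess at its shape, not the benchmark actually used for the profile above; `time_lock_unlock` is a hypothetical name, and how the tracer is attached to the process is left out.

```cpp
#include <pthread.h>

#include <chrono>
#include <cstdint>

// Locks and unlocks an uncontended mutex `iterations` times on the calling
// thread and returns the average time per lock/unlock pair in nanoseconds.
double time_lock_unlock(std::uint64_t iterations) {
  pthread_mutex_t mutex;
  pthread_mutex_init(&mutex, nullptr);

  const auto start = std::chrono::steady_clock::now();
  for (std::uint64_t i = 0; i < iterations; ++i) {
    pthread_mutex_lock(&mutex);
    pthread_mutex_unlock(&mutex);
  }
  const auto stop = std::chrono::steady_clock::now();

  pthread_mutex_destroy(&mutex);

  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return iterations == 0 ? 0.0 : elapsed.count() / static_cast<double>(iterations);
}
```

Comparing the per-pair time with and without tracing gives the total overhead, and a profiler run over the traced version gives a breakdown like the one above.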