apache / trafficserver

Apache Traffic Server™ is a fast, scalable and extensible HTTP/1.1 and HTTP/2 compliant caching proxy server.
https://trafficserver.apache.org/
Apache License 2.0
1.81k stars 800 forks source link

Replace ink_atomic with std::atomic #9173

Open masaori335 opened 1 year ago

masaori335 commented 1 year ago

Last few days, I've been running ThreadSanitizer. Some reports has the same pattern of data race, atomic write v.s. normal read.

The ink_atomic operations are used for writing variables but not for reading variables. It looks like this happens to almost all of the variables that are touched by ink_atomic operations. We need to change these variables with std::atomic to make reading safe.

Also, replacing with std::atomic provides an advantage. We can control memory order. This is not available with ink_atomic because it's using legacy gcc buit-in __sync functions, I think.

WARNING: ThreadSanitizer: data race (pid=41517)
  Atomic write of size 8 at 0x00000a05bb90 by main thread (mutexes: write M0, write M1, write M2, write M3, write M4, write M5, write M6, write M7, write M8, write M9, write M10, write M11, write M12, write M13):
    #0 ink_atomiclist_push ink_queue.cc:538 (libtscore.10.dylib:x86_64+0x4522d) (BuildId: ddb6086f2f64376aae6ba04a59161d1132000000200000000100000000000c00)
    #1 ProtectedQueue::enqueue(Event*) ProtectedQueue.cc:52 (traffic_server:x86_64+0x51c12d) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #2 EventProcessor::schedule(Event*, int) P_UnixEventProcessor.h:123 (traffic_server:x86_64+0x322d9d) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #3 RecProcessStart() RecProcess.cc:277 (traffic_server:x86_64+0x517e26) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #4 main traffic_server.cc:2072 (traffic_server:x86_64+0x6cef8) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)

  Previous read of size 8 at 0x00000a05bb90 by thread T18 (mutexes: write M14):
    #0 ProtectedQueue::wait(long long) ProtectedQueue.cc:94 (traffic_server:x86_64+0x51c435) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #1 EThread::DefaultTailHandler::waitForActivity(long long) I_EThread.h:369 (traffic_server:x86_64+0x521182) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #2 EThread::execute_regular() UnixEThread.cc:288 (traffic_server:x86_64+0x51fe17) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #3 EThread::execute() UnixEThread.cc:349 (traffic_server:x86_64+0x5202e9) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #4 spawn_thread_internal(void*) Thread.cc:79 (traffic_server:x86_64+0x51d488) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)

  Location is heap block of size 1101120 at 0x000009f5b000 allocated by main thread:
    #0 operator new(unsigned long) <null>:28420041 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x652ad) (BuildId: 065b6c81cd87343dafd65694f305ecb72400000010000000000a0a0000030c00)
    #1 EventProcessor::spawn_event_threads(int, int, unsigned long) UnixEventProcessor.cc:421 (traffic_server:x86_64+0x52494b) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #2 TasksProcessor::start(int, unsigned long) Tasks.cc:42 (traffic_server:x86_64+0x51cac0) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #3 main traffic_server.cc:2070 (traffic_server:x86_64+0x6cef3) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
WARNING: ThreadSanitizer: data race (pid=42272)
  Atomic write of size 8 at 0x000001768418 by thread T30:
    #0 RecIncrGlobalRawStatSum(RecRawStatBlock*, int, long long) RecRawStats.cc:439 (traffic_server:x86_64+0x514826) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #1 NetAccept::do_blocking_accept(EThread*) UnixNetAccept.cc:335 (traffic_server:x86_64+0x4d93d0) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #2 NetAccept::acceptLoopEvent(int, Event*) UnixNetAccept.cc:551 (traffic_server:x86_64+0x4d833a) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #3 EThread::execute() UnixEThread.cc:333 (traffic_server:x86_64+0x520124) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #4 spawn_thread_internal(void*) Thread.cc:79 (traffic_server:x86_64+0x51d488) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)

  Previous read of size 8 at 0x000001768418 by thread T19 (mutexes: write M0, write M1, write M2):
    #0 RecRawStatSyncSum(char const*, RecDataT, RecData*, RecRawStatBlock*, int) RecRawStats.cc:301 (traffic_server:x86_64+0x512bf8) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #1 RecExecRawStatSyncCbs() RecRawStats.cc:563 (traffic_server:x86_64+0x514f57) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #2 raw_stat_sync_cont::exec_callbacks(int, Event*) RecProcess.cc:146 (traffic_server:x86_64+0x518a57) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #3 EThread::process_event(Event*, int) UnixEThread.cc:153 (traffic_server:x86_64+0x51e8fa) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #4 EThread::execute_regular() UnixEThread.cc:258 (traffic_server:x86_64+0x51f8af) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #5 EThread::execute() UnixEThread.cc:349 (traffic_server:x86_64+0x5202e9) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #6 spawn_thread_internal(void*) Thread.cc:79 (traffic_server:x86_64+0x51d488) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)

  Location is heap block of size 520000 at 0x000001736000 allocated by main thread:
    #0 malloc <null>:28420041 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x3732c) (BuildId: 065b6c81cd87343dafd65694f305ecb72400000010000000000a0a0000030c00)
    #1 ats_malloc ink_memory.cc:64 (libtscore.10.dylib:x86_64+0x4381f) (BuildId: ddb6086f2f64376aae6ba04a59161d1132000000200000000100000000000c00)
    #2 RecCoreInit(RecModeT, Diags*) RecCore.cc:206 (traffic_server:x86_64+0x503e76) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #3 RecProcessInit(RecModeT, Diags*) RecProcess.cc:214 (traffic_server:x86_64+0x51742d) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)
    #4 main traffic_server.cc:1783 (traffic_server:x86_64+0x6afa3) (BuildId: 279497f9a0ac37109783e3c337bc1aa632000000200000000100000000000c00)

Actually, Phil tried to replace ink_atomic with std::atomic by PR #2633 5 years ago.

SolidWallOfCode commented 1 year ago

We should use std::atomic generally, because it's standard. The current atomic support dates from before any of this was standard.

bneradt commented 1 year ago

Just to record this: in discussing this in our ASF PR/issue meeting, we suggest that this be addressed piecemeal as patches are done involving the old ink_atomic code.

github-actions[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. Marking it stale to flag it for further consideration by the community.