DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform
Other
2.57k stars 551 forks source link

tool.record_filter_bycore_multi failured due to inaccurate pi estimation #6755

Open derekbruening opened 3 months ago

derekbruening commented 3 months ago

The tool.record_filter_bycore_multi test failed on a64 sve at https://github.com/DynamoRIO/dynamorio/actions/runs/8546804195/job/23417839931?pr=6754

The output mismatch looks like it's b/c it estimated pi at 3.94... instead of 3.14...

2024-04-03T23:32:41.7151484Z 309: Running postcmd4 |/opt/actions-runner/_work/dynamorio/dynamorio/build/build_debug-internal-64/clients/bin64/drcachesim;-indir;tool.record_filter_bycore_multi.filtered.dir;-simulator_type;schedule_stats|
2024-04-03T23:32:41.7153291Z 309: CMake Error at /opt/actions-runner/_work/dynamorio/dynamorio/suite/tests/runmulti.cmake:138 (message):
2024-04-03T23:32:41.7154282Z 309:   output |Estimation of pi is 3.943208188861861
2024-04-03T23:32:41.7154698Z 309: 
2024-04-03T23:32:41.7154952Z 309:   Trace invariant checks passed
2024-04-03T23:32:41.7155312Z 309: 
2024-04-03T23:32:41.7155609Z 309:   Output 2098798 entries from 2098250 entries.
2024-04-03T23:32:41.7156026Z 309: 
2024-04-03T23:32:41.7156271Z 309:   Schedule stats tool results:
2024-04-03T23:32:41.7156619Z 309: 
2024-04-03T23:32:41.7156846Z 309:   Total counts:
2024-04-03T23:32:41.7157122Z 309: 
2024-04-03T23:32:41.7157346Z 309:              3 cores
2024-04-03T23:32:41.7157645Z 309:              3 threads
2024-04-03T23:32:41.7157968Z 309:         215938 instructions
2024-04-03T23:32:41.7158346Z 309:              9 total context switches
2024-04-03T23:32:41.7158880Z 309:      0.0416786 CSPKI (context switches per 1000 instructions)
2024-04-03T23:32:41.7159455Z 309:          23993 instructions per context switch
2024-04-03T23:32:41.7160100Z 309:              9 voluntary context switches
2024-04-03T23:32:41.7160551Z 309:              0 direct context switches
2024-04-03T23:32:41.7160965Z 309:         100.00% voluntary switches
2024-04-03T23:32:41.7161356Z 309:           0.00% direct switches
2024-04-03T23:32:41.7161724Z 309:             71 system calls
2024-04-03T23:32:41.7162279Z 309:             14 maybe-blocking system calls
2024-04-03T23:32:41.7162734Z 309:              0 direct switch requests
2024-04-03T23:32:41.7163312Z 309:              0 waits
2024-04-03T23:32:41.7163637Z 309:        1750080 idles
2024-04-03T23:32:41.7163998Z 309:          10.98% cpu busy by record count
2024-04-03T23:32:41.7164414Z 309:         558629 cpu microseconds
2024-04-03T23:32:41.7164792Z 309:         874238 idle microseconds
2024-04-03T23:32:41.7165222Z 309:         110044 idle microseconds at last instr
2024-04-03T23:32:41.7165789Z 309:          38.99% cpu busy by time
2024-04-03T23:32:41.7166294Z 309:          83.54% cpu busy by time, ignoring idle past last instr
2024-04-03T23:32:41.7166776Z 309: 
2024-04-03T23:32:41.7166996Z 309:   Core #0 counts:
2024-04-03T23:32:41.7167271Z 309: 
2024-04-03T23:32:41.7167493Z 309:              2 threads
2024-04-03T23:32:41.7167817Z 309:          10910 instructions
2024-04-03T23:32:41.7168191Z 309:              4 total context switches
2024-04-03T23:32:41.7168715Z 309:      0.3666361 CSPKI (context switches per 1000 instructions)
2024-04-03T23:32:41.7169280Z 309:           2728 instructions per context switch
2024-04-03T23:32:41.7169759Z 309:              4 voluntary context switches
2024-04-03T23:32:41.7170200Z 309:              0 direct context switches
2024-04-03T23:32:41.7170619Z 309:         100.00% voluntary switches
2024-04-03T23:32:41.7171004Z 309:           0.00% direct switches
2024-04-03T23:32:41.7171367Z 309:             19 system calls
2024-04-03T23:32:41.7171895Z 309:              3 maybe-blocking system calls
2024-04-03T23:32:41.7172344Z 309:              0 direct switch requests
2024-04-03T23:32:41.7172728Z 309:              0 waits
2024-04-03T23:32:41.7173032Z 309:         720111 idles
2024-04-03T23:32:41.7173375Z 309:           1.49% cpu busy by record count
2024-04-03T23:32:41.7173790Z 309:         558629 cpu microseconds
2024-04-03T23:32:41.7174171Z 309:          18290 idle microseconds
2024-04-03T23:32:41.7174602Z 309:          18290 idle microseconds at last instr
2024-04-03T23:32:41.7175176Z 309:          96.83% cpu busy by time
2024-04-03T23:32:41.7175699Z 309:          96.83% cpu busy by time, ignoring idle past last instr
2024-04-03T23:32:41.7176179Z 309: 
2024-04-03T23:32:41.7176406Z 309:   Core #1 counts:
2024-04-03T23:32:41.7176681Z 309: 
2024-04-03T23:32:41.7176906Z 309:              2 threads
2024-04-03T23:32:41.7177225Z 309:         169283 instructions
2024-04-03T23:32:41.7177597Z 309:              2 total context switches
2024-04-03T23:32:41.7178126Z 309:      0.0118145 CSPKI (context switches per 1000 instructions)
2024-04-03T23:32:41.7178687Z 309:          84642 instructions per context switch
2024-04-03T23:32:41.7179165Z 309:              2 voluntary context switches
2024-04-03T23:32:41.7179702Z 309:              0 direct context switches
2024-04-03T23:32:41.7180111Z 309:         100.00% voluntary switches
2024-04-03T23:32:41.7180497Z 309:           0.00% direct switches
2024-04-03T23:32:41.7180865Z 309:             36 system calls
2024-04-03T23:32:41.7181385Z 309:              6 maybe-blocking system calls
2024-04-03T23:32:41.7181830Z 309:              0 direct switch requests
2024-04-03T23:32:41.7182213Z 309:              0 waits
2024-04-03T23:32:41.7182515Z 309:         463025 idles
2024-04-03T23:32:41.7182860Z 309:          26.77% cpu busy by record count
2024-04-03T23:32:41.7183274Z 309:              0 cpu microseconds
2024-04-03T23:32:41.7183650Z 309:         613649 idle microseconds
2024-04-03T23:32:41.7184076Z 309:          33267 idle microseconds at last instr
2024-04-03T23:32:41.7184526Z 309:           0.00% cpu busy by time
2024-04-03T23:32:41.7185027Z 309:           0.00% cpu busy by time, ignoring idle past last instr
2024-04-03T23:32:41.7185505Z 309: 
2024-04-03T23:32:41.7185728Z 309:   Core #2 counts:
2024-04-03T23:32:41.7185997Z 309: 
2024-04-03T23:32:41.7186222Z 309:              1 threads
2024-04-03T23:32:41.7186544Z 309:          35745 instructions
2024-04-03T23:32:41.7186916Z 309:              3 total context switches
2024-04-03T23:32:41.7187445Z 309:      0.0839278 CSPKI (context switches per 1000 instructions)
2024-04-03T23:32:41.7188129Z 309:          11915 instructions per context switch
2024-04-03T23:32:41.7188615Z 309:              3 voluntary context switches
2024-04-03T23:32:41.7189058Z 309:              0 direct context switches
2024-04-03T23:32:41.7189471Z 309:         100.00% voluntary switches
2024-04-03T23:32:41.7189860Z 309:           0.00% direct switches
2024-04-03T23:32:41.7190323Z 309:             16 system calls
2024-04-03T23:32:41.7190947Z 309:              5 maybe-blocking system calls
2024-04-03T23:32:41.7191400Z 309:              0 direct switch requests
2024-04-03T23:32:41.7191785Z 309:              0 waits
2024-04-03T23:32:41.7192087Z 309:         566944 idles
2024-04-03T23:32:41.7192432Z 309:           5.93% cpu busy by record count
2024-04-03T23:32:41.7192841Z 309:              0 cpu microseconds
2024-04-03T23:32:41.7193219Z 309:         242299 idle microseconds
2024-04-03T23:32:41.7193655Z 309:          58487 idle microseconds at last instr
2024-04-03T23:32:41.7194096Z 309:           0.00% cpu busy by time
2024-04-03T23:32:41.7194598Z 309:           0.00% cpu busy by time, ignoring idle past last instr
2024-04-03T23:32:41.7195074Z 309: 
2024-04-03T23:32:41.7195313Z 309:   Core #0 schedule: S_O_O_O_S
2024-04-03T23:32:41.7195653Z 309: 
2024-04-03T23:32:41.7195894Z 309:   Core #1 schedule: T_O_O_
2024-04-03T23:32:41.7196217Z 309: 
2024-04-03T23:32:41.7196457Z 309:   Core #2 schedule: O_O_O_O_
2024-04-03T23:32:41.7196788Z 309: 
2024-04-03T23:32:41.7197187Z 309:   | failed to match expected output |Estimation of pi is 3.14.*
2024-04-03T23:32:41.7197691Z 309: 
2024-04-03T23:32:41.7197947Z 309:   Trace invariant checks passed
2024-04-03T23:32:41.7198308Z 309: 
2024-04-03T23:32:41.7198566Z 309:   Output .* entries from .* entries.
2024-04-03T23:32:41.7198938Z 309: 
2024-04-03T23:32:41.7199190Z 309:   Schedule stats tool results:
2024-04-03T23:32:41.7199536Z 309: 
2024-04-03T23:32:41.7280155Z 309:   .*
2024-04-03T23:32:41.7280612Z 309: 
2024-04-03T23:32:41.7281194Z 309:   Core #0 schedule: .*
2024-04-03T23:32:41.7281607Z 309: 
2024-04-03T23:32:41.7281861Z 309:   Core #1 schedule: .*
2024-04-03T23:32:41.7282169Z 309: 
2024-04-03T23:32:41.7282397Z 309:   Core #2 schedule: .*
2024-04-03T23:32:41.7282697Z 309: 
2024-04-03T23:32:41.7282907Z 309:   |
2024-04-03T23:32:41.7283118Z 309: 
2024-04-03T23:32:41.7283319Z 309: 
2024-04-03T23:32:41.7283894Z 310/377 Test #309: code_api|tool.record_filter_bycore_multi .........................***Failed    8.49 sec
derekbruening commented 3 months ago

This passed on a re-run. It passes every time on the non-sve machine (don't have simple direct access to the sve machine).

abhinav92003 commented 3 months ago

This issue happens only on sve it seems, but it is similar to a recent non-sve issue that was observed on signalNNN1 tests; where the result of some trig math floating-point ops changed and became non-deterministic after PR #6725.

Since #6725 adjusted some sve and non-sve logic too, maybe both of these are indeed related.

abhinav92003 commented 3 months ago

PR #6758 may help with this issue. I haven't used tmate to verify the failure count post that PR though.