Split from #6635 where we'll be producing core-sharded trace files from thread-sharded originals and storing them on disk (as opposed to just analyzing them in a core-sharded schedule #5694).
Adding a new type of trace file adds complexity, and we had originally hoped to always re-schedule the original traces, but there are multiple use cases where having the core schedule permanently on disk is useful. This includes feeding it to simulators that are not using our dynamic scheduler.
The plan is to tap into the existing dynamic core-sharded analysis from #5694. We would add a new filetype marking the trace files as core-sharded. The scheduler will have to read ahead to get the filetype, which may cause headaches with input ordinals.
We'll want to update the invariant checker so we can use it on such traces: #6684.
Pasting from https://github.com/DynamoRIO/dynamorio/issues/6635#issuecomment-1967420976:

For analyzing core-sharded-on-disk files, my approach is to add a new filetype indicating core-sharded, and have the scheduler read every input to find its filetype record so the analyzer can check the filetype at init time.
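As a rough sketch of that init-time readahead (the record types and the core-sharded flag below are hypothetical stand-ins for the real drmemtrace trace entries, not the actual implementation):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for drmemtrace header records; the real trace format
// uses its own entry types and filetype marker.
enum record_kind_t { RECORD_FILETYPE, RECORD_TIMESTAMP, RECORD_INSTR, RECORD_OTHER };

struct record_t {
    record_kind_t kind;
    uint64_t value; // For RECORD_FILETYPE, a bitmask of file-type flags.
};

// Assumed value for the proposed new filetype bit marking core-sharded traces.
constexpr uint64_t FILETYPE_CORE_SHARDED = 1ull << 0;

// Scan an input's leading records for the filetype marker so the analyzer can
// check it at init time, before any records are handed out.  Stops at the
// first instruction so the trace body is never consumed by the readahead.
bool
read_ahead_for_filetype(const std::vector<record_t> &input, uint64_t *filetype_out)
{
    for (const record_t &rec : input) {
        if (rec.kind == RECORD_FILETYPE) {
            *filetype_out = rec.value;
            return true;
        }
        if (rec.kind == RECORD_INSTR)
            break;
    }
    return false;
}
```

A legacy input with no filetype record simply returns false here, which is where the failure modes listed below come from.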
That all works fine for typical runs, but the readahead breaks many specific modes:
- Breaks online: scheduler init just blocks! => Added a new option read_inputs_in_init, turned off for IPC readers.
- Breaks invariant checks because the input record ordinal is too far ahead => use the output ordinal until the first instruction.
- Breaks unit tests, which have no filetype records => stop the readahead at the page-size marker.
- Breaks replay-as-traced because the input record ordinal is too far ahead => disable the read-ahead for as-traced.
- Breaks the record filter's legacy null check: a legacy trace has no filetype record, so the readahead consumes the timestamp; the stream's last-timestamp is then greater than the stop-timestamp, and the resulting output trace starts with the filter-end marker!
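Most of these fixes amount to gating and bounding the readahead. A condensed sketch of that gating logic, assuming a read_inputs_in_init-style option as described above (the struct and function names are hypothetical, not the actual scheduler API):

```cpp
#include <cassert>

// Hypothetical condensed view of the fixes: the readahead is skipped entirely
// for IPC readers (so online scheduler init cannot block on an empty pipe),
// and otherwise stops at the page-size marker for legacy or unit-test inputs
// that carry no filetype record.
struct scheduler_options_t {
    // Mirrors the read_inputs_in_init option described above; code paths that
    // create IPC readers turn it off.
    bool read_inputs_in_init = true;
};

enum header_marker_t { MARKER_FILETYPE, MARKER_PAGE_SIZE, MARKER_TIMESTAMP };

// Decides whether the init-time readahead may consume the next header marker,
// or must stop before touching it.
bool
readahead_should_continue(const scheduler_options_t &ops, bool is_ipc_reader,
                          header_marker_t next_marker)
{
    if (!ops.read_inputs_in_init || is_ipc_reader)
        return false; // Never read ahead when init must not block.
    if (next_marker == MARKER_PAGE_SIZE)
        return false; // Past the point where a filetype record would appear.
    return true;
}
```

The record-filter breakage above is exactly what this bound prevents: stopping before markers like the timestamp keeps the stream's last-timestamp from advancing past the stop-timestamp during init.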