model-net seemingly causing non-trivial performance overheads (Imported #50)

nmcglo commented 8 years ago

Original Issue Author: Jonathan Jenkins
Original Issue ID: 50
Original Issue URL: https://xgitlab.cels.anl.gov/codes/codes/issues/50

We've heard about this a few times now (Ning with the fattree, Misbah with the dragonfly model / MPI replay program, and potentially Misbah/Caitlin with "awesim" runs): something is causing performance regressions (lower ROSS efficiency) when using model-net vs. direct ROSS in optimistic mode.

It's unclear for the time being why this is happening. model-net imposes two extra events (sched-new, sched-next) per model_net_event call, the first of which is a remote event from the client (the original "packet event" that the client used to send directly is now a self-event). I would expect some degree of overhead from this, but nothing that would significantly affect the rollback rate / ROSS efficiency...
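As a rough illustration of the event counts described above, here is a minimal sketch. It assumes that each model_net_event call still produces the original packet event plus the two scheduling events, and that the direct-ROSS version issues exactly one event per send. The function names are hypothetical and are not part of the CODES or ROSS APIs.

```python
# Hypothetical event-count model; these names are illustrative only
# and do not correspond to CODES or ROSS functions.

# Extra events per model_net_event call, per the description above:
# sched-new and sched-next.
EXTRA_EVENTS_PER_CALL = 2


def events_direct(num_sends: int) -> int:
    """Direct ROSS (assumed): one packet event per send."""
    return num_sends


def events_model_net(num_sends: int) -> int:
    """model-net (assumed): the packet event plus two scheduling events per call."""
    return num_sends * (1 + EXTRA_EVENTS_PER_CALL)


def event_ratio(num_sends: int) -> float:
    """Ratio of model-net events to direct-ROSS events for the same workload."""
    return events_model_net(num_sends) / events_direct(num_sends)
```

Under these assumptions the event count triples for the same number of sends. That accounts for raw event-processing overhead, but it does not by itself explain a higher rollback rate.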

We should keep this in the back of our minds while we are working on other things.

nmcglo commented 8 years ago

Jonathan Jenkins:

I reduced the event overhead by one event in the case of empty queues (which happens more often than one would think...). See #81. Not sure if that's enough to solve the underlying problem though...

nmcglo commented 8 years ago

Jonathan Jenkins:

The dragonfly has been optimized a good amount: fewer events are issued internally, and the event processing logic no longer builds up the event queue arbitrarily. Once the optimizations are also applied to the torus, we'll close this ticket.

nmcglo commented 4 years ago

Ticket wasn't actually closed on Jan 14, 2016. Closing now.
