codes-org / codes

The Co-Design of Exascale Storage Architectures (CODES) simulation framework builds upon the ROSS parallel discrete event simulation engine to provide high-performance simulation utilities and models for building scalable distributed systems simulations

Unsafe absolute RC state change: in_send_loop[ ] - all models #216

Open nmcglo opened 3 years ago

nmcglo commented 3 years ago

Basically every CODES network model does this, so this issue is a reminder for me or someone else to go through and fix it in all of them.

I've found nondeterminism in Dragonfly Dally when I use small VCs and cause them to overflow. If the VCs are big enough not to overflow, the run is deterministic. This happens even with uniform random synthetic traffic.

I have found an unsafe practice in basically every CODES model. Specifically, there are times when the forward event handler does something like:

s->in_send_loop[output_port] = 0;
bf->c4 = 1;

Then, in the reverse handler, we check whether bf->c4 == 1 and, if so, set s->in_send_loop[output_port] = 1.

The problem: we can't guarantee that s->in_send_loop[output_port] == 1 when the forward event handler runs, yet the reverse handler forces it to that value. If the flag was already 0, the rollback leaves the state different from what it was before the event.
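To make the failure concrete, here is a minimal standalone sketch of the pattern. It does not use ROSS or any CODES model; `router_state`, `bitfield`, `NUM_PORTS`, and the function names are hypothetical stand-ins that only mimic the shape of the handlers described above.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for a router LP state and an event bitfield. */
#define NUM_PORTS 4

typedef struct {
    int in_send_loop[NUM_PORTS];
} router_state;

typedef struct {
    unsigned int c4 : 1;
} bitfield;

/* Forward handler: clears the flag unconditionally and records only
 * that it did so, not what the previous value was. */
void forward_unsafe(router_state *s, bitfield *bf, int output_port)
{
    s->in_send_loop[output_port] = 0;
    bf->c4 = 1;
}

/* Reverse handler: assumes the flag was 1 before the forward handler ran. */
void reverse_unsafe(router_state *s, const bitfield *bf, int output_port)
{
    if (bf->c4 == 1)
        s->in_send_loop[output_port] = 1;
}

/* Runs one forward/reverse pair on port 2, starting from `initial`,
 * and returns the flag value left behind after the rollback. */
int roundtrip_unsafe(int initial)
{
    router_state s = {{0}};
    bitfield bf = {0};
    s.in_send_loop[2] = initial;
    forward_unsafe(&s, &bf, 2);
    reverse_unsafe(&s, &bf, 2);
    return s.in_send_loop[2];
}
```

Starting from 1, the round trip restores 1 as expected. Starting from 0, the rollback also leaves 1, so the state after the reverse handler no longer matches the state before the event.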

The solution is to add a new state-saving field to the message, or to create an RC stack item for it.

This is almost certainly a source of nondeterminism, and it is an unsafe practice regardless. We must always be wary of making absolute state changes in RC without some sort of state-swapping mechanism, such as storing the old value in msg-> or on an RC stack.
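A minimal sketch of the message-field variant of this fix, again standalone rather than taken from any CODES model. The `message` struct and its `saved_in_send_loop` field are assumptions made for illustration; a real model would add an equivalent field to its own message type.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for a router LP state and its message type. */
#define NUM_PORTS 4

typedef struct {
    int in_send_loop[NUM_PORTS];
} router_state;

typedef struct {
    int saved_in_send_loop; /* hypothetical state-saving field */
} message;

/* Forward handler: saves the previous value in the message before overwriting it. */
void forward_safe(router_state *s, message *msg, int output_port)
{
    msg->saved_in_send_loop = s->in_send_loop[output_port];
    s->in_send_loop[output_port] = 0;
}

/* Reverse handler: restores whatever value the forward handler actually saw,
 * instead of assuming it was 1. */
void reverse_safe(router_state *s, const message *msg, int output_port)
{
    s->in_send_loop[output_port] = msg->saved_in_send_loop;
}

/* Runs one forward/reverse pair on port 2, starting from `initial`,
 * and returns the flag value left behind after the rollback. */
int roundtrip_safe(int initial)
{
    router_state s = {{0}};
    message msg = {0};
    s.in_send_loop[2] = initial;
    forward_safe(&s, &msg, 2);
    reverse_safe(&s, &msg, 2);
    return s.in_send_loop[2];
}
```

With the old value carried in the message, the rollback returns the flag to its starting value whether that was 0 or 1. The cost is one extra field per message; an RC stack entry would avoid growing the message but is not shown here.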

nmcglo commented 3 years ago

This is not the sole source of nondeterminism in Dragonfly Dally with constrained VC sizes.

nmcglo commented 3 years ago

Turns out it wasn't the source of what I was observing at all. It's still a potential problem, however. Renaming to lower the sense of urgency.