Support `sync` without `std_sync_reg`

rachitnigam commented 1 year ago

I realized that the FSMs generated by TDCC are probably sufficient to implement the synchronization that we implement using std_sync_reg. The high-level idea is that each thread in a par block gets its own FSM. A @sync in multiple different threads means that the FSMs of the corresponding threads must synchronize before continuing execution. This can be done by generating a transition condition that waits for all FSMs to reach the corresponding state before allowing any one of them to move forward.

@paili0628 I think this would be a good thing to implement before we write up a paper on this stuff

The PDF document summarizes this idea visually: Sync using FSMs.pdf

sampsyo commented 1 year ago

This is a pretty interesting idea. To state a couple of probably-obvious observations:

"FSM-based synchronization" would be yet a third mode to implement the semantics of @sync. We of course already have @paili0628's implementation based on std_sync_reg, which is in some ways the most "elemental" implementation. We have also discussed a static-timing-based implementation strategy, which actually requires no hardware at all, in the special circumstance where the compiler can prove that enough time has elapsed that the desired ordering is always already enforced. Then this third one works by hooking into the existing sequencing hardware rather than bringing its own. I like that this spectrum is available under the same high-level meaning!
Like the static-timing-based implementation, the proposed implementation also has its limitations (for which it offers its own benefits). I think the limitation here is that the @sync points need to be compiled "together" so they can both talk to the same FSM. Maybe this means they have to be in the same component, or maybe it means they just have to be in the same "translation unit." I realize that we were probably not supporting cross-component or cross-file synchronization anyway, so this could be seen as subsuming the std_sync_reg implementation strategy. But I kinda like the idea of hypothetically supporting separately-compiled synchronizing code using the heavyweight fallback of a synchronizing register.
I think that implementing this would require hacking the TDCC pass. I have no evidence that the following would even be possible, but it would be rad in a fantasy world if there were some way where the TDCC pass could expose a generic interface for stuff like this to plug into. Some kind of in-IR representation for the FSM "hooks" required to do this would be cool. But clearly beyond an MVP.

rachitnigam commented 1 year ago

But I kinda like the idea of hypothetically supporting separately-compiled synchronizing code using the heavyweight fallback of a synchronizing register.

Hm, I think that could be interesting but I'm not sure if it needs to be language-level , that is, if you want to do this kind of synchronization, you probably want to put a FIFO between the two components and perform fine-grained synchronization

rachitnigam commented 1 year ago

it would be rad in a fantasy world if there were some way where the TDCC pass could expose a generic interface

Yeah, I had the exact same thought–is there some way to implement this separately from the TDCC code itself? In this case, it'd require exposing the FSM transitions in some structured manner.

sampsyo commented 1 year ago

Hm, I think that could be interesting but I'm not sure if it needs to be language-level , that is, if you want to do this kind of synchronization, you probably want to put a FIFO between the two components and perform fine-grained synchronization

Fair enough! This kind of facility is purely hypothetical and doesn't really have a use case. :smiley: Just food for thought.

paili0628 commented 1 year ago

Could there be a possible case where we could have problem resetting the barrier for the mechanism proposed?

paili0628 commented 1 year ago

I think there are a couple of things we should pay attention to before doing this:

for the following program:

par {
  // thread A 
  while lt.out {
    @Node_id(1) one; 
    @sync(1); 
    @Node_id(2) two;
   }  -> F1

  // thread B
  while lt.out {
     @Node_id(1) three; 
     @sync(1);
   }  -> F2
}

Assuming that the FSM we generate looks like below (just using the notation in the PDF): s1 <- sync_1.done; s1 = F1 = 1 & F2 = 1

In this case , what would happen could be: At time slot 1: one and three run together 2: thread A and B sync 3: two runs first time, thread B starts second iteration 4: two runs first time, F1.out is still 1, thread B finishes running three, sees s1 is up and starts third iteration

The problem is, if my understanding is somewhat accurate, as we design the FSM, F1.out is always set to 1 while two[done] is not activated. Hence if our sync signal solely relies on the fsm register (in this case F1 and F2), and if two runs for a really long time, and we start another iteration of thread B before two even finishes, we could end up running multiple iterations of thread B while running two for the first time.

The current @sync attribute can only be marked before empty control with last semester's syntax change. So can tdcc handle that if the empty control is not eliminated beforehand?

rachitnigam commented 1 year ago

Hey @paili0628, you can edit the comments by clicking three dots on the top right. The code is really hard to read so can you try reformatting it a bit and explaining what you're expecting will happen event-by-event?

paili0628 commented 1 year ago

@rachitnigam I updated my comment, plz take a look.

rachitnigam commented 1 year ago

I cleaned up the comment a bit more. Usually, you can use the code block syntax (three backticks) and click on the preview link to see what the final comment is going to look like. Take a look at the raw markdown by clicking on the edit button and try to use code blocks with indenting in the future

Couple of things:

The @sync(1) in both the threads need to have their own NODE_ID and therefore their own states in the FSM.
The register that stores the state for checking synchronization need to be reset to 0 in the next cycle after it is set to 1.
TDCC never scheduled the same group in consecutive cycles. Maybe this can be used to guarantee that we never deadlock?

One way to proceed would be to sketch out exactly what a compiled group and corresponding hardware should look like. Once we have some concrete code to stare at, we can decide if it will work or not

calyxir / calyx

Support `sync` without `std_sync_reg` #1333