Closed rachitnigam closed 1 year ago
This is a pretty interesting idea. To state a couple of probably-obvious observations:
@sync
. We of course already have @paili0628's implementation based on std_sync_reg
, which is in some ways the most "elemental" implementation. We have also discussed a static-timing-based implementation strategy, which actually requires no hardware at all, in the special circumstance where the compiler can prove that enough time has elapsed that the desired ordering is always already enforced. Then this third one works by hooking into the existing sequencing hardware rather than bringing its own. I like that this spectrum is available under the same high-level meaning!@sync
points need to be compiled "together" so they can both talk to the same FSM. Maybe this means they have to be in the same component, or maybe it means they just have to be in the same "translation unit." I realize that we were probably not supporting cross-component or cross-file synchronization anyway, so this could be seen as subsuming the std_sync_reg
implementation strategy. But I kinda like the idea of hypothetically supporting separately-compiled synchronizing code using the heavyweight fallback of a synchronizing register.But I kinda like the idea of hypothetically supporting separately-compiled synchronizing code using the heavyweight fallback of a synchronizing register.
Hm, I think that could be interesting but I'm not sure if it needs to be language-level , that is, if you want to do this kind of synchronization, you probably want to put a FIFO between the two components and perform fine-grained synchronization
it would be rad in a fantasy world if there were some way where the TDCC pass could expose a generic interface
Yeah, I had the exact same thought–is there some way to implement this separately from the TDCC code itself? In this case, it'd require exposing the FSM transitions in some structured manner.
Hm, I think that could be interesting but I'm not sure if it needs to be language-level , that is, if you want to do this kind of synchronization, you probably want to put a FIFO between the two components and perform fine-grained synchronization
Fair enough! This kind of facility is purely hypothetical and doesn't really have a use case. :smiley: Just food for thought.
Could there be a possible case where we could have problem resetting the barrier for the mechanism proposed?
I think there are a couple of things we should pay attention to before doing this:
par {
// thread A
while lt.out {
@Node_id(1) one;
@sync(1);
@Node_id(2) two;
} -> F1
// thread B
while lt.out {
@Node_id(1) three;
@sync(1);
} -> F2
}
Assuming that the FSM we generate looks like below (just using the notation in the PDF): s1 <- sync_1.done; s1 = F1 = 1 & F2 = 1
In this case , what would happen could be:
At time slot 1:
one
and three
run together
2:
thread A and B sync
3:
two
runs first time, thread B starts second iteration
4:
two
runs first time, F1.out is still 1, thread B finishes running three
, sees s1 is up and starts third iteration
The problem is, if my understanding is somewhat accurate, as we design the FSM, F1.out is always set to 1 while two[done]
is not activated. Hence if our sync signal solely relies on the fsm register (in this case F1 and F2), and if two
runs for a really long time, and we start another iteration of thread B before two
even finishes, we could end up running multiple iterations of thread B while running two
for the first time.
Hey @paili0628, you can edit the comments by clicking three dots on the top right. The code is really hard to read so can you try reformatting it a bit and explaining what you're expecting will happen event-by-event?
@rachitnigam I updated my comment, plz take a look.
I cleaned up the comment a bit more. Usually, you can use the code block syntax (three backticks) and click on the preview link to see what the final comment is going to look like. Take a look at the raw markdown by clicking on the edit button and try to use code blocks with indenting in the future
Couple of things:
@sync(1)
in both the threads need to have their own NODE_ID
and therefore their own states in the FSM.One way to proceed would be to sketch out exactly what a compiled group and corresponding hardware should look like. Once we have some concrete code to stare at, we can decide if it will work or not
I realized that the FSMs generated by TDCC are probably sufficient to implement the synchronization that we implement using
std_sync_reg
. The high-level idea is that each thread in apar
block gets its own FSM. A@sync
in multiple different threads means that the FSMs of the corresponding threads must synchronize before continuing execution. This can be done by generating a transition condition that waits for all FSMs to reach the corresponding state before allowing any one of them to move forward.@paili0628 I think this would be a good thing to implement before we write up a paper on this stuff
The PDF document summarizes this idea visually: Sync using FSMs.pdf