alexforencich / verilog-cam

Verilog Content Addressable Memory Module
MIT License

Write Operation Cycles #2

Open Hrayo712 opened 5 years ago

Hrayo712 commented 5 years ago

Hello!

Hope you're doing well, and sorry to bother you. I am currently using your design for one of my implementations. However, I require that the write operation last fewer than 4 cycles (2 if possible). I was wondering if you could give me a pointer on how I could achieve this?

Thanks!

alexforencich commented 5 years ago

Well, you're not going to get the SRL version below 16 or 32 cycles. You might be able to make this work with the BRAM version, but I think this will require a significant amount of re-working.

The BRAM based CAM update requires two read-modify-write operations to clear the match bits for the old value and then set them for the new value. With only one port available for updates, four operations means at least four clock cycles. With one set of BRAMs, one port is used for matching and one port is used for updating. It may be possible to pipeline the current implementation and get it to a throughput of one update every four clock cycles.

If you add a second set of BRAMs to "shadow" the first set but don't use that set for matching, then you can use the second port on those instances for the reads and then tie the write ports together so both sets of BRAM would have the same contents. This should enable pipelined operations with a throughput of one operation every two cycles (two reads and two writes per update, on separate ports). The latency will probably be at least four cycles, though, as there needs to be a read against the 'previous value' RAM as well as wait states for the BRAM output registers. You'll also need to add hazard detection logic to make sure concurrent read-modify-write operations to the same address are handled correctly.
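To make the operation count concrete, here is a rough behavioral sketch in Python of the update scheme described above. It is not the repository's code and it is not cycle-accurate. The class name `BramCamModel`, the parameters `slice_width` and `addr_width`, and the `port_ops` counter are all made up for illustration; the model only assumes that each slice of the search word indexes a RAM whose rows are bit vectors over CAM addresses.

```python
class BramCamModel:
    """Behavioral (not cycle-accurate) model of a RAM-based CAM. Hypothetical names."""

    def __init__(self, data_width=16, addr_width=4, slice_width=8):
        assert data_width % slice_width == 0
        self.slice_width = slice_width
        self.slice_count = data_width // slice_width
        self.depth = 1 << addr_width
        # One RAM per slice; each row is a bitmask over CAM addresses.
        self.match_ram = [[0] * (1 << slice_width) for _ in range(self.slice_count)]
        # 'Previous value' RAM: what is currently stored at each CAM address.
        self.prev_value = [None] * self.depth
        # Accesses on the single update port (slice RAMs are accessed in parallel).
        self.port_ops = 0

    def _slices(self, data):
        mask = (1 << self.slice_width) - 1
        return [(data >> (i * self.slice_width)) & mask for i in range(self.slice_count)]

    def _read_modify_write(self, rows, addr, set_bit):
        for ram, row in zip(self.match_ram, rows):
            word = ram[row]                                   # read
            if set_bit:
                ram[row] = word | (1 << addr)                 # write: set match bit
            else:
                ram[row] = word & ~(1 << addr)                # write: clear match bit
        self.port_ops += 2                                    # one read + one write

    def write(self, addr, data):
        old = self.prev_value[addr]
        if old is not None:
            # Clear the match bits for the old value stored at this address.
            self._read_modify_write(self._slices(old), addr, set_bit=False)
        # Set the match bits for the new value.
        self._read_modify_write(self._slices(data), addr, set_bit=True)
        self.prev_value[addr] = data

    def match(self, data):
        # AND the selected row of every slice RAM; result is a bitmask of addresses.
        result = (1 << self.depth) - 1
        for ram, row in zip(self.match_ram, self._slices(data)):
            result &= ram[row]
        return result


cam = BramCamModel()
cam.write(3, 0x1234)          # empty location: only the 'set' pass is needed here
before = cam.port_ops
cam.write(3, 0xABCD)          # overwrite: clear old bits, then set new bits
ops_for_overwrite = cam.port_ops - before
```

In this model an overwrite costs two read-modify-write passes, i.e. four accesses on the update port, which is where the four-cycle floor with a single update port comes from. The shadow-BRAM idea would move the two reads to a separate port, leaving only the two writes on the shared one.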

Now, if you're going to be doing a lot of updates at the same location in the CAM, then you can design the pipeline logic to 'merge' operations against the same address and get either one operation per two cycles with one set of BRAMs or one operation per cycle with two sets of BRAMs. This is probably not the case, though.

If you really need the fastest possible updates, then just implement your CAM trivially on normal logic, and you can get one update per clock cycle, no problem.
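As a rough illustration of what "trivially on normal logic" could look like, here is a small Python behavioral sketch: one register and one comparator per entry, all compared in parallel. The name `RegisterCamModel` and its methods are invented for this example, and the loop in `match` stands in for comparators that would all evaluate in the same cycle in hardware.

```python
class RegisterCamModel:
    """Behavioral model of a CAM built from registers and comparators. Hypothetical names."""

    def __init__(self, depth=16):
        self.depth = depth
        self.data = [0] * depth          # one data register per entry
        self.valid = [False] * depth     # one valid flag per entry

    def write(self, addr, data):
        # A single register write: in hardware this completes on one clock edge,
        # with no read-modify-write sequence.
        self.data[addr] = data
        self.valid[addr] = True

    def delete(self, addr):
        self.valid[addr] = False

    def match(self, key):
        # In hardware every entry has its own comparator; the loop models them.
        mask = 0
        for addr in range(self.depth):
            if self.valid[addr] and self.data[addr] == key:
                mask |= 1 << addr
        return mask


reg_cam = RegisterCamModel(depth=8)
reg_cam.write(2, 0xBEEF)
reg_cam.write(6, 0xBEEF)
reg_cam.write(2, 0xCAFE)      # overwrite takes effect immediately
```

The trade-off is area: this costs a full data register and comparator per entry, so it only makes sense for fairly small CAMs.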