UCSBarchlab / PyRTL

A collection of classes providing simple hardware specification, simulation, tracing, and testing suitable for teaching and research. Simplicity, usability, clarity, and extensibility are the overarching goals, rather than performance or optimization.
http://ucsbarchlab.github.io/PyRTL
BSD 3-Clause "New" or "Revised" License
257 stars 78 forks source link

Synchronous/Async memories #93

Closed timsherwood closed 9 years ago

timsherwood commented 9 years ago

Need to modify the memories to support both sync and async memories. Sync memories will only be addressable directly by a register.

timsherwood commented 9 years ago

I was originally thinking that we could implement a "syncmem" as only being addressed by a register. Thus something like this would be legal:

mem = SyncMemBlock(...)
r = Register(...)
x = mem[r]

That should be a sufficient requirement to be synthesized as a synchronous memory. However it might be overly restrictive. If you do this:

mem = SyncMemBlock(...)
r = Register(...)
w = WireVector()
w <<= r
x = mem[w]

It would say that it is not a legal SyncMemBlock (since it is addressed with a wirevector). However, no logic stands between the register and the memory and thus (I believe) synthesis from verilog would be find turning this into a SyncMem.

Options: 1) Be conservative and require the address be a Register. Note that the register value can be passed around through functions just fine, it is only when you do assignments through WireVectors that it is in issue. 2) Add a deep new assertion (a dataflow analysis) into core to check that it flow directly between a register a memory. This would be more intellectually interesting (good for papers), but we would need to verify that the generated verilog actually would produce syncmems (which takes people-time) 3) Add dependent types to python? :)

I am leaning towards (1) above to start since it is easy and will let us get stuff done ASAP, with the idea that if it proves to be to onerous we could revisit in the future.

Gamrix commented 9 years ago

I agree with option 1. I think the above example (and variations of it) are the only place where we would have this issue. Going with option 1 will not result in meaningful loss of functionality. Also, with a good error message, I think it should be easy for the user to resolve the issue. Also, as you said, we could revisit it in the future if this becomes a problem. (it is easy to loosen requirements, not easy to tighten)

jemcmahan13 commented 9 years ago

Is there a reason you're not considering making the register implicit? If SyncMemBlock exclusively appears in the pattern you indicated (register a value, use that as address), then you could have it addressed in the same way as an async MemBlock, but registers are automatically generated to hold the address and data. There isn't much of a sacrifice in clarity, because the user explicitly chooses a SyncMemBlock instead of a MemBlock. It actually makes the separation cleaner: there's a semantic difference in the two block types, not just an added restriction on how you use it.

timsherwood commented 9 years ago

That is a good question. One reason I moved away from it is because I really think of memories as async. When you write mem[foo] it naturally means to me, take the value of foo and use it to index mem immediately (not after one cycle). Often I want to go back and look at the design to figure out if I can move logic around to make it a syncmem instead. In this way I can design something that works, and then change the memories to "syncmems" to help me check that I did it right. The smaller semantic difference both helps and hurts.

The other reason I did it this way is simply because I have not seen anyone else try to do it this way -- so I wan't to see what would come from it... answer thus far is I don't know. :)

Gamrix commented 9 years ago

Sherwood posted (in the wrong thread):

One option I am now considering (after reading this again) is "why not both":

m = MemBlock(addrwidth=2, bitwidth=3, checksync=True) -- like a normal memblock, but checks that addresses are registers

m = SyncMemBlock(addrwidth=2, bitwidth=3) -- adds "register" functionality in automatically?

Gamrix commented 9 years ago

Eh, I'm not so sure about that:

There should be one-- and preferably only one --obvious way to do it. https://www.python.org/dev/peps/pep-0020/

I don't see the big benefit from having two different (but almost the same) methods of doing the same thing. It'll mean that there will be more code to maintain, and it doesn't confer any particular benefit.

I am slightly for SyncMemBlock class structure (seems less arbitrary), though I would still like the register item being in the logic block underneath. I really don't like checksync, that is much more messy than a new class (even though I would rather have fewer classes)

jemcmahan13 commented 9 years ago

MemBlock always being asynchronous and SyncMem having an implicit register seems like the best set up. There is exactly one, clear way to do each thing, and it does not introduce new checks or errors for misuse of SyncMems. I empathize with the desire to have mem[foo] always be asynchronous, but that desire might be coming from the software world, where one sees it as an array access. On your point about changing a design to be synchronous, I'm not sure why adding a register in front of a MemBlock is preferable to changing MemBlock to SynMem.

jemcmahan13 commented 9 years ago

Now that I've said that, I think the "automated register detection and optional synchronous memory synthesis" is a cool HDL design exploration. Whether or not "interesting" is enough motivation to be folded into main pyrtl is a decision above my pay grade.

timsherwood commented 9 years ago

Hmmm... If we want to be scientific I think we need to design some things both ways and see what happens. I could make the default behavior for MemBlock to generate a memory that checks that it is indexed by a register (and throws PyrtlError if it is not). A flag would be available to turn off the check (and thus indicate that it will likely synthesize as a async memory). While there are some very good arguments agains laid out above, here are my refined arguments for the "unified" approach:

That said, a big point against is:

I we include the "checker", and people still really want SyncMem I could be convinced to have a (MemBlock and SyncMemBlock).

timsherwood commented 9 years ago

If we do abandon the "checker" approach then I think we need to move to MemBlock which is then sync, and AsyncMemBlock which makes it clear that sync is the preferred embodiment. Varun (now a professional hardware slinger) says this: "it is best practice to always make sure your block memory/fifos read/write operations start on a clock edge."

jemcmahan13 commented 9 years ago

I think that would be a good change regardless of which approach we go with. Even if we do use the checker, the only way to disable the check should be to use an AsyncMemBlock.

timsherwood commented 9 years ago

Here is an interesting case from our unit tests. Addr is always ready on the clock edge, but there is not a clean way to refactor the register into the memory. Likewise the simple "check if input is register" also fails because the input is a "selection" of the register.

    def test_basic_true_condition_memread(self):
        m = pyrtl.MemBlock(addrwidth=2, bitwidth=3, name='m')
        i = pyrtl.Register(bitwidth=3, name='i')
        o = pyrtl.WireVector(bitwidth=2, name='o')
        i.next <<= i + 1
        addr = i[0:2]
        with pyrtl.ConditionalUpdate() as condition:
            with condition(i < 4):
                m[addr] |= i
            with condition.fallthrough:
                with condition(addr[0]):
                    # this should happen every time because no
                    # state is being updated!
                    o <<= m[addr]
        self.check_trace('i 01234567\no 00000123\n')
Gamrix commented 9 years ago

Oh damm. I didn't think of slices. Looks like we need to make a function that can see through those ops ( and the concat one as well, which will be critical to making post synth syncmem working)

timsherwood commented 9 years ago

Okay, the above problem is fixed -- I added a true dataflow analysis to sanity check that will allow things like concat, split, and wire but not any actual logic operations in the memory index. It's actually pretty neat how easy it was (although now that I think about it this could be done even faster and easier with toposort!). Oh well, it only has to happen once. The code is as below:

        # propagate "async" behavior through the network
        while async_set_is_growing:
            for net in self.logic:
                if net.op in async_prop:
                    if any([w in async_set for w in net.args]):
                        async_set.update(net.dests)
            async_set_is_growing = True if len(async_set) > last_async_set_len else False
            last_async_set_len = len(async_set)

        # now do the actual check on each memory operation
        for net in self.logic:
            if net.op == 'm':
                is_async = net.args[0] in async_set
                if is_async and not net.op_param[1].asynchronous:
                    raise PyrtlError(
                        'memory "%s" is not specified as asynchronous but has and index '
                        '"%s" that is not ready at the start of the cycle'
                        % (net.op_param[1].name, net.args[0].name))

So how do you turn such a check off? Well you declare your memory as: MemBlock(bitwidth, addrwidth, asychronous=True) The asynchronous flag is set to False by default. Interestingly pretty much 99% of the unit tests were synchronous already and required no refactoring! I tried to make the flag very clear and long enough to be obnoxious to encourage sync mems. The only one that did not pass had an "and" operation in the index, and I went ahead and marked it as async. This approach is not set in stone, but I want to try it out, and since I did all the coding I don't feel too bad :)

Gamrix commented 9 years ago

Have you checked that this code creates is interpreted correctly by Verilog?