Simulation only instructions?

fabianschuiki / llhd

Low Level Hardware Description — A foundation for building hardware design tools.

http://www.llhd.io

Apache License 2.0

390 stars 30 forks source link

Simulation only instructions? #121

Open gameboo opened 4 years ago

gameboo commented 4 years ago

Hello,

I am enjoying gently diving into llhd as hobby during lockdown, and in that sense, my reading of the paper and the information out there might not be as thorough as it could be, so please ignore/close this if I missed something obvious :).

I was wondering to what extend it would be desirable / a big no no to have "simulation only" instructions in llhd. Basically, I am after mapping terminal input / output in a simulation. I tried llhd-sim and realized it produced a vcd file which is nice, but rapidly hard to use as a debugging method for more complex designs. "printf" style debugging could use a simOut instruction to output strings mingled with values from the simulation (backing verilog's $display or $write). A simIn could also map values from outside a simulation to signals. These could help in simulating a UART for example. And they get ignored in all cases that are not simulation.

The paper (section 2.5.8 about memory instructions) mentions "a mechanism to call into native code". Is what I suggest already covered by such things in some way?

Thanks!

fabianschuiki commented 4 years ago

You raise an excellent point. At some point we want input/output from the simulation, as you say, to back Verilog's $display/$write, or VHDL's report. SystemVerilog also has a whole host of other functions for file I/O and the like (e.g. $fscanf, etc.).

Regarding possible ways to implement this, I'm inclined to keep things as minimal as possible. For example:

Rather than providing string interpolation for mingled values as part of the instruction, the interpolation should be implemented by the Verilog compiler directly in LLHD assembly.
Rather than adding a new instruction for this, which adds significantly to the surface area of the language, I would propose declaring an llhd.sim_out and llhd.sim_in intrinsic, which would just be a regular function call.

The mechanism to call into native code is not implemented at the moment, but would work in roughly the same fashion: you declare an "external" function in LLHD, that you can then call. The simulator would try to dynamically link that symbol on startup, e.g. to a sys/libc call. Then you could gain access to native code via DPI, or call into OS functions:

extern func @fprintf(i32, i8*)

func @magic () void {
entry:
    %stdout = const i32 1
    %msg = call i8* %format_a_string ()
    call void @fprintf(i32 %stdout, i8* %msg)
}

Any thoughts on this?

gameboo commented 4 years ago

I think this approach makes a lot of sense.

The paper (section 2.5.8 about memory instructions) mentions "a mechanism to call into native code". Is what I suggest already covered by such things in some way?

Was extern the intended way to achieve this?

Rambling a bit now:

Just to help me settle things in my head: declare expects to tap into other llhd code and extern would tap into native code? Would this be the distinction that would let a tool which is only concerned by hardware and not by simulation ignore the fprintf call? I suppose this call can already be ignored by virtue of returning void...
For inputs, one would probably want to be able link llhd modules somewhat transparently with either some native call for simulation, or some other llhd entity for actual hardware, if that makes sense? What I mean is, using the example from llhd-sim in tests/instances.llhd:
```
; @get_my_inpt is what we want to sometimes be native code, sometimes be just another llhd entity
declare @get_my_inpt () -> (<i4>)
```

entity %combinator (i4$ %A, i4$ %B) (i4$ %C) { %0 = prb %A %1 = prb %B %2 = add i4 %0 %1 drv %C %2 }

entity @top () (i4$ %out) { %count = sig i4 inst @get_my_inpt () (%count) %0 = prb %count %x2 = mul i4 %0 2 %count2 = sig i4 drv %count2 %x2 inst %combinator (%count, %count2) (%out) }


I would want to back `@get_my_input` by either a `scanf` or the `counter` entity that was in the example (that would be defined in its own llhd module), but without the calling context having to particularly be aware of this, if that makes sense... Would this be done by first wrapping a call to an `extern func @scanf() <i4>` in a "simulation only" llhd entity in its own module with the same interface as the `counter` entity? Would tools infer that `counter` is legit to go to a synthesizable netlist and that `scanf_wrapper` is not? Would there be a way to map one of the two behind the `@get_my_inpt` name based on the desired output?

fabianschuiki commented 4 years ago

Was extern the intended way to achieve this?

Yes exactly. There must be a clear separation about what is described inside an LLHD model, and what is an opaque/black-box call into some piece of code that the linker of the operating system gives you access to.

Just to help me settle things in my head: declare expects to tap into other llhd code and extern would tap into native code? Would this be the distinction that would let a tool which is only concerned by hardware and not by simulation ignore the fprintf call?

Exactly. declare would work very much like the LLVM counterpart, and act as a placeholder value for some other LLHD code. All declares are solved when you link two or more LLHD modules together. This allows you to e.g. compile an LLHD netlist, and then link it with e.g. a stdcell simulation library (also in LLHD).

I suppose this call can already be ignored by virtue of returning void...

Indeed, a first step to prepare synthesis would be to remove everything from the LLHD description which does not strictly contribute to an input-to-output relationship.

I would want to back @get_my_input by either a scanf or the counter entity that was in the example (that would be defined in its own llhd module), but without the calling context having to particularly be aware of this, if that makes sense... Would this be done by first wrapping a call to an extern func @scanf() <i4> in a "simulation only" llhd entity in its own module with the same interface as the counter entity? Would tools infer that counter is legit to go to a synthesizable netlist and that scanf_wrapper is not? Would there be a way to map one of the two behind the @get_my_inpt name based on the desired output?

External code can only be functions, by the nature of how regular code executing on the processor works. So if you want to have a module that is either implemented in LLHD, or derives its behaviour from some external software model, you would have to create a wrapper module for the latter, which calls the software model as a function.

A features where certain parts of an LLHD description are only enabled for simulation, and some only for synthesis, is pretty neat. However I'm not yet sure how useful this is going to be in practice, because realistically your testing code is vastly different from your synthesis code, and swapping out a synthesizable module for a simulation model is something you would probably do in a language frontend such as Moore, rather than very far down the IR pipeline.

gameboo commented 4 years ago

Thanks for the clarifications.

You mentioned that

The mechanism to call into native code is not implemented at the moment

Does extern actually currently exist in any, even highly experimental, form?

So if you want to have a module that is either implemented in LLHD, or derives its behaviour from some external software model, you would have to create a wrapper module for the latter, which calls the software model as a function.

I think an explicit wrapper around native calls providing an entity interface that matches that of the synthesizable one would work fine. I am not sure how obvious of a feature that would be, but I suppose what I am after here is for the option to trivially do something similar to passing a different .so or .a when linking things into a simulator or a netlist.

However I'm not yet sure how useful this is going to be in practice, because realistically your testing code is vastly different from your synthesis code

I suppose one flow I like to follow to quickly push ahead on the dev of my RTL looks something like this: Say I want to work on a new feature for my caches. I want to run a workload which needs a framebuffer device to test that feature, but I don't want to waste time on the framebuffer's RTL just yet. I want to run the same RTL I have used for synthesis so far (same CPUs, same (well improved really :)) caches, same interconnects, DMA engines, accelerators, etc...). I simply change maybe the boot memory to instead be a simulated model that can perform IO to a file to read my workload, and a new framebuffer simulated model with calls out to, say, SDL or some native graphics library. I limit the differences in the RTL to leafs of the module hierarchy as much as possible, and focus on the RTL for my new cache feature. I can simulate it and know that it behaves correctly (because surely that's the only way it could behave :) ) in something that is as close to the synthesized RTL as I can. This together with the ability to trace things withing the individual modules via native calls to printf is quite a nice place to be in I think (this is the approach we have been taking to a large extent when using higher level hardware description languages).

For what it's worth, another flow I was contemplating for ages was something like having RTL code only for a DUT and have its interface stimulated from external native code in lieu of a testbench, hence literally having NO testing code in the same source language. I quite liked this approach, but maybe it is very naive?

What do you think about those potential use cases?

jaroslov commented 4 years ago

@gameboo Being able to drive a DUT from a Python testbench — or orchestrating multiple DUTs & native modules in Python would be killer.

fabianschuiki commented 4 years ago

Does extern actually currently exist in any, even highly experimental, form?

Not at the moment. It's easy to add to the IR, but I'm currently pushing Moore ahead to a point where it would be able to emit e.g. DPI calls in SV (function calls currently missing). At that point I will go in and add the extern declarations to LLHD. llhd-sim would then also have to be extended to be able to load *.so files as you suggest, which provide the external functions.

I think an explicit wrapper around native calls providing an entity interface that matches that of the synthesizable one would work fine. I am not sure how obvious of a feature that would be, but I suppose what I am after here is for the option to trivially do something similar to passing a different .so or .a when linking things into a simulator or a netlist.

Yeah I think this flow should be easily possible with this approach!

I suppose one flow I like to follow to quickly push ahead on the dev of my RTL looks something like this: Say I want to work on a new feature for my caches. I want to run a workload which needs a framebuffer device to test that feature, but I don't want to waste time on the framebuffer's RTL just yet. I want to run the same RTL I have used for synthesis so far (same CPUs, same (well improved really :)) caches, same interconnects, DMA engines, accelerators, etc...). I simply change maybe the boot memory to instead be a simulated model that can perform IO to a file to read my workload, and a new framebuffer simulated model with calls out to, say, SDL or some native graphics library. I limit the differences in the RTL to leafs of the module hierarchy as much as possible, and focus on the RTL for my new cache feature. I can simulate it and know that it behaves correctly (because surely that's the only way it could behave :) ) in something that is as close to the synthesized RTL as I can. This together with the ability to trace things withing the individual modules via native calls to printf is quite a nice place to be in I think (this is the approach we have been taking to a large extent when using higher level hardware description languages).

Yeah this makes a lot of sense. I think that would be well-covered by extern calls.

For what it's worth, another flow I was contemplating for ages was something like having RTL code only for a DUT and have its interface stimulated from external native code in lieu of a testbench, hence literally having NO testing code in the same source language. I quite liked this approach, but maybe it is very naive?

What do you think about those potential use cases?

This could work a bit like CocoTB testbenches, where you push everything testing-related out into Python via DPI calls. Pretty exciting stuff!