Simulation performance degradation on Amaranth 0.5.x

amaranth-lang / amaranth

A modern hardware definition language and toolchain based on Python

https://amaranth-lang.org/docs/amaranth/

BSD 2-Clause "Simplified" License

1.59k stars 175 forks source link

Simulation performance degradation on Amaranth 0.5.x #1541

Open tilk opened 2 weeks ago

tilk commented 2 weeks ago

I'm in the process of updating the Coreblocks codebase to use Amaranth 0.5.x releases. Unfortunately, I discovered a severe simulation performance regression. The commit https://github.com/amaranth-lang/amaranth/commit/7870eb344b143dbc3b4f1a03b0a1e178196b8ef8 causes tests to run multiple times slower. For example, a test which took 2.5 s now takes 18.5 s. I didn't find which exact change did that, I will try to find it by selectively reverting single changes from that commit, but it might be hard because this commit combines refactors with code changes.

whitequark commented 2 weeks ago

Thanks for the report! That's unexpected to say the least; that commit certainly wasn't intended to bring performance changes.

To try and save you some time, I did another quick review pass just now, but I didn't see anything suspicious. It might make sense to start with changes to add_clock() though.

whitequark commented 2 weeks ago

it might be hard because this commit combines refactors with code changes.

I normally avoid this, but I find it that when writing documentation, there are many opportunities to restructure code to make sure it is aligned with the documentation and reads better when you're going through a file top-to-bottom. It does unfortunately sometimes come at a cost of causing issues like this. I haven't found a better way to do this; applying lots of small changes makes the workflow almost intractable due to excessive rebase conflicts.

tilk commented 2 weeks ago

This turned out to not to be a bug, but rather an unexpected change of behavior. We used Simulator.run_until to implement timeouts in tests. The previous default behavior (run_passive == False) was that when all the (active testbench) processes stop, the simulation stops too. The current behavior is to run the test until the given time, even if all processes finished. This change of behavior means that now all our tests run to the timeout.

Is there a better way to implement a timeout in the simulator?

whitequark commented 2 weeks ago

The previous default behavior (run_passive == False) was that when all the (active testbench) processes stop, the simulation stops too.

Ah, yes. I think this should be in the changelog. While documenting the simulator I realized that all of the combinations of behaviors you could request from it were quite confusing and difficult to teach, on top of being (at the time) undocumented.

Is there a better way to implement a timeout in the simulator?

I would do something like this:

def add_timeout(sim, interval):
    async def timeout_testbench(ctx):
        await ctx.delay(interval)
        raise Exception(f"timeout after {interval}")

    sim.add_testbench(timeout_testbench, background=True)

... combined with using sim.run().

tilk commented 2 weeks ago

I'll do something like that. Thanks.

And I'll use this opportunity to thank you and your team for the great work. Thanks to the various improvements, some of our code could be removed as redundant. Porting tests was a lot of work, but I think that they were improved in the process, and writing new ones should be easier, too.

whitequark commented 2 weeks ago

Thank you for the praise, I'm really glad your project benefited from this work. The new simulator interface is, I think, well thought out and documented and I don't expect any breaking changes from now on.

On the roadmap we have a rework of the clock domain / control inserter system that should bring it in line with the level of quality and thoroughness in the rest of Amaranth. The way in which we plan to do this rework will likely benefit Coreblocks as well. In short, we want to desugar modules, submodules, and some control constructs into a tree of "scopes", where each scope has an environment mapping domain names to their clock, reset, and enable signals. This makes the system more regular and will also make certain things well-defined where they currently aren't (e.g. the interaction between ResetSignal and ResetInserter/EnableInserter is sketchy at best).

whitequark commented 2 weeks ago

Oh, I have something else in the queue, but I'm not sure how useful it would be: a VS Code extension that provides things like being able to pick a variable from a hierarchy tree, watch its value, go back and forth on the timeline, see diagnostics (prints, asserts, etc) inline in your editor, and eventually an integral waveform viewer. Currently it looks like this:

The reason I'm unsure if it will fit your workflow is that while it's easy to identify where each variable originates in Verilog, in Amaranth with interfaces it's actually quite difficult because the names of variables aren't directly visible in the source file that instantiates them, and so debugger services require an advanced understanding of code that will necessarily have to make specific assumptions about the metaprogramming.

Let me know if you're interested in testing it.