David-Durst / aetherling

Create auto-scheduled data-parallel pipelines in hardware with user-friendly Python
MIT License

Map Is Slow #26

Closed: David-Durst closed this issue 4 years ago

David-Durst commented 4 years ago

The following magma and fault test takes more than 3 minutes to run: https://github.com/David-Durst/aetherling/blob/master/aetherling/examples/map_200_one_clock.py

This seems ridiculous given that it is only adding 5 to 200 numbers. I believe all of the time is Verilator's compile time. I'll post a Python profile log soon.

@leonardt thoughts?

Edit: those timings are from my 2017 MacBook Pro, where Verilator uses clang. A very similar (if not the same) file crashed when Verilator used g++.

David-Durst commented 4 years ago
[Screenshot: Screen Shot 2019-10-30 at 8.08.55 PM]

All the time is taken by fault's call out to the Verilator process. I don't have a profile of Verilator itself, but I've watched top enough times for similar files to feel confident that all the time is spent on compilation rather than on the 4 simulation clock cycles.
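For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of capturing a profile with Python's standard `cProfile` module. `run_test` is a hypothetical stand-in for the body of map_200_one_clock.py: it only spawns a short-lived child process, to mimic time being spent outside Python in an external build step. `profile_report` is also a name made up for this sketch.

```python
import cProfile
import io
import pstats
import subprocess
import sys


def run_test():
    # Hypothetical stand-in for the real test: spawn a child process so the
    # wall-clock time is spent waiting on an external tool, not in Python.
    subprocess.run(
        [sys.executable, "-c", "import time; time.sleep(0.2)"], check=True
    )


def profile_report(func, top=10):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


report = profile_report(run_test)
print(report)
```

In a report like this, time spent waiting on a child process is charged to the `subprocess` calls, which is consistent with the profile only showing that the time goes to the external process and not how that process spends it.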

David-Durst commented 4 years ago

This test used to fail on my Ubuntu 18.04.3 LTS VM, which had 10 GB of RAM and an AMD 3700X processor. cc1plus, the g++ subprocess, would run out of memory even though multiple GB of RAM were free on my machine.

On my Linux desktop, I switched from g++ to clang by using the following flags in place of the commented-out ones:

    tester.compile_and_run(target="verilator", magma_opts={
        "verilator_debug": True,
        "passes": ["rungenerators", "wireclocks-coreir", "verifyconnectivity --noclkrst"],
        "namespaces": ["aetherlinglib", "commonlib", "mantle", "coreir", "global"]
    #}, directory="vBuild/", flags=["-Wno-fatal",  "--no-decoration", "-O3"])
    }, directory="vBuild/", flags=["-Wno-fatal", "--compiler", "clang", "--no-decoration", "--output-split",  "20000", "--output-split-ctrace", "10000", "-O3"])

I also modified Verilator's make command in https://github.com/leonardt/fault/blob/master/fault/verilator_utils.py#L52 to add the following options:

    cmd += ['CXX=clang++']
    cmd += ['OBJCACHE=ccache']
    cmd += ["CXXFLAGS='-fPIC'"]

The last one only appears to be necessary because of a bug in Verilator: the last line of the generated makefile calls g++ no matter what. Could fault provide a flag that aliases g++ to clang++ only for the shell process that Verilator runs in?
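A rough sketch of what such a per-process alias could look like, using only the Python standard library. This is not an existing fault option. `env_with_alias` is a hypothetical helper that writes a small wrapper script named after the alias into a temporary directory and puts that directory first on `PATH` for the child process only. It assumes a POSIX system with `/bin/sh`. The demo aliases `g++` to the running Python interpreter purely so that it runs without clang installed; the real use would be `env_with_alias("g++", "clang++")`.

```python
import os
import shlex
import shutil
import subprocess
import sys
import tempfile


def env_with_alias(alias, target, base_env=None):
    """Return (env, shim_dir) such that `alias` resolves to `target` via PATH.

    Only processes started with the returned env see the alias; the parent
    shell and the rest of the system are left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    target_path = shutil.which(target, path=env.get("PATH"))
    if target_path is None:
        raise FileNotFoundError(f"{target!r} not found on PATH")
    shim_dir = tempfile.mkdtemp(prefix="cxx-alias-")
    shim = os.path.join(shim_dir, alias)
    # A wrapper script (rather than a symlink) keeps the target's own
    # program name, so it does not change behaviour based on argv[0].
    with open(shim, "w") as f:
        f.write(f'#!/bin/sh\nexec {shlex.quote(target_path)} "$@"\n')
    os.chmod(shim, 0o755)
    env["PATH"] = shim_dir + os.pathsep + env.get("PATH", "")
    return env, shim_dir


# Demo: inside this child process only, "g++" runs the Python interpreter.
env, shim_dir = env_with_alias("g++", sys.executable)
try:
    result = subprocess.run(
        ["g++", "-c", "print('ok')"],
        env=env, capture_output=True, text=True, check=True,
    )
finally:
    shutil.rmtree(shim_dir)
print(result.stdout.strip())
```

The returned `env` would then be passed to whatever `subprocess` call runs Verilator's make step, so the alias lives only as long as that call.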

David-Durst commented 4 years ago

It looks like the verilator_debug flag caused the problem. Removing it made things go much faster.
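For reference, a sketch of the same call with `verilator_debug` left out. The option values are copied from the snippet earlier in this thread. `compile_kwargs` is a hypothetical helper, and pairing it with the original (non-clang) flag set is an assumption, since the thread does not say which flags were kept.

```python
def compile_kwargs(debug=False):
    """Build keyword arguments for tester.compile_and_run.

    Values are taken from the earlier snippet in this thread; the helper
    itself and the choice of flags are assumptions for illustration.
    """
    magma_opts = {
        "passes": ["rungenerators", "wireclocks-coreir",
                   "verifyconnectivity --noclkrst"],
        "namespaces": ["aetherlinglib", "commonlib", "mantle",
                       "coreir", "global"],
    }
    if debug:
        # This is the option that made the build slow in this issue.
        magma_opts["verilator_debug"] = True
    return {
        "target": "verilator",
        "magma_opts": magma_opts,
        "directory": "vBuild/",
        "flags": ["-Wno-fatal", "--no-decoration", "-O3"],
    }


kwargs = compile_kwargs()
# With the fault tester from the example script in scope:
# tester.compile_and_run(**kwargs)
```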