clash-lang / clash-compiler

Haskell to VHDL/Verilog/SystemVerilog compiler
https://clash-lang.org/

Pathological Compile Times and Memory Usage For Simple Example #236

Closed Jhana1 closed 3 years ago

Jhana1 commented 7 years ago

I recently ran into this issue, which I think is a compiler bug. Take this simple example.

import CLaSH.Prelude

topEntity :: Signal (Unsigned 16) -> Signal (Vec 25 (Unsigned 16))
topEntity x = bundle . map ($ x) $ repeat id

The result of calling :verilog on this is

[1 of 1] Compiling Main             ( src/Pathalogical.hs, interpreted )
Ok, modules loaded: Main.
*Main> :verilog
Loading dependencies took 0.442108s
Applied 3337 transformations
Normalisation took 28.946904s
Netlist generation took 0.005913s
Testbench generation took 0.000369s
Total compilation took 29.396714s

If I increase this from Vec 25 to Vec 30, the time blows out even more:

[1 of 1] Compiling Main             ( src/Pathalogical.hs, interpreted )
Ok, modules loaded: Main.
*Main> :verilog
Loading dependencies took 0.355349s
Applied 5527 transformations
Normalisation took 63.627535s
Netlist generation took 0.007838s
Testbench generation took 0.000369s
Total compilation took 63.99257s

Supposing this isn't a bug, do you have any advice on how to formulate such code to avoid the non-linear impact on compile time? (This is running Clash 0.7.1; I've yet to try it with the 8.2 branch.)

christiaanb commented 7 years ago

Yeah, the Clash compiler is really slow at symbolic evaluation. In this case you are forcing Clash to completely unroll the definition of map, because you have a Vec of functions.

I haven't really given much thought to how to do symbolic evaluation quickly, so I cannot give a time estimate for when I can fix this.

As a workaround, I recommend avoiding vectors of functions.
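
For the reduced example in the report, the vector of functions can be dropped entirely: applying every element of `repeat id` to `x` gives the same vector as `repeat x`. The following is a minimal sketch of that rewrite, assuming a recent `Clash.Prelude` in which `Signal` takes a domain argument such as `System` (the original report imports `CLaSH.Prelude` and writes `Signal (Unsigned 16)` instead). The name `topEntityFirstOrder` is hypothetical, and no compile times were measured for it.

```haskell
-- The clash/clashi front ends enable these by default; plain GHC needs them.
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE NoImplicitPrelude #-}

import Clash.Prelude

-- Formulation from the report: a Vec of functions, each applied to x.
topEntity :: Signal System (Unsigned 16) -> Signal System (Vec 25 (Unsigned 16))
topEntity x = bundle . map ($ x) $ repeat id

-- Hypothetical first-order equivalent: the Vec holds signals, not functions,
-- because map ($ x) (repeat id) is the same vector as repeat x.
topEntityFirstOrder :: Signal System (Unsigned 16) -> Signal System (Vec 25 (Unsigned 16))
topEntityFirstOrder x = bundle (repeat x)
```

This only helps when every function in the vector is known to be the same; it is meant to show the shape of the rewrite, not a general fix.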

Jhana1 commented 7 years ago

Thanks for the quick response. Any idea what might be a decent replacement for a Vec if I wanted to have, say, 500+ of these things?

christiaanb commented 7 years ago

I could better answer your question if I knew what your use case is. Why do you have 500+ different functions in a vector, and why do you want to apply them to some argument in this way?

Jhana1 commented 7 years ago

I need to do rapid classification. Each function acts as a distinct classifier. I need to pass the same information into each function, and then filter the output based on some heuristic. Because my workload is not suitable for pipelining, parallel execution is the only option to meet my latency requirements.

christiaanb commented 7 years ago

And how are you creating this vector of 500 functions? Are you really doing:

vecFun = multiply :> add :> subtract :> ... :> divide :> Nil

Or do you have a single architecture that you are simply configuring differently?

vecFun = arch config1 :> arch config2 :> ... :> arch config500 :> Nil

?

Jhana1 commented 7 years ago

Single architecture, configured differently.

christiaanb commented 7 years ago

Then you could simply do:

topEntity x = map (\config -> arch config x) (config1 :> config2 :> ... :> config500 :> Nil)

Which will be much faster to compile.
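
To make the suggestion concrete, here is a small self-contained sketch of the "one architecture, many configurations" pattern. Everything specific in it is an assumption for illustration: the `Config` type, the threshold-comparison body of `arch`, and the four values in `configs` are hypothetical stand-ins for the real classifier and its 500 configurations. It also assumes a recent `Clash.Prelude` where `Signal` takes a domain argument.

```haskell
-- The clash/clashi front ends enable these by default; plain GHC needs them.
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE NoImplicitPrelude #-}

import Clash.Prelude

-- Hypothetical configuration: a single threshold per classifier.
type Config = Unsigned 16

-- Hypothetical architecture: fires when the input exceeds the threshold.
arch :: Config -> Signal dom (Unsigned 16) -> Signal dom Bool
arch threshold x = fmap (> threshold) x

-- The Vec holds plain configuration data, not functions.
configs :: Vec 4 Config
configs = 10 :> 20 :> 30 :> 40 :> Nil

-- Every classifier sees the same input; bundle merges the per-classifier
-- signals into one signal of results.
topEntity :: Signal System (Unsigned 16) -> Signal System (Vec 4 Bool)
topEntity x = bundle (map (\config -> arch config x) configs)
```

The `bundle` call is only there to match the `Signal (Vec ...)` result type of the original report; returning the `Vec` of signals directly, as in the one-liner above, works as well.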

Jhana1 commented 7 years ago

I think that solves my problem. Thanks again Christiaan.

alex-mckenna commented 3 years ago

Closing: on master this currently gives the following results:

$ cabal run -- clash -fclash-no-cache -fclash-clear --verilog T236.hs
Up to date
Loaded package environment from /home/axm/Documents/clash-compiler/.ghc.environment.x86_64-linux-8.10.4
GHC: Parsing and optimising modules took: 0.582s
GHC: Loading external modules from interface files took: 0.000s
GHC: Parsing annotations took: 0.000s
Clash: Parsing and compiling primitives took 0.120s
GHC+Clash: Loading modules cumulatively took 0.778s
Clash: Compiling T236.topEntity
Clash: Ignoring previously made caches
Clash: Normalization took 0.034s
Clash: Netlist generation took 0.000s
Clash: Total compilation took 0.819s

Increasing to a Vec with 250 elements is still quicker than Clash was when the issue was posted:

$ cabal run -- clash -fclash-no-cache -fclash-clear --verilog T236.hs
Up to date
Loaded package environment from /home/axm/Documents/clash-compiler/.ghc.environment.x86_64-linux-8.10.4
GHC: Parsing and optimising modules took: 0.585s
GHC: Loading external modules from interface files took: 0.000s
GHC: Parsing annotations took: 0.000s
Clash: Parsing and compiling primitives took 0.122s
GHC+Clash: Loading modules cumulatively took 0.783s
Clash: Compiling T236.topEntity
Clash: Ignoring previously made caches
Clash: Normalization took 25.199s
Clash: Netlist generation took 0.005s
Clash: Total compilation took 25.999s