Open jfrensch opened 1 year ago
This 3-step flow would also solve this issue https://github.com/VUnit/vunit/issues/877
Good news today is that Siemens decided to support VUnit with Questa licenses such that issues like this can be solved.
I started to prototype on this and there is a first iteration to try out where I simply run vopt before calling vsim. I'm only using a few options in the vopt call; the most important is probably -floatgenerics, which is applied recursively on the simulation top-level. The runner_cfg generic needs to be floating since it has no default value. Is there a need to control that more selectively to enable optimization for any custom generics added to the testbench?
In this iteration, optimization is not a proper step before the simulation step. This means that vopt is called before starting the simulation of each test in a testbench. This is a problem as noted in #877.
# ** Note: (vsim-3812) Design is being optimized...
# ** Warning: (vopt-6) -- Waiting for lock by "larsa@LAPTOP-B6KUINVE, pid = 25780
# ". Lockfile is "C:/github/vunit/examples/vhdl/three_step_flow/vunit_out/modelsim/libraries/lib/_lock".
# ** Error: (vopt-2261) 'lib.opt_libtbexampletb' is already an optimized design.
# Optimization failed
This error is not suppressible, so the following doesn't help (I've added support for setting vopt options, both in batch mode and in GUI mode):
vu.set_sim_option("modelsim.vopt_flags", ["-suppress", "2261"])
If I run the tests one after another this doesn't seem to be a concern. vopt is simply skipped if the design is already optimized:
# Incremental compilation check found no design-units have changed.
Making optimization a proper step executed before starting the simulation step is probably the solution to this.
Until then, please give it a try for your other use cases.
Btw, there is a small example that you can start playing with in https://github.com/VUnit/vunit/tree/three-step-flow/examples/vhdl/three_step_flow
I tried the three-step-flow branch and it will cause errors when running with multiple threads. The vopt call will need to be thread safe, you cannot run vopt in parallel with the same output destination apparently.
It can be solved either by:
The benefit of 2 is that it only runs vopt once for a top level and not for every simulation which could save time.
I have made a prototype of solution 1 described above and it works without problems in multi-threading. This is really the simplest solution to get it working, as separating the vopt step and using a Python lock on it would require a lot of restructuring of modelsim.py vs vsim_simulator_mixin.py.
PS: Apparently it can still fail when using multi-threading. It seems that even when using unique vopt artifacts per thread, the common files in the library are also mutated by Questa.
Note also that I found problems with the floatgenerics argument. It caused vopt to hang indefinitely on some of my test benches. Changing -floatgenerics+top. (with a trailing dot) to -floatgenerics+top (without it) fixed the hang.
My understanding from the manual is that the trailing dot causes the generics of all lower instances to also be floating. For the purpose of VUnit it should be enough that the top level test bench generics are floating as changing the generic of a deeper instance is not required or supported. I would assume a floating top level generic coupled to a lower instance generic would also cause it to be floating anyway without the trailing dot.
It seems running vopt will mutate the common files in the library folder even if multiple threads use different vopt output targets. I have verified this by diffing the md5sum of every file in the library folder before and after running vopt. However, just running vsim on an already created vopt folder does not seem to change any md5sum at all.
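The before/after comparison described above could be scripted roughly as follows. This is only a sketch: the helper names snapshot_md5 and diff_snapshots are made up for illustration, and the vopt call itself is left out.

```python
import hashlib
from pathlib import Path


def snapshot_md5(folder):
    """Map each file path (relative to folder) to the MD5 of its contents."""
    root = Path(folder)
    return {
        str(path.relative_to(root)): hashlib.md5(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(before, after):
    """Return (added, removed, modified) relative paths between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return added, removed, modified
```

Take one snapshot of the library folder before running vopt and one after, then pass both to diff_snapshots to see which files were touched.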
This makes me think a solution needs to ensure that all vopt calls for a single library happen before any simulation starts. Even between two test benches within the same library, the vopt calls cannot run in parallel, it seems.
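One way to keep vopt calls within a library from overlapping is a lock per library. The sketch below is an assumption about how that could look, not code from VUnit; LibraryLocks, run_vopt_serialized and the run_vopt callable are hypothetical names.

```python
import threading
from collections import defaultdict


class LibraryLocks:
    """Hand out one lock per library name (hypothetical helper)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the dictionary itself
        self._locks = defaultdict(threading.Lock)

    def for_library(self, library_name):
        with self._guard:
            return self._locks[library_name]


def run_vopt_serialized(locks, library_name, run_vopt):
    """Call run_vopt() while holding the lock of the given library."""
    with locks.for_library(library_name):
        return run_vopt()
```

This only serializes vopt against other vopt calls in the same library. It does not by itself guarantee that no simulation is running at the same time.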
PS: Another alternative would be to just duplicate the library folders with one copy per thread. That would avoid any potential concurrency problem within the simulator itself.
@xkvkraoSICKAG Thanks for trying this out.
Yes, adding a thread suffix to the name of the optimized design would be a nice solution. It works in my simple example but I also see that the library files are modified.
As I see it, duplication is the only option. I will give it a try. I will also consult Siemens to get these observations verified.
The reason for using floating generics on all levels is that there are use cases where the runner_cfg is passed to a lower-level entity. However, that is a less common use case that we can ignore in the first iterations.
@xkvkraoSICKAG Different directories for each thread is something that was already implemented in another feature branch so there is code to reuse from that branch (which was dropped when we realized that there were other ways to solve that feature).
Library duplication will have to be added and I've a discussion ongoing with Siemens to figure out to what extent that is needed. That will be an overhead that we obviously want to minimize so it will probably only be activated with the 3-step flow. The 2-step flow will work as before.
I have another observation to report. I tried changing the library folder format from vlib -type flat, which is the default, to vlib -type directory. Annoyingly, this almost became thread safe, but unfortunately Questa still makes a tiny modification to the _info file in the library folder when running vopt. It seems to greatly reduce the probability of a simulation error due to parallel file modifications in the library though.
Regarding library duplication. To reduce the overhead it could maybe use a smart approach based on https://docs.python.org/3/library/filecmp.html to only copy over what has changed.
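A minimal sketch of such an incremental copy, using filecmp.cmp from the standard library. The function name sync_library is made up, and files deleted from the source are deliberately not removed from the copy, so this is a starting point rather than a complete solution.

```python
import filecmp
import shutil
from pathlib import Path


def sync_library(source, destination):
    """Copy files from source to destination only if missing or different.

    Returns the relative paths (POSIX style) of the files that were copied.
    """
    source, destination = Path(source), Path(destination)
    copied = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        # shallow=False compares file contents, not just size and mtime
        if not dst_file.exists() or not filecmp.cmp(src_file, dst_file, shallow=False):
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dst_file)
            copied.append(relative.as_posix())
    return copied
```

Calling it once per thread-specific copy before a test run would refresh only the files that a recompilation or a previous vopt changed.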
Another approach would be to run vopt as part of the compile step of VUnit, which is already running in single-threaded mode.
This would involve refactoring to also extract more information from the dependency scanner to be able to incrementally know which top level test benches need to be re-optimized.
I also tried with -type directory and found that the libs are modified if I'm running tests on the same testbench that differ in the top-level generics. It looks like there are modifications to take into account beyond the small _info change. I will start with the assumption that everything needs to be copied. That can always be improved if Siemens can provide some valuable insights into the details of the vopt behavior.
I pushed my local changes I used to test to a fork: https://github.com/xkvkraoSICKAG/vunit/commits/three-step-flow2/
I think this commit may be of interest, it ensures the library mapping arguments are deterministic between calls to vcom/vlog and vopt. Before this change they were subject to the random iteration order of dictionary keys: https://github.com/xkvkraoSICKAG/vunit/commit/be872712a0910f5d4a68822a4aa93e9b4eec910c
@xkvkraoSICKAG Can you make a pull request to the VUnit repo?
@LarsAsplund Yes, if you can rebase the three-step-flow branch on VUnit master I can make a PR to three-step-flow.
I started to build on a solution where the first test running a testbench performs the optimization, which is then used by the other tests. The optimization is still part of the simulation step, so the second test has to wait for both the optimization and simulation of the first test to complete before it can proceed. That will be fixed later, but I did see something that needs more investigation.
Below is a debug log from a testbench with a single test that has 5 different configurations. I'm using two threads for my test run.
The first test run gets to optimize the testbench:
2024-08-31 19:20:04,426 - DEBUG - (lib.tb_example.0.test) Optimizing lib.tb_example(tb)
Since the second test starts simultaneously in another thread, it blocks while waiting for the first test:
2024-08-31 19:20:04,426 - DEBUG - (lib.tb_example.1.test) Waiting for lib.tb_example(tb) to be optimized.
2024-08-31 19:20:04,429 - DEBUG - Starting lib.tb_example.0.test simulation
2024-08-31 19:20:07,808 - DEBUG - lib.tb_example.0.test simulation completed
2024-08-31 19:20:07,808 - DEBUG - lib.tb_example(tb) optimization completed
Now the second test case can proceed:
2024-08-31 19:20:07,811 - DEBUG - Starting lib.tb_example.1.test simulation
With the first test completed, there is one simulation thread available and the third test can start. At this point there is no need to wait for the optimized testbench:
2024-08-31 19:20:07,826 - DEBUG - (lib.tb_example.2.test) Reusing optimized lib.tb_example(tb)
2024-08-31 19:20:07,829 - DEBUG - Starting lib.tb_example.2.test simulation
For every test completed, a new one can start:
2024-08-31 19:20:10,670 - DEBUG - lib.tb_example.1.test simulation completed
2024-08-31 19:20:10,687 - DEBUG - (lib.tb_example.3.test) Reusing optimized lib.tb_example(tb)
2024-08-31 19:20:10,691 - DEBUG - Starting lib.tb_example.3.test simulation
2024-08-31 19:20:13,107 - DEBUG - lib.tb_example.3.test simulation completed
2024-08-31 19:20:13,127 - DEBUG - (lib.tb_example.4.test) Reusing optimized lib.tb_example(tb)
2024-08-31 19:20:13,130 - DEBUG - Starting lib.tb_example.4.test simulation
2024-08-31 19:20:15,640 - DEBUG - lib.tb_example.4.test simulation completed
But what happened to the third test case? It takes forever to complete and is overtaken by the tests starting after it.
2024-08-31 19:20:27,080 - DEBUG - lib.tb_example.2.test simulation completed
Regardless how many configurations I create of the test, there is always one which completes much later than the others. That is something I have to investigate further.
It should be said that a single-thread test run works as expected. The first test takes a bit longer to run since it's doing the optimization:
Some test runs with two threads work well. In this case, the second test also takes some extra time since it's waiting for the first to complete. After that everything runs smoothly:
This is what a bad run with two threads looks like:
What I see is that it is the simulation process that takes time and the problem is intermittent. This is the execution time for 500 configurations of the same test:
Considering that I once got this message, I'm suspecting this has to do with the license server. In my case it sits on my computer so there is no network delay.
After trying for 30 seconds it simply fails. I will clean up my code so that you can test on your computers.
I started to build on a solution where the first test running a testbench performs the optimization which is then used by the other tests. The optimization is still part of the simulation step so the second test has to wait for both the optimization and simulation of the first test to complete before it can proceed.
Based on my investigations running vopt on any design in a top level will mutate the library_folder/_info file. So running vopt in a library has to lock the entire library. Thus care has to be taken if there are several test benches in the same library, they cannot have vopt run in parallel.
Agree, there will be multiple conditions for when vsim and vopt can be run. vsim waits for the testbench to be optimized if it hasn't already and vopt waits for the lib to be available. I hope that a second vopt on a library doesn't invalidate previous vopts on that library just because of the altered _info file.
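The "first test optimizes, the others wait and then reuse" behaviour can be sketched with one threading.Event per testbench. This is an illustration under assumptions, not the code on the branch: OptimizeOnce and its return strings are invented here, and a failed optimization simply releases the waiters without any error handling.

```python
import threading


class OptimizeOnce:
    """Let the first caller per testbench optimize; later callers wait or reuse."""

    def __init__(self):
        self._guard = threading.Lock()
        self._done = {}  # testbench name -> threading.Event

    def ensure_optimized(self, testbench, optimize):
        with self._guard:
            event = self._done.get(testbench)
            is_first = event is None
            if is_first:
                event = self._done[testbench] = threading.Event()
        if is_first:
            try:
                optimize()  # a per-library lock could be taken inside here
            finally:
                event.set()  # release waiters even if optimize() raised
            return "optimized"
        if event.is_set():
            return "reused"
        event.wait()
        return "waited"
```

Each simulation thread would call ensure_optimized before starting vsim, so vsim never runs on a design that is still being optimized.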
Unfortunately I think the altered _info file does cause problems. I am running our company internal simulations on my three-step-flow branch. On this branch only the _info file is mutated during vopt and still it causes test cases to fail with a low probability. To mitigate this every test bench in the same library must be sequentially vopt:ed which kind of defeats the common vunit style of having multiple test benches in the same library.
I was thinking about the case where you vopt every testbench sequentially. If you vopt A and then B, will you then have to vopt A again before running just because _info changed? Even if the designs didn't change?
The vopt lock would be a problem if you have testbenches without test cases. They would run in series. If you have test cases, only the first test's vopt will run on its own. All the vsims will run in parallel. If scheduling is optimised, the vopt for the next testbench could run in parallel with the vsims of the previous.
I was thinking about the case where you vopt every testbench sequentially.
Yes the problem is the _info mutation forces you to run vopt sequentially for all testbenches within a library before starting any simulation. This becomes a problem if you have a lot of test benches within the same library.
@xkvkraoSICKAG Ok, so what I have now is a prototype that manages locks for the libraries as well. There are corner cases which I have yet to handle, but it should be useful for testing in some different projects. I've tested it for myself with dummy testbenches and two threads and also with a client that has a single license. What I found was that using two threads improved performance even if there was only one license. Not sure why, but maybe vopt is allowed to run concurrently with vsim on a single license.
Running vopt on one testbench in one thread while running vsim on a testbench already optimized in another thread hasn't caused any problems for me. This is when I run with two licenses. Running two vopts at the same time on different libraries also works.
If you have a library with 10 testbenches and simulation time is much longer than optimization time you will eventually have 10 simulations running concurrently, provided you have that many licenses. The problem is if simulation time is relatively small compared to the optimization time. But is optimization needed in those cases?
I will push what I have tomorrow so you can try it.
@xkvkraoSICKAG I pushed an update now. Can you test it with your real-life project? I still see that some tests take a long time so it would be interesting to see what you experience. I've kept the recursive option on floatgenerics so feel free to test without it if you run into problems. You can also run with the --log-level=debug option to get some info on how the threads synchronize with respect to each other.
@LarsAsplund I did some tests using commit https://github.com/VUnit/vunit/commit/1ce6bbadd41728b7b24c27ef2d67d7fe099ef1c6.
I still end up with problems when running 5 test cases with 5 threads. 4 tests belong to the same library and 1 test belongs to another.
Sometimes it does run without issues but it is really random... I can have multiple runs without issues and then have 2 consecutive runs with issues.
@tasgomes Can you start by running the example here: https://github.com/VUnit/vunit/tree/three-step-flow/examples/vhdl/three_step_flow?
How did you enable optimization? See https://github.com/VUnit/vunit/blob/1ce6bbadd41728b7b24c27ef2d67d7fe099ef1c6/examples/vhdl/three_step_flow/run.py#L31
Looking at the error part:
# vsim -modelsimini C:/Git/et-fw/Demos/ExampleProject/lib/FW/VHDL/vunit/examples/vhdl/three_step_flow/vunit_out/modelsim/modelsim.ini -wlf C:/Git/et-fw/Demos/ExampleProject/lib/FW/VHDL/vunit/examples/vhdl/three_step_flow/vunit_out/test_output/lib2.tb_example.054861d12681315f7dce6ece3407eb12387d16940/modelsim/vsim.wlf -work lib2 -quiet -t ps -onfinish stop opt_lib2tbexampletb -L vunit_lib -L lib1 -L lib2 -g/tb_example/runner_cfg="active python runner : true,enabled_test_cases : test,output path : C::/Git/et-fw/Demos/ExampleProject/lib/FW/VHDL/vunit/examples/vhdl/three_step_flow/vunit_out/test_output/lib2.tb_example.054861d12681315f7dce6ece3407eb12387d16940/,tb path : C::/Git/et-fw/Demos/ExampleProject/lib/FW/VHDL/vunit/examples/vhdl/three_step_flow/,use_color : true" -g/tb_example/value=0
# Start time: 12:50:27 on Sep 06,2024
# ** Note: (vsim-3812) Design is being optimized...
# ** Warning: (vopt-6) -- Waiting for lock by "tgomes@BELUGA, pid = 10148
# ". Lockfile is "C:/Git/et-fw/Demos/ExampleProject/lib/FW/VHDL/vunit/examples/vhdl/three_step_flow/vunit_out/modelsim/libraries/lib2/_lock".
# ** Error: (vopt-2261) 'lib2.opt_lib2tbexampletb' is already an optimized design.
# Optimization failed
VUnit calls vsim with an optimized design (opt_lib2tbexampletb) but Questa still thinks it needs to be optimized (Design is being optimized...). Eventually it figures out that it is already optimized ('lib2.opt_lib2tbexampletb' is already an optimized design.) and fails.
The fact that Questa is waiting for a lock file suggests that there is a race between VUnit's internal thread synchronization and the lock state of Questa. The VUnit threads are synchronized with OS mechanisms. When one vopt call returns, VUnit will let another thread call it regardless of any lock file. I will update to take the lock file into account and see if that helps.
Pushed a new version where I check and wait for the lock file to be removed. Since I can't recreate your problem, I haven't been able to test it properly.
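Waiting for the lock file to disappear could look roughly like the polling loop below. It is a sketch and not the code that was pushed: wait_for_lock_release, the 30 second default and the polling interval are assumptions, and the _lock path is the one shown in the Questa warning.

```python
import time
from pathlib import Path


def wait_for_lock_release(lock_file, timeout=30.0, poll_interval=0.05):
    """Return True once lock_file is gone, False if it still exists at timeout."""
    lock_file = Path(lock_file)
    deadline = time.monotonic() + timeout
    while lock_file.exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

Polling like this narrows the race but cannot close it: another process may recreate the lock file right after the check returns.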
I only have two licenses. Does it work with -p2?
@LarsAsplund I still get some errors sometimes, see two examples below with -p 2:
@tasgomes One error is the same as the one we saw before but the other is new. @xkvkraoSICKAG and I concluded that vopt on different libraries can be done in parallel since only the _info file in the library to which the optimized design belongs is modified. The second error suggests that this is not the case. I pushed a quickfix that prevents two vopts from running at the same time. Let's see if that helps.
Hi @LarsAsplund, it seems stable with 2 threads but if I increase the number of threads then issues appear again. See below an example with 6 threads:
As referenced before, it seems like Questa tries to optimize a design that is already optimized, and Questa throws an error in this case:
# Start time: 08:14:52 on Sep 11,2024
# ** Note: (vsim-3812) Design is being optimized...
# ** Warning: (vopt-6) -- Waiting for lock by "tgomes@BELUGA, pid = 20312
# ". Lockfile is "C:/Git/et-fw/Demos/ExampleProject/lib/FW/VHDL/vunit/examples/vhdl/three_step_flow/vunit_out/modelsim/libraries/lib2/_lock".
# ** Error: (vopt-2261) 'lib2.opt_lib2tbexampletb' is already an optimized design.
# Optimization failed
# ** Note: (vsim-12126) Error and warning message counts have been restored: Errors=1, Warnings=1.
# Error loading design
Error loading design
# End time: 08:14:55 on Sep 11,2024, Elapsed time: 0:00:03
# Errors: 1, Warnings: 1
Perhaps we could first check whether the design is already optimized and skip optimization in that case. There is a vopt argument that apparently allows skipping optimization:
I checked Questa documentation and also found the following that could shed some light on how to "lock" optimizations for "multi-thread" cases:
@tasgomes When we run vopt, we create a name for the optimized design that differs from the original design and call vsim with that. The optimized design is named as the non-optimized design but with all non-alphanumeric characters removed and an opt_ prefix. The log tells us that vsim is being called with opt_lib2tbexampletb so there is no need for an extra query. However, to be on the safe side I suggest that you add this to your run script:
vu.set_sim_option("modelsim.vsim_flags", ["-novopt", "-suppress", "12110"])
The -novopt option will result in an error in the normal case when vsim correctly recognizes that its design input is already optimized, but that error (12110) can be suppressed.
I suspect that the confusion about the state of the design has to do with the lock file vsim is complaining about. So far I have only prevented several vopt calls from running at the same time, not simulations from running during vopt. It could be that vsim can't do that. In the best scenario, vsim in one lib can still run concurrently with vopt on another lib. To test that theory you could try:
python run.py -p2 --log-level=debug lib2.tb_example.* lib1.*
Now we only have two testbenches located in separate libs. Once the first testbench has been optimized in one of the libs, the second testbench will be optimized to the other lib concurrently with the simulations of the testbench in the first lib.
If vopt can't be executed concurrently with any vsim, it will be much more limiting. The simplest solution to that would cause a flow where one testbench is optimized first. Then all its test cases will be simulated in parallel before the pattern repeats with the second testbench. If the first testbench has 2 test cases, we will only run 2 parallel threads even if we have licenses and CPU cores that can support more threads.
To get around that we would have to optimize all testbenches first in series and then do all simulations in parallel. That would affect the VUnit design at a higher level as we cannot hide it locally in the Questa/Modelsim interface code anymore.
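The "optimize everything in series, then simulate everything in parallel" schedule could be sketched as below. The function run_two_phase and the optimize/simulate callables are hypothetical stand-ins for the real vopt and vsim invocations; this says nothing about how it would be integrated into VUnit's test runner.

```python
from concurrent.futures import ThreadPoolExecutor


def run_two_phase(testbenches, optimize, simulate, num_threads):
    """Optimize every testbench in series, then run all simulations in parallel.

    testbenches maps a testbench name to the list of its test names.
    """
    for testbench in testbenches:  # phase 1: serial, so vopt runs in isolation
        optimize(testbench)

    jobs = [(tb, test) for tb, tests in testbenches.items() for test in tests]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:  # phase 2: parallel
        results = list(pool.map(lambda job: simulate(*job), jobs))
    return dict(zip(jobs, results))
```

The cost of this schedule is that no simulation starts until the last testbench has been optimized, which matters most when optimization time dominates.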
@LarsAsplund I added:
vu.set_sim_option("modelsim.vsim_flags", ["-novopt", "-suppress", "12110"])
Sometimes it works and sometimes fails as shown below:
Then I try:
python run.py -p2 --log-level=debug lib2.tb_example.* lib1.*
I get similar results, either:
# ** Error: (vsim-3170) Could not find 'opt_lib2tbexampletb'.
when using -novopt
# ** Error: (vopt-2261) 'lib2.opt_lib2tbexampletb' is already an optimized design.
without the -novopt flag.
@tasgomes I suspect that the new "could not find" error message can be just another expression of Questa being confused rather than the file actually being missing. However, to be on the safe side, I will make sure that the output from vopt really exists in the file system before proceeding.
The next step is to also disallow vsim runs concurrent with vopt such that vopt always runs in complete isolation.
@tasgomes Before doing anything else, can you do pip install watchdog and then run with the latest update? It will log changes in the file system. Maybe that can help us understand what is going on.
@LarsAsplund
A good run for your reference: good_5.log
A bad run with -novopt: error_5a.log
A bad run without -novopt: error_5b.log
@tasgomes Ok, now I see it. There is a bug in my code which causes a race condition. Please try the latest push.
I also started to remove the things I added lately before finding what I think was the real bug. I suggest testing the last three commits one at a time to see if removing everything was too optimistic.
@tasgomes @xkvkraoSICKAG Have any of you had the chance to test the latest commit with your projects?
@LarsAsplund I am out of office this week. I can retry this again next Monday.
@LarsAsplund I am back. Below you find the last three commits one at a time. I ran each several times without problems, except for the last one. This one has an issue that occurs only sometimes.
@tasgomes Thanks. I suspect there can be a slight delay before files owned by one vopt call (lock files or other files) are properly released on the file system. A second vopt call, made after the first one returns, may run into that.
However, in this case vopt fails on the first call on lib2. The only previous call was on lib1. There should not be any prior activity on lib2 files unless:
The assumption that vopt on one library doesn't affect other libraries was wrong. In this case the examples are standalone, but what would happen if a testbench in lib1 uses a component from lib2?
I'll reach out to Siemens for some more support. As long as we are guessing how it works, we cannot be sure we have a stable solution.
@LarsAsplund I restarted my laptop to make sure everything is clean. Then I executed the test twice. The first time was successful, but the second time fails:
Could it also be that the previous run did not close or delete the lock files properly?
Is there any PR for those changes to look at? Especially given that QuestaSim introduced the qrun command, which wraps all the other commands in one step instead of the 3-step build: https://www.linkedin.com/pulse/improve-your-compilation-flow-questasim-mikael-andersson/
And a second question: What about Questa Visualizer offline debug support? Did anyone try running that with VUnit?
@SzymonHitachi You can find the work in the https://github.com/VUnit/vunit/tree/three-step-flow branch. There is a simple testbench https://github.com/VUnit/vunit/tree/three-step-flow/examples/vhdl/three_step_flow which we use to get a first proof of concept. It works for me but not for @tasgomes and I'm assuming we have some race conditions.
I'm aware of qrun but initially I want to follow a flow that is more generic and applies to all simulators we support. That makes our code cleaner. It should be possible to do what is done in qrun though. Apart from making Questa run in a multi-threaded setup, we are also planning to add compile groups which allow for compiling files in groups. I know that Mikael prefers that approach to compilation but there are more use cases we need to support so we need to keep our design flexible. I got an email from Mikael today (no coincidence I assume :smile:). I'm hoping he will join the discussion here as he knows Questa under the hood.
What I think we need from Siemens at this point is:
vopt runs be considered independent such that they can run concurrently.
vopt has "released" all files.
Thanks for the links. It seems it still needs modelsim defined as the simulator, so I guess it requires to have either modelsim/questasim installed or some ENV var defined to use one or another?
Hi all,
my understanding is that the VUnit framework (always?) uses the "2-step flow" (-> vcom & vsim) of Questa, where the 'vopt' step is automatically applied during 'vsim'. Unfortunately, I need some intermediate result of the vopt step to be able to use the 'Visualizer' to analyze the results of a simulation: the vopt step not only provides an optimized design 'tb_opt' of the original testbench 'tb' for faster simulation, but also a database (-> design.bin) required by the 'Visualizer' to correlate the simulation results from 'tb_opt' to the design being simulated.
How can/should I apply that 3rd step (vopt) within the VUnit framework?
Many thanks for advice Jochen