Open vperrin59 opened 2 years ago
I'm using VUnit with Verilog/SystemVerilog and Xcelium and I have a strange issue. When I use multiple threads in my setup with the option:
python run.py --num-threads 5
some tests randomly fail. After some investigation, it looks like the issue is coming from the vunit_results report. In the failing tests the file is empty, and that's why they are parsed as skipped by the script.
Did somebody face a similar issue?
It sounds like a buffer that is not flushed properly. When a VUnit simulation ends properly, the VHDL part of the tool writes test_suite_done to the result file and then closes it:
https://github.com/VUnit/vunit/blob/f344c8b5642d7aa13db2e16f6fc7151585ca96d0/vunit/vhdl/core/src/core_pkg.vhd#L43-L49
One would assume that the implementation of file_close also flushes any buffer, but I don't think the VHDL standard is explicit about this. What happens if you add a new line before line 48:
flush(test_results);
Does the error go away? I don't have access to Xcelium so I can't test it myself.
Could also be related to the operating system. Which one are you using?
I forgot to mention: I added $fclose to vunit_pkg.sv (I'm using SystemVerilog), hoping it would magically fix the issue, but it did not.
$fclose(trace_fd);
exit_without_errors = 1;
$finish(1);
I don't know of other ways to force flushing of the buffer in Verilog.
The operating system is CentOS.
To get closer to finding out whether flushing is indeed the issue, you might try running xrun with the -unbuffered option (probably easiest by exporting XRUNOPTS). This will be slower, so it's not recommended for production use.
I added $fflush(trace_fd) before the $fclose and it seems to have made it work. I'm running more iterations to confirm it fixes the issue.
Thanks for the tip, useful to know
@vperrin59 I'm not really a SystemVerilog user, but I think the code we have looks a bit dangerous. There is no $fclose, so it looks like it relies on implicit $fclose rules in SystemVerilog. Are there such rules? Even if that is implemented, it doesn't seem to work, since your explicit $fclose doesn't fix the problem. I would expect $fclose to also do a flush, but if your fix works, that is not the case.
@LarsAsplund I agree with you. I would have thought $fclose would fix the issue, even though it looks like the simulator automatically closes the handle when the sim finishes. On my side, it seems only the $fflush fixes the issue.
@vperrin59 I looked into the standard and couldn't find any guidance. I think we should add both $fflush and $fclose. Want to make a PR?
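For reference, a minimal sketch of what that combined change could look like, assuming trace_fd is the handle used for the results file in vunit_pkg.sv (the surrounding lines are taken from the snippet above):
// Flush explicitly before closing; relying on $fclose, or on the simulator's
// implicit close at $finish, to flush the buffer does not appear to be guaranteed.
$fflush(trace_fd);
$fclose(trace_fd);
exit_without_errors = 1;
$finish(1);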
So I've run more iterations during the night, and the issue still appears (though it seems less frequent). I have to debug it further.
Ok, I made some progress. It looks like the issue comes from the parameter runner_cfg getting assigned the wrong value: it gets the value intended for another test. That is really strange, because the content of xrun_simulate.args looks correct.
Do you have a way to distinguish between randomly failing to write to the file and randomly failing to complete the simulation for other reasons before VUnit writes to the file? I've had experiences with other simulators failing to complete without any message whatsoever. You could try to write something to another file before completing. If flushing fails completely at random, you would not expect both writes to fail, at least not all the time.
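A hedged SystemVerilog sketch of that probe idea, to be placed in the testbench or vunit_pkg.sv (the task name and message are made up for illustration): call it right before the normal result write; if this second file ends up non-empty while vunit_results is empty, the simulation did reach the end and only the results write or flush failed.
// Write a sentinel to a separate file just before the simulation completes,
// to tell a failed flush apart from a simulation that never got this far.
task automatic write_completion_probe(input string path);
  integer probe_fd;
  probe_fd = $fopen(path, "w");
  if (probe_fd != 0) begin
    $fdisplay(probe_fd, "simulation reached the end at time %0t", $time);
    $fflush(probe_fd);
    $fclose(probe_fd);
  end
endtask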
I still haven't found the root cause of the issue. It happens randomly, and it's not always the same test that fails. Something I noticed though:
For example, if I have the following test start order:
test1, test2, test3, test4
If the issue appears in test3, test3 will get the runner_cfg param value intended for test2.
It may look like an Xcelium simulator issue when using the gpg arg with multiple threads. I could not find any explanation. I found a workaround by assigning runner_cfg with the $value$plusargs function. Unfortunately, I do not have another simulator to test with. It would be interesting to know whether someone with a similar setup (Xcelium, Verilog, multiple threads) has the same issue.
@LarsAsplund I finally root-caused the issue (that was a tough debug). It looks like there is a fundamental issue with the VUnit Incisive/Xcelium flow when running multiple threads and overriding parameters. I could fix it locally. The issue is related to the cds.lib file: the parallel sims were pointing to the same work library, and that caused the issue.
What is your workaround? I'm working on another feature which will require that parallel simulations start from different directories. I have no experience with Xcelium and the cds.lib, so it might not make any difference for your issue. @cmarqu, any thoughts on this?
I'm actually surprised that this issue hasn't been reported before. Maybe not many people use the same config as me. This is the patch I used to fix the issue:
diff --git a/vunit/sim_if/xcelium.py b/vunit/sim_if/xcelium.py
index 6196a5ee..c6ff6153 100644
--- a/vunit/sim_if/xcelium.py
+++ b/vunit/sim_if/xcelium.py
@@ -311,6 +311,12 @@ define work "{2}/libraries/work"
         else:
             steps = ["elaborate", "simulate"]
 
+        sim_cds_lib = os.path.join(script_path, "cds.lib")
+
+        cds = CDSFile.parse(self._cdslib)
+        # Change work path
+        cds["work"] = f"{script_path}/work"
+        cds.write(sim_cds_lib)
         for step in steps:
             cmd = str(Path(self._prefix) / "xrun")
             args = []
@@ -342,7 +348,7 @@ define work "{2}/libraries/work"
             # '-xmlibdirname "%s"' % (str(Path(self._output_path) / "libraries"))
             # ] # @TODO: ugly
             args += config.sim_options.get("xcelium.xrun_sim_flags", [])
-            args += ['-cdslib "%s"' % self._cdslib]
+            args += ['-cdslib "%s"' % sim_cds_lib]
             args += self._hdlvar_args()
             args += ['-log "%s"' % str(Path(script_path) / ("xrun_%s.log" % step))]
             if not self._log_level == "debug":
@vperrin59 What this is doing (unless I'm wrong) is compiling and elaborating the full snapshot with your design several times. While this works, it's more space- and time-efficient to do something like this in the elaborate step:
xrun -cdslib cds.lib mymodule.vhdl -elaborate -xmlibdirname ./snapshot
and this in the simulate step:
mkdir -p run1 && xrun -cdslib cds.lib -r mymodule -xmlibdirname ./snapshot -cds_implicit_tmpdir run1 -l run1.log ...
mkdir -p run2 && xrun -cdslib cds.lib -r mymodule -xmlibdirname ./snapshot -cds_implicit_tmpdir run2 -l run2.log ...
mkdir -p run3 && xrun -cdslib cds.lib -r mymodule -xmlibdirname ./snapshot -cds_implicit_tmpdir run3 -l run3.log ...
(Some of the options already exist in the vunit code.) There is a tutorial for this in the docs somewhere.
@cmarqu I don't think you can use only one elab command. If you have one tb with a 2-bit param and you have 4 tests that set 4 different values for the param, then you need to call elaborate 4 times. Do you agree?
Moreover, with that being said, it would be more flexible to have runner_cfg defined not as a param but as a string variable assigned with $value$plusargs. That way we wouldn't need to call elaborate for each test.
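A rough sketch of that idea (a hypothetical testbench, not the actual VUnit macros or package code): the runner configuration is read at run time from a plusarg, so one elaborated snapshot can serve every test.
module tb_example;
  // Instead of: parameter string runner_cfg = ""; (a parameter bakes the value
  // into the snapshot, forcing one elaboration per test).
  string runner_cfg;

  initial begin
    // Pick up +runner_cfg=<...> at simulation time so the same snapshot
    // can be reused and only the plusarg changes between runs.
    if (!$value$plusargs("runner_cfg=%s", runner_cfg)) begin
      $display("No +runner_cfg plusarg given");
      $finish(1);
    end
    // runner_cfg would then be handed to the VUnit run phases as usual.
  end
endmodule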
Indeed, the example I was taking this from is using plusargs to differentiate between parallel runs.
But it might still work without plusargs - it is possible to stack elaborated snapshots (at least with what they call "Multi-snapshot incremental elaboration" (MSIE) and the related options), though I don't know offhand if this also works with -cds_implicit_tmpdir.