Closed KeenanRileyFaulkner closed 1 year ago
Successfully reproduced. The error only occurs in the riscv_final design during the physical netlist generation:
Running: riscv_fi (0:02:00) concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/keenanrf/research/test/bfasst/scripts/run_experiment.py", line 148, in run_job
job.function()
File "/home/keenanrf/research/test/bfasst/bfasst/transform/xilinx_phys_netlist.py", line 90, in run
self.run_rapidwright(phys_netlist_checkpoint, phys_netlist_edif_path)
File "/home/keenanrf/research/test/bfasst/bfasst/transform/xilinx_phys_netlist.py", line 160, in run_rapidwright
self.process_all_luts(cells_already_visited)
File "/home/keenanrf/research/test/bfasst/bfasst/transform/xilinx_phys_netlist.py", line 269, in process_all_luts
self.process_lut(lut6_cell, lut5_cell)
File "/home/keenanrf/research/test/bfasst/bfasst/transform/xilinx_phys_netlist.py", line 645, in process_lut
self.lut_move_net_to_new_cell(
File "/home/keenanrf/research/test/bfasst/bfasst/transform/xilinx_phys_netlist.py", line 786, in lut_move_net_to_new_cell
new_logical_pin = f"I{int(str(physical_pin[1])) - 1}"
ValueError: invalid literal for int() with base 10: 'L'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/keenanrf/research/test/bfasst/scripts/run_experiment.py", line 263, in <module>
main(args.experiment_yaml, args.threads, args.print_period)
File "/home/keenanrf/research/test/bfasst/scripts/run_experiment.py", line 83, in main
clean_jobs(jobs, future, statuses)
File "/home/keenanrf/research/test/bfasst/scripts/run_experiment.py", line 166, in clean_jobs
finished_job_uuid = future.result()[0]
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
ValueError: invalid literal for int() with base 10: 'L'
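This error is raised when physical_pin is 'CLK'. The line new_logical_pin = f"I{int(str(physical_pin[1])) - 1}" expects a LUT input pin such as "A0", "A1", "A2", etc., so it takes the second character of the pin name and converts it to an integer. For 'CLK' that character is 'L', which int() cannot parse.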
I have a fix for this in my bram branch and I've added better error checking so unsupported LUTRAM primitives raise a TransformException with the LUTRAM name.
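For illustration, here is a minimal sketch of the kind of check described above. It is not the code from the bram branch: the helper name physical_to_logical_pin, the cell_name parameter, and the stand-in TransformException class are assumptions made for this example, and the pattern assumes pin names of the form "A" followed by a single digit.

```python
import re


class TransformException(Exception):
    """Stand-in for the exception type named in the thread (hypothetical here)."""


# Assumption: LUT data pins are named "A" plus one digit; anything else
# (for example "CLK") is not a pin this conversion can handle.
_LUT_PIN = re.compile(r"^A(\d)$")


def physical_to_logical_pin(physical_pin, cell_name="<unknown>"):
    """Map a physical pin such as "A3" to a logical pin such as "I2".

    Raises TransformException, naming the cell, instead of letting int()
    fail with an opaque ValueError on pins like "CLK".
    """
    match = _LUT_PIN.match(str(physical_pin))
    if match is None:
        raise TransformException(
            f"Unsupported pin {physical_pin!r} on cell {cell_name}"
        )
    return f"I{int(match.group(1)) - 1}"
```

With a check like this, a 'CLK' pin would surface as a TransformException that names the offending cell rather than as a ValueError deep inside the worker process.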
I will leave the weekly tests untouched, then. Dr. Goeders said we could scale the weekly tests down to only the 16 designs tested in the unit tests, but if you've got a fix then there's no harm in leaving them as they are and running again next week or after your merge.
I have a draft pull request that fixes the above errors, but it hangs up on this:
Traceback (most recent call last):
File "/home/reilly/anaconda3/envs/eqv/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/reilly/equiv/bfasst/scripts/run_experiment.py", line 149, in run_job
job.function()
File "/home/reilly/equiv/bfasst/bfasst/transform/error_injector.py", line 165, in inject_wire_swap
driving_pin = self.__get_source(selected_input)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/reilly/equiv/bfasst/bfasst/transform/error_injector.py", line 203, in __get_source
for curr_pin in pin.wire.pins:
^^^^^^^^
AttributeError: 'java.lang.String' object has no attribute 'wire'
Not sure which design this is happening in. It would be nice if run_experiment reported exactly which designs succeeded. If an unrecoverable exception happened (so not an AssertionError or BfasstException), it should print the design that caused the exception, print the exception, and then kill the rest of the jobs and running threads.
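A rough sketch of the reporting behavior requested above, under some assumptions: run_all and the stand-in BfasstException class are hypothetical names, jobs are modeled as a dict from design name to a zero-argument callable, and the real run_experiment.py is structured differently. Note that Future.cancel() only stops jobs that have not started yet; jobs already running would need separate handling.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class BfasstException(Exception):
    """Stand-in for the project's recoverable exception type (hypothetical here)."""


# Exceptions that mark a design as failed without aborting the whole run.
RECOVERABLE = (AssertionError, BfasstException)


def run_all(executor, jobs):
    """Run every job and report results per design.

    jobs maps a design name to a zero-argument callable.
    Returns (succeeded, failed): a list of names and a list of (name, exception).
    Any other exception is printed with its design name and re-raised.
    """
    futures = {executor.submit(fn): name for name, fn in jobs.items()}
    succeeded, failed = [], []
    for future in as_completed(futures):
        name = futures[future]
        try:
            future.result()
        except RECOVERABLE as exc:
            failed.append((name, exc))
        except Exception as exc:
            print(f"Unrecoverable exception in design {name}: {exc!r}")
            # Cancel everything that has not started running yet.
            for pending in futures:
                pending.cancel()
            raise
        else:
            succeeded.append(name)
    return succeeded, failed
```

The function takes the executor as a parameter, so the same loop could be driven by a ProcessPoolExecutor (as in the tracebacks here) or by a ThreadPoolExecutor for quick local checks.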
This happens any time a wire swap is injected into any design, because the update that pushed rw functions to the injector returns strings but the injector expects sdn primitives. The logic to convert the strings to sdn primitives was removed in cc9e3d94281dc94f29e70ea4ada52906c0ba8431. There is a second issue that I assume stems from how the direction of a port is handled. It triggers an IllegalArgumentException in rapidwright (though admittedly I am not certain port direction is the cause):
Exception while running job: alu
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "Unisim.java", line 41, in com.xilinx.rapidwright.design.Unisim.valueOf
Exception: Java Exception
The above exception was the direct cause of the following exception:
java.lang.java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: No enum constant com.xilinx.rapidwright.design.Unisim.SDN_VERILOG_ASSIGNMENT_1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.10/concurrent/futures/process.py", line 211, in _sendback_result
result_queue.put(_ResultItem(work_id, result=result,
File "/usr/lib/python3.10/multiprocessing/queues.py", line 371, in put
obj = _ForkingPickler.dumps(obj)
File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <java class 'java.lang.IllegalArgumentException'>: attribute lookup java.lang.IllegalArgumentException on java.lang failed
"""
The above exception was the direct cause of the following exception:
_pickle.PicklingError: Can't pickle <java class 'java.lang.IllegalArgumentException'>: attribute lookup java.lang.IllegalArgumentException on java.lang failed
Killed
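The PicklingError in the log above appears because the worker process tries to send a Java exception object back to the parent, and that object cannot be pickled. One possible workaround, offered as an assumption rather than as the project's fix, is to catch the exception inside the worker and re-raise a plain Python exception that carries only strings. JobError and run_job_safely are hypothetical names for this sketch.

```python
import traceback


class JobError(Exception):
    """Hypothetical wrapper whose only argument is a string, so it pickles cleanly."""


def run_job_safely(name, function):
    """Run function() and return (name, result).

    Any exception, including one that wraps a Java object, is converted
    into a JobError holding the job name, the original exception type and
    message, and the formatted traceback as text.
    """
    try:
        return name, function()
    except Exception as exc:
        details = traceback.format_exc()
        raise JobError(
            f"{name}: {type(exc).__name__}: {exc}\n{details}"
        ) from None
```

Submitting run_job_safely to the pool in place of the raw job function would let the parent see the job name ("alu" in the log above) and the original message, instead of a secondary pickling failure.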
Line 786 of bfasst/transform/xilinx_phys_netlist.py is breaking the weekly error injection CI. However, the corresponding unit tests pass. The issue most likely lies with designs that are part of the error_injection weekly test but not part of the physical_netlist unit test, since the error_injection test covers the designs in the byu design directory more comprehensively. https://github.com/byuccl/bfasst/suites/14002997756/artifacts/781358036