byuccl / bfasst

Tools for FPGA Assurance Flows
Apache License 2.0

Issue with physical netlist generation for byu/riscv_final #238

Closed KeenanRileyFaulkner closed 1 year ago

KeenanRileyFaulkner commented 1 year ago

Line 786 of bfasst/transform/xilinx_phys_netlist.py is breaking the weekly error injection CI, even though the unit tests pass. The failure most likely comes from designs that are part of the error_injection weekly test but not part of the physical_netlist unit test, since the error_injection test covers the designs in the byu design directory more comprehensively. CI artifacts: https://github.com/byuccl/bfasst/suites/14002997756/artifacts/781358036

KeenanRileyFaulkner commented 1 year ago

Successfully reproduced. The error only occurs in the riscv_final design during the physical netlist generation:

Running: riscv_fi (0:02:00) concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/keenanrf/research/test/bfasst/scripts/run_experiment.py", line 148, in run_job
    job.function()
  File "/home/keenanrf/research/test/bfasst/bfasst/transform/xilinx_phys_netlist.py", line 90, in run
    self.run_rapidwright(phys_netlist_checkpoint, phys_netlist_edif_path)
  File "/home/keenanrf/research/test/bfasst/bfasst/transform/xilinx_phys_netlist.py", line 160, in run_rapidwright
    self.process_all_luts(cells_already_visited)
  File "/home/keenanrf/research/test/bfasst/bfasst/transform/xilinx_phys_netlist.py", line 269, in process_all_luts
    self.process_lut(lut6_cell, lut5_cell)
  File "/home/keenanrf/research/test/bfasst/bfasst/transform/xilinx_phys_netlist.py", line 645, in process_lut
    self.lut_move_net_to_new_cell(
  File "/home/keenanrf/research/test/bfasst/bfasst/transform/xilinx_phys_netlist.py", line 786, in lut_move_net_to_new_cell
    new_logical_pin = f"I{int(str(physical_pin[1])) - 1}"
ValueError: invalid literal for int() with base 10: 'L'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/keenanrf/research/test/bfasst/scripts/run_experiment.py", line 263, in <module>
    main(args.experiment_yaml, args.threads, args.print_period)
  File "/home/keenanrf/research/test/bfasst/scripts/run_experiment.py", line 83, in main
    clean_jobs(jobs, future, statuses)
  File "/home/keenanrf/research/test/bfasst/scripts/run_experiment.py", line 166, in clean_jobs
    finished_job_uuid = future.result()[0]
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
ValueError: invalid literal for int() with base 10: 'L'

This error is raised when physical_pin is 'CLK'. The line new_logical_pin = f"I{int(str(physical_pin[1])) - 1}" expects a pin name such as "A0", "A1", or "A2", so for 'CLK' the second character is 'L' rather than a digit and int() fails.
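A minimal sketch of how the pin conversion could be guarded, assuming pin names of the form "A<digit>". The helper name lut_logical_pin and the local TransformException class are hypothetical stand-ins for illustration, not the code in xilinx_phys_netlist.py:

```python
class TransformException(Exception):
    """Local stand-in for bfasst's exception of the same name (hypothetical definition)."""


def lut_logical_pin(physical_pin):
    """Map a LUT physical pin such as 'A3' to a logical pin such as 'I2'.

    Hypothetical helper: like the original line, it reads only the second
    character of the pin name, but it rejects names such as 'CLK' with a
    descriptive error instead of letting int() raise a bare ValueError.
    """
    name = str(physical_pin)
    digit = name[1:2]  # second character, or '' if the name is too short
    if not digit.isdecimal():
        raise TransformException(f"Unsupported physical pin for a LUT input: {name!r}")
    return f"I{int(digit) - 1}"
```

With this guard, lut_logical_pin("A1") gives "I0", while lut_logical_pin("CLK") raises an exception that names the offending pin.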

reillymck commented 1 year ago

I have a fix for this in my bram branch and I've added better error checking so unsupported LUTRAM primitives raise a TransformException with the LUTRAM name.

KeenanRileyFaulkner commented 1 year ago

I will leave the weekly tests untouched then. Dr. Goeders said we could scale the weekly tests down to only the 16 designs covered by the unit tests, but if you've got a fix then there's no harm in leaving them as they are and running again next week or after your merge.

reillymck commented 1 year ago

I have a draft pull request that fixes the above errors, but it hangs up on this:

Traceback (most recent call last):
  File "/home/reilly/anaconda3/envs/eqv/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/reilly/equiv/bfasst/scripts/run_experiment.py", line 149, in run_job
    job.function()
  File "/home/reilly/equiv/bfasst/bfasst/transform/error_injector.py", line 165, in inject_wire_swap
    driving_pin = self.__get_source(selected_input)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/reilly/equiv/bfasst/bfasst/transform/error_injector.py", line 203, in __get_source
    for curr_pin in pin.wire.pins:
                    ^^^^^^^^
AttributeError: 'java.lang.String' object has no attribute 'wire'

Not sure what design this is happening in. It would be nice if run_experiment reported exactly which designs succeeded and, when an unrecoverable exception happens (so not an AssertionError or BfasstException), printed the design that caused the exception, printed the exception, and then killed the rest of the jobs and running threads.
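A rough sketch of the reporting behavior described above, not the actual run_experiment.py logic. It uses a thread pool to stay self-contained (the tracebacks show run_experiment.py using a process pool), and the names run_designs, RECOVERABLE, and the local BfasstException class are assumptions made for illustration:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class BfasstException(Exception):
    """Local stand-in for bfasst's recoverable exception type (hypothetical definition)."""


RECOVERABLE = (AssertionError, BfasstException)


def run_designs(jobs, max_workers=2):
    """Run a {design_name: callable} mapping and report per-design outcomes.

    Returns (succeeded, recoverable, fatal): a list of design names, a dict of
    design name to recoverable exception, and either None or a
    (design_name, exception) pair for the first unrecoverable error.
    """
    succeeded, recoverable, fatal = [], {}, None
    executor = ThreadPoolExecutor(max_workers=max_workers)
    try:
        # Remember which design each future belongs to.
        futures = {executor.submit(fn): name for name, fn in jobs.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
            except RECOVERABLE as exc:
                recoverable[name] = exc
            except Exception as exc:
                fatal = (name, exc)
                print(f"Unrecoverable exception in design {name!r}: {exc!r}")
                break
            else:
                succeeded.append(name)
    finally:
        # Cancel jobs that have not started (Python 3.9+). Jobs that are
        # already running cannot be interrupted this way.
        executor.shutdown(wait=fatal is None, cancel_futures=True)
    return succeeded, recoverable, fatal
```

The key design choice is the futures dictionary, which maps each future back to its design name so the failing design can be printed. Stopping workers that are already running would need something stronger than shutdown, for example terminating worker processes when a process pool is used.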

KeenanRileyFaulkner commented 1 year ago

This happens any time a wire swap is injected into any design: the update that pushed rw functions to the injector returns strings, but the injector expects sdn primitives. The logic that converted the strings to sdn primitives was removed in cc9e3d94281dc94f29e70ea4ada52906c0ba8431. There is a second issue that I assume stems from how the direction of a port is handled. It triggers an IllegalArgumentException in RapidWright, though I am admittedly not certain that port direction is the cause:

Exception while running job: alu
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "Unisim.java", line 41, in com.xilinx.rapidwright.design.Unisim.valueOf
Exception: Java Exception

The above exception was the direct cause of the following exception:

java.lang.java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: No enum constant com.xilinx.rapidwright.design.Unisim.SDN_VERILOG_ASSIGNMENT_1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 211, in _sendback_result
    result_queue.put(_ResultItem(work_id, result=result,
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 371, in put
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <java class 'java.lang.IllegalArgumentException'>: attribute lookup java.lang.IllegalArgumentException on java.lang failed
"""

The above exception was the direct cause of the following exception:

_pickle.PicklingError: Can't pickle <java class 'java.lang.IllegalArgumentException'>: attribute lookup java.lang.IllegalArgumentException on java.lang failed
Killed
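
The PicklingError at the end of the log above appears to come from the worker process failing to pickle the Java exception when sending it back to the parent, which hides the original error. One possible workaround, sketched under that assumption, is to convert unpicklable exceptions to plain text inside the worker. The names JobError and call_with_picklable_errors are hypothetical and not part of bfasst:

```python
import pickle
import traceback


class JobError(RuntimeError):
    """Text-only replacement for a worker exception that cannot be pickled (hypothetical)."""


def call_with_picklable_errors(fn, *args, **kwargs):
    """Call fn and re-raise any unpicklable exception as a JobError.

    Exceptions that pickle cleanly propagate unchanged. Anything else is
    replaced by a JobError whose message is the formatted traceback.
    """
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        try:
            pickle.dumps(exc)
        except Exception:
            details = "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            )
            raise JobError(details) from None
        raise
```

A job could then be submitted as executor.submit(call_with_picklable_errors, job.function), so the parent process receives the text of the Java exception rather than a secondary PicklingError.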