Update vivado_structural_error_injection to only look for files it created this invocation

jgoeders commented 7 months ago

When you run a flow with structural comparison (vivado_phys_netlist_cmp), followed by the vivado_structural_error_injection flow, an exception occurs because the former flow leaves a file in the struct_cmp directory called struct_cmp.log. This breaks the error injection code as it is not expecting this file to be here.

I've added a work-around here, but I wonder the larger implications of this? What is the correct way to make sure we don't have these kinds of issues?

KeenanRileyFaulkner commented 7 months ago

I tried the following runs which resulted in a different error than I expected:

python scripts/run.py vivado_phys_netlist_cmp byu/alu
>>> [11/11] structurally compare /home/keenanrf/research/bfasst/build/byu/alu/xray/alu_reversed.v /home/keenanrf/research/bfasst/build/byu/alu/vivado_phys_netlist/viv_impl_physical.v

python scripts/run.py vivado_structural_error_injection byu/alu
>>> [36/400] structurally compare /home/keenanrf/research/bfasst/build/byu/alu/vivado_phys_netlist/viv_impl_physical.v /home/keenanrf/research/bfasst/build/byu/alu/error_injection/bit_flip_12.v
FAILED: /home/keenanrf/research/bfasst/build/byu/alu/struct_cmp/bit_flip_12_cmp.log
python bfasst/ninja_utils/structural.py --build_dir /home/keenanrf/research/bfasst/build/byu/alu --netlists /home/keenanrf/research/bfasst/build/byu/alu/vivado_phys_netlist/viv_impl_physical.v /home/keenanrf/research/bfasst/build/byu/alu/error_injection/bit_flip_12.v --log_path /home/keenanrf/research/bfasst/build/byu/alu/struct_cmp/bit_flip_12_cmp.log
Traceback (most recent call last):
  File "/home/keenanrf/research/bfasst/bfasst/ninja_utils/structural.py", line 770, in <module>
    struct_cmp.compare_netlists()
  File "/home/keenanrf/research/bfasst/bfasst/ninja_utils/structural.py", line 171, in compare_netlists
    self.perform_mapping()
  File "/home/keenanrf/research/bfasst/bfasst/ninja_utils/structural.py", line 411, in perform_mapping
    self.potential_mapping_wrapper(next(instance_iter))
  File "/home/keenanrf/research/bfasst/bfasst/ninja_utils/structural.py", line 357, in potential_mapping_wrapper
    self.add_block_mapping(instance, matched_instance)
  File "/home/keenanrf/research/bfasst/bfasst/ninja_utils/structural.py", line 535, in add_block_mapping
    self.add_net_mapping(net_a, net_b)
  File "/home/keenanrf/research/bfasst/bfasst/ninja_utils/structural.py", line 548, in add_net_mapping
    raise AssertionError(
AssertionError: CLBLM_R_X35Y120_SLICE_X52Y120_DO6 in net_mapping.inverse already. net1: \result_OBUF[11]_inst_i_5_n_0
[45/400] inject a BIT_FLIP into xrev netlist for /home/keenanrf/research/bfasst/build/byu/alu
ninja: build stopped: subcommand failed.
Error injection failed on bit_flip_12
ERROR: Ninja failed with return code 1

@jgoeders is there a specific design you ran that triggered your exception? This exception seems unrelated to the presence of struct_cmp.log

jgoeders commented 7 months ago

@KeenanRileyFaulkner I didn't run into the issue you linked above, but it's possible I accidentally changed how you were doing your random seeding. I'll double check.

I think to reproduce the error I ran this script: https://github.com/byuccl/bfasst/blob/main/quick_test_flows.sh, after disabling the fix I did that I linked above.

KeenanRileyFaulkner commented 7 months ago

Ok. We've got big problems with the compare tool, that are going to make this issue harder to fix until the other one is solved. See #367

jgoeders commented 7 months ago

Yes, I saw that one. I'll take a stab at it tonight.

jgoeders commented 7 months ago

@KeenanRileyFaulkner @dallinjdahl @reillymck

I'm guessing others had already realized this, but I just realized that certain flow_args we were using were not really "optional", but rather required in order to run the flow. For example, any flows using physical netlist generation require that the synthesis be done with flatten.

This was pretty clunky since it meant that if you wanted to run a flow from the command line you had to put those flow args in order for the flow to succeed. Since these options are baked into the flow, it doesn't really make sense to require a user to enter certain arguments. I've updated the flows to add the options themselves by default (https://github.com/byuccl/bfasst/blob/main/bfasst/ninja_flows/vivado_phys_netlist.py#L36)

However, this causes a new issue. Anytime a flow wants to configure a tool a certain way, it won't be captured by the ninja rebuilder when moving from running one flow to the next. Is that correct?

For example, suppose I run the vivado flow (which won't perform flatten), and then I run one of the physical netlist flows. The unflattened synthesis output will likely be re-used, which would cause an error. Thoughts?

KeenanRileyFaulkner commented 7 months ago

I'm not sure if that is correct or not. For example, I just ran the vivado flow without the synth_options set, followed by the vivado_structural_error_injection flow with the synth_options set, on the main branch, using a yaml file. Both flows ran successfully to completion, which seems to indicate that synthesis re-ran so that the phys_netlist tool could succeed. Things may be diffferent when using the command line instead of the yaml file.

If that is indeed the case, then I have two alternatives I've come up with so far.

The first would be that the synth_options be saved into a separate file from the rest of the synth.json (used to template the synth tcl file), so that the additional file can be depended on by the vivado rule when the phys_netlist rules require it to and not depended on outside of that context. This seems like the better of the two ideas to me.
The second is that we store the name of the flow that is being run in a file. On the next run, we compare the flow being run to the previous flow name, and include logic for some flag to be set in the new flow that causes the synthesis to be rerun with the correct options set. This seems to have several drawbacks, including having to manage details of flow creation at the run.py or ninja_flow_manager level that should be handled at the individual flow level.

Either one of these approaches would be somewhat brittle to the idea of introducing multiple types of flows in a single run, if that's a direction we take the project in the future. However, I don't know if any of this is necessary because it seems to work as is? I'll have to do some more testing.

dallinjdahl commented 7 months ago

So the synth_args passed to vivado are stored in a synth.json file, which is used by chevron within ninja to populate synth.tcl (see https://github.com/byuccl/bfasst/blob/1a9be2155d6ad6f56204fec5e114c45d88ce5f80/bfasst/ninja_tools/synth/vivado_synth_build.ninja.mustache#L1C117-L1C117). This means that if the synth_args change, that change must be propagated to the synth.json file if it is going to take effect at all, and therefore will trigger a re-run of vivado. So I think it should all just work.

On Wed, Nov 29, 2023 at 11:34 AM Keenan Faulkner @.***> wrote:

I'm not sure if that is correct or not. For example, I just ran the vivado flow without the synth_options set, followed by the vivado_structural_error_injection flow with the synth_options set, on the main branch, using a yaml file. Both flows ran successfully to completion, which seems to indicate that synthesis re-ran so that the phys_netlist tool could succeed. Things may be diffferent when using the command line instead of the yaml file.

If that is indeed the case, then I have two alternatives I've come up with so far.

The first would be that the synth_options be saved into a separate file from the rest of the synth.json (used to template the synth tcl file), so that the additional file can be depended on by the vivado rule when the phys_netlist rules require it to and not depended on outside of that context. This seems like the better of the two ideas to me.

The second is that we store the name of the flow that is being run in a file. On the next run, we compare the flow being run to the previous flow name, and include logic for some flag to be set in the new flow that causes the synthesis to be rerun with the correct options set. This seems to have several drawbacks, including having to manage details of flow creation at the run.py or ninja_flow_manager level that should be handled at the individual flow level.

Either one of these approaches would be somewhat brittle to the idea of introducing multiple types of flows in a single run, if that's a direction we take the project in the future. However, I don't know if any of this is necessary because it seems to work as is? I'll have to do some more testing.

— Reply to this email directly, view it on GitHub https://github.com/byuccl/bfasst/issues/361#issuecomment-1832486100, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKGMS3V53RQJB3RFYWFPF4DYG553ZAVCNFSM6AAAAAA72L6CGSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMZSGQ4DMMJQGA . You are receiving this because you were mentioned.Message ID: @.***>

jgoeders commented 7 months ago

Thanks for pointing that out @dallinjdahl. I agree -- any change to how a flow is running a tool will be manifest by changes to the build recipe, or the underlying dependencies (such as synth.json).

@KeenanRileyFaulkner Given this, is there a reason to put the flow_args/flow_arguments into the build.jinja? It sounded like the motivation for that was to catch changes from one flow to the next, but based on the discussion above, it seems that 1) the approach is not sufficient, and 2) it's not needed anyway, since changes will be manifest in other parts of the recipes/dependences. Thoughts?

KeenanRileyFaulkner commented 7 months ago

The only place the flow arguments are templated into the build.ninja file is in the configure rule. That rule calls the flow manager class to reconfigure the build file when any relevant file in the project changes. There are cases where the build file would reconfigure but all the outputs remain valid and don't run again. However, if the flow arguments were not templated in, the synth build would be invalidated and everything would build again without them, which may be incorrect.

Edit: Sorry I didn't make this part of my response yesterday. I think it answers your original question (see below), but I had to relook through the code to remind myself of what was going on.

However, this causes a new issue. Anytime a flow wants to configure a tool a certain way, it won't be captured by the ninja rebuilder when moving from running one flow to the next. Is that correct?

jgoeders commented 7 months ago

Okay, based on the discussion, it sounds like the problem is more localized that I originally thought. In this case I think we just need to update the VivadoStructuralErrorInjection to only look at files it created in the current invocation. Based on the num_runs argument, we should be able to determine the exact set of files to iterate on by name, rather than doing a iterdir() as is currently in the code.

byuccl / bfasst

Update vivado_structural_error_injection to only look for files it created this invocation #361