Quick summary

When applying the MoveLinearPastEltWiseAdd() transformation on a graph where there are two identical linear producers of an element-wise add, if one of the linear nodes is a fork, the graph gets mangled.

Details

Steps to Reproduce

Clone the FINN repository
Checkout the dev branch, with commit hash: fd72d48410c040ae8f715919e65ec731580235ce
Download 'resnet18-4w4a_tidied.onnx'
Create a 'build.py' script in the same directory as 'resnet18-4w4a_tidied.onnx' which applies the MoveLinearPastEltWiseAdd() transformation on 'resnet18-4w4a.onnx', then saves the modified ONNX model. Here's an example 'build.py':
```
from sys import argv
from qonnx.core.modelwrapper import ModelWrapper
from finn.transformation.streamline.reorder import MoveLinearPastEltwiseAdd
```

Set the filename that we look for our ONNX model under.

onnx_model_filename = "resnet18-4w4a_tidied"

Wrap our onnx model in the ModelWrapper class

onnx_model = ModelWrapper(f"{onnx_model_filename}.onnx")

Use the ModelWrapper class' transform function to

apply the MoveLinearPastEltWiseAdd() transformation.

onnx_model = onnx_model.transform(MoveLinearPastEltwiseAdd())

Save the modified ONNX model.

onnx_model.save(f"{onnx_model_filename}_MODIFIED.onnx")


5. Start the docker container and have it run '`build.py`', with the command: `./run-docker.sh build_custom <Path_To_Build.py>`. Make sure to replace <Path_To_Build.py> with the directory containing '`build.py`' and '`resnet18-4w4a_tidied.onnx`'.
6. Wait for the docker container to finish running '`build.py`'. It will print `The program finished and will be restarted` once it has completed.
7. Open the modified ONNX model in [Netron.app](Netron.app)

This process produces the following ONNX file: '[resnet18-4w4a_tidied_MODIFIED.onnx](https://drive.google.com/file/d/1or-vS0evBq0ze7S_qB1eYnW1vh7du__W/view?usp=sharing)'

### Expected behavior

Either:

- Before moving linear nodes past element-wise addition nodes, The transformation moves all forked linear operations past their forks, with their producers becoming forks instead (this is already implemented in [MoveLinearPastFork()](https://github.com/Xilinx/finn/blob/b3bdff118ae076cb776af6e51ddc28eeaa0d6390/src/finn/transformation/streamline/reorder.py#L888)). This would avoid the transformation moving fork nodes.
___or...___
- The transformation warns the user if they try to apply [MoveLinearPastEltWiseAdd()](https://github.com/Xilinx/finn/blob/b3bdff118ae076cb776af6e51ddc28eeaa0d6390/src/finn/transformation/streamline/reorder.py#L502) in a way that would result in fork nodes being moved past join nodes.
___or...___
- The transformation ignores linear fork nodes.

### Actual behavior

The transformation moves a fork node past its relevant joint node. This results in a huge mess of a graph (see images below). 

- Before applying [MoveLinearPastEltWiseAdd()](https://github.com/Xilinx/finn/blob/b3bdff118ae076cb776af6e51ddc28eeaa0d6390/src/finn/transformation/streamline/reorder.py#L502):
![before](https://github.com/Xilinx/finn/assets/24257802/5fd53ad8-fa08-467f-9e1d-a47131960b27)
- After applying [MoveLinearPastEltWiseAdd()](https://github.com/Xilinx/finn/blob/b3bdff118ae076cb776af6e51ddc28eeaa0d6390/src/finn/transformation/streamline/reorder.py#L502):
![after](https://github.com/Xilinx/finn/assets/24257802/6d902371-b090-4923-873c-876867c1384b)

### Possible fixes

I'm unsure which of these three fixes would suit the project best:

- [MoveLinearPastFork()](https://github.com/Xilinx/finn/blob/b3bdff118ae076cb776af6e51ddc28eeaa0d6390/src/finn/transformation/streamline/reorder.py#L888) could be applied to the model at the start of the transformation. This moves all forked linear operations past their forks, with their producers becoming forks instead.
- A warning could be raised whenever a fork node is moved by the transformation.
- The transformation could ignore fork nodes.

### Additional context

- _The above behaviour is technically correct!_
  * In the example there are 2 identical multiply nodes before an element-wise addition. The transformation's correct behaviour is to move both of them past the element-wise add. ___However___, the transformation does not realise one of the multiply nodes is a fork. The graph produced in this instance is complete nonsense.
- _While the behavior is "correct", (and easily circumvented by I don't think it's expected._

### ONNX files

Pre-transformation ONNX file: '[resnet18-4w4a_tidied.onnx](https://drive.google.com/file/d/1rPscSxYJnl43dQzbGhKv8wPYdWz-xpQG/view?usp=sharing)'
Post-transformation ONNX file: '[resnet18-4w4a_tidied_MODIFIED.onnx](https://drive.google.com/file/d/1or-vS0evBq0ze7S_qB1eYnW1vh7du__W/view?usp=sharing)'

Xilinx / finn

MoveLinearPastEltWiseAdd() transformation bug #881