Xilinx / RapidWright

Build Customized FPGA Implementations for Vivado
http://www.rapidwright.io
Other
284 stars 109 forks source link

NullPointerException and/or incorrect Node's returned from Wire.getNode method #105

Closed litghost closed 3 years ago

litghost commented 3 years ago

The Wire.getNode method appears to return the incorrect Node or even cause a NullPointerException. Test case is in #103. Example invocation:

# java com.xilinx.rapidwright.tests.NodeToWiresToNode xc7a35tcpg236-1
==============================================================================
==               Check Node -> Wires -> Node: xc7a35tcpg236-1               ==
==============================================================================
             Load Device:     0.872s     59.640MBs
Exception in thread "main" java.lang.RuntimeException: Got different Node on Node -> Wire -> Node, node 1 = WR1BEG_S0 node 2 = CMT_TOP_WL1END3_12
        at com.xilinx.rapidwright.tests.NodeToWiresToNode.testNode(NodeToWiresToNode.java:41)
        at com.xilinx.rapidwright.tests.NodeToWiresToNode.main(NodeToWiresToNode.java:78)

The current code in #103 uses:

            if(!node.getWireName().equals(nodeFromWire.getWireName())) {

for equality checking, but:

            if(!node.equals(nodeFromWire)) {

fails too.

If the check is completely removed, the following error is produced instead:

# java com.xilinx.rapidwright.tests.NodeToWiresToNode xc7a35tcpg236-1
Exception in thread "main" java.lang.NullPointerException
        at com.xilinx.rapidwright.device.Tile.a(Unknown Source)
        at com.xilinx.rapidwright.device.Node.a(Unknown Source)
        at com.xilinx.rapidwright.device.Node.<init>(Unknown Source)
        at com.xilinx.rapidwright.device.Node.<init>(Unknown Source)
        at com.xilinx.rapidwright.device.Wire.getNode(Unknown Source)
        at com.xilinx.rapidwright.tests.NodeToWiresToNode.testNode(NodeToWiresToNode.java:38)
        at com.xilinx.rapidwright.tests.NodeToWiresToNode.main(NodeToWiresToNode.java:79)
clavin-xlnx commented 3 years ago

This is very similar to Issue #63. I've looked into trying to resolve the missing Node references but getting all of them requires a non-trivial amount of effort on the internal RapidWright side.

Reproducing this issue, is rather straight-forward with a "for all" loop. However, in practical settings, it is unlikely to have much use for a Node reference from arbitrary Wire elements that do not have much significance. However, I am open to be persuaded otherwise.

litghost commented 3 years ago

This is very similar to Issue #63. I've looked into trying to resolve the missing Node references but getting all of them requires a non-trivial amount of effort on the internal RapidWright side.

Reproducing this issue, is rather straight-forward with a "for all" loop. However, in practical settings, it is unlikely to have much use for a Node reference from arbitrary Wire elements that do not have much significance. However, I am open to be persuaded otherwise.

I'd argue you should either make Wire.getNode always work, or remove it. Semi-randomly causing a NPE is not an acceptable implementation.

In terms of fixes, I've prototyped a solution for this (wire -> node) that was intended for the device database node <-> wire relationship. It is general enough to be used in RapidWright to implement this properly: https://github.com/SymbiFlow/prjxray/pull/1477

Data model: https://github.com/SymbiFlow/prjxray/blob/5d77fb36d3773bfddd764a9f9f3623c6b139b5f1/minitests/graph_folding/graph_storage.capnp Example implementation: https://github.com/SymbiFlow/prjxray/blob/5d77fb36d3773bfddd764a9f9f3623c6b139b5f1/minitests/graph_folding/graph_lookup.py

clavin-xlnx commented 3 years ago

I agree, NPEs are not acceptable and need to be fixed. Modifying the backend implementation is expensive and I am continuing to explore to identify a lower cost solution.

clavin-xlnx commented 3 years ago

This should be fixed in 2020.1.6. There is a small runtime hit, especially for larger devices when querying uncommon wires, but a message will print on the first instance and will be cached thereafter.