fastmachinelearning / qonnx

QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX
https://qonnx.readthedocs.io/
Apache License 2.0
124 stars 39 forks source link

Strage Behavior of FoldConstants Transformation #104

Open iksnagreb opened 8 months ago

iksnagreb commented 8 months ago

Occasionally, I had the FoldConstants transformation produce wrong shape constants when applied to Reshape nodes where the shape input is produced by a bunch of Shape-Gather-Unsqueeze-Concat operators (this seems to be common export behavior of PyTorch to produce these, but it should be easily constant-fold-able when all shapes are known at export time). By "wrong" I mean at least one axis is zero and the resulting graph is broken beyond repair from that point on (all following shapes make no sense at all). I have not really an idea what exactly is going on and giving a minimal example is difficult as it seems to occur only for more complex operator patterns deeper inside the model (e.g., the same pattern is folded fine in the first layer but then it breaks in the second), but a fix seems to be trivial: Insert a break here, leaving the loop to remove the node and re-do the shape annotations after each folded constant instead of just once at the end.

Not sure whether this is the proper way to solve this and I will try to follow up on this, hopefully with a reproducible example later, but before I forget I wanted to document this issue and maybe someone else already encountered this or something similar and knows what is going on.

maltanar commented 8 months ago

This may be related to on-the-fly modifications of the model.graph.node container, I've previously observed some weird behavior too when nodes are being added/removed to the container as some transformation iterates over it. This can give rise to some very confusing bugs, and the easiest way out is actually to stop iterating over the node container once a node is is added/removed and let the transformation run once again. Adding break at that location in FoldConstants, as you suggest above, should be taking care of that.