Xilinx / pyxir

Relay_Op error when compiling a model #63

Closed: jlamperez closed this issue 3 years ago

jlamperez commented 3 years ago

Hi,

I started an issue in the TVM repository, https://github.com/apache/tvm/issues/8683, but I think the correct place to fix it is here.

To summarize, I am getting the following error:

AttributeError: <class 'tvm.tir.expr.Any'> has no attribute value

After adding debug traces, I figured out that the problem comes from this RelayOp expression:

DEBUG:pyxir:RelayOp:
DEBUG:pyxir:-- op_name: RelayOp-94191981978496
DEBUG:pyxir:-- expr: dyn.strided_slice

Looking at the code, I see that there are two # TODO comments there: https://github.com/Xilinx/pyxir/blob/master/python/pyxir/frontend/tvm/relay_tools/relay_l0_expr_and_others.py#L429-L446.

The problem comes from this code:

ty = expr.checked_type
...
relay_shape = TensorShape([int(s.value) for s in list(ty.shape)])
DEBUG:pyxir:-- expr: free_var %inputs: Tensor[(1, 3, 416, 416), float32];
dyn.strided_slice(%inputs, meta[relay.Constant][0] /* ty=Tensor[(4), int64] */, meta[relay.Constant][1] /* ty=Tensor[(4), int64] */, meta[relay.Constant][2] /* ty=Tensor[(4), int64] */, begin=None, end=None, strides=None, axes=None) /* ty=Tensor[(?, ?, ?, ?), float32] */
DEBUG:pyxir:-- expr.type_args: [TensorType([1, 3, 416, 416], float32), TensorType([4], int64), TensorType([4], int64), TensorType([4], int64)]

It seems that it doesn't like the [?, ?, ?, ?] shape.
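
If it helps, here is my understanding of the failure (a minimal sketch, assuming the dynamic dimensions come through as tvm.tir.Any nodes, which carry no .value attribute):

import tvm

# Dynamic dimensions in a Relay type are tvm.tir.Any nodes; unlike
# tvm.tir.IntImm they have no .value, so int(s.value) raises AttributeError.
shape = [tvm.tir.IntImm("int64", 1), tvm.tir.Any()]  # e.g. Tensor[(1, ?)]
dims = []
for s in shape:
    if isinstance(s, tvm.tir.Any):
        dims.append(-1)            # one possible internal encoding for "unknown"
    else:
        dims.append(int(s.value))  # concrete dimensions do carry a value
print(dims)  # [1, -1]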

How do you fix this?

Best regards,

Jorge

jtuyls commented 3 years ago

@jlamperez Yes, the dynamic shapes are causing the issue. I have been experimenting with this, and with a small fix we could represent those dynamic shapes internally as [-1, -1, -1, -1]. However, for your specific model, the dynamic shapes of the conv2d layers are going to become an issue afterwards: we can't handle dynamic shapes on the Vitis AI DPUs and have to know the dimensions upfront. It looks to me like it might be possible to make the Relay expression more static with the DynamicToStatic transformation: mod = relay.transform.DynamicToStatic()(mod). Could you check whether this works for you?
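
Concretely, something like this (a minimal sketch, assuming mod is the Relay module from your script; the InferType pass afterwards is only there to confirm the output shapes became concrete):

from tvm import relay

mod = relay.transform.DynamicToStatic()(mod)  # fold dyn.* ops into static ones
mod = relay.transform.InferType()(mod)        # re-infer types after the rewrite
print(mod["main"].ret_type)                   # should show concrete dims, not "?"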

jlamperez commented 3 years ago

Hi @jtuyls,

I have added your suggestion before the partition_for_vitis_ai call:

mod = relay.transform.DynamicToStatic()(mod)
mod = partition_for_vitis_ai(mod, params, dpu=vitis_target)

The error from above is gone, but now I am getting another error related to the shapes of the input layers ('multiply-94401731564512', 'nn_max_pool2d_NCHW-NHWC-94401731356464', 'nn_max_pool2d_NCHW-NHWC-94401731361136', 'nn_max_pool2d_NCHW-NHWC-94401732311728') passed to the concat function in l1_basic_nn.py:

....
DEBUG:pyxir:-- rhs: Constant, StrVector[Constant], constant-94711480511424, IntVector[1, 1, 1, 128]
DEBUG:pyxir:-- Add = BiasAdd
DEBUG:pyxir:--bias_add shape: [1, 13, 288230376151711744, 128]
DEBUG:pyxir:sigmoid:
DEBUG:pyxir:multiply: 
DEBUG:pyxir:-- lhs: Call
DEBUG:pyxir:-- rhs: Call
DEBUG:pyxir:-- Call
DEBUG:pyxir:Call: nn.max_pool2d
DEBUG:pyxir:Call: multiply
DEBUG:pyxir:MEMORY: MULTIPLY
DEBUG:pyxir:nn_max_pool2d
DEBUG:pyxir:-- name: nn_max_pool2d-94711405311712
DEBUG:pyxir:-- outshape: [1, 128, 13, 288230376151711745]
DEBUG:pyxir:-- Call
DEBUG:pyxir:Call: nn.max_pool2d
DEBUG:pyxir:Call: multiply
DEBUG:pyxir:MEMORY: MULTIPLY
DEBUG:pyxir:nn_max_pool2d
DEBUG:pyxir:-- name: nn_max_pool2d-94711479507712
DEBUG:pyxir:-- outshape: [1, 128, 13, 288230376151711745]
DEBUG:pyxir:-- Call
DEBUG:pyxir:Call: nn.max_pool2d
DEBUG:pyxir:Call: multiply
DEBUG:pyxir:MEMORY: MULTIPLY
DEBUG:pyxir:nn_max_pool2d
DEBUG:pyxir:-- name: nn_max_pool2d-94711405338464
DEBUG:pyxir:-- outshape: [1, 128, 13, 288230376151711745]
OP_NAME concat-94401731273440
BOTTOMS ['multiply-94401731564512', 'nn_max_pool2d_NCHW-NHWC-94401731356464', 'nn_max_pool2d_NCHW-NHWC-94401731361136', 'nn_max_pool2d_NCHW-NHWC-94401732311728']
INPUT_LAYERS [{
  "name": "multiply-94789160568400",
  "type": [
    "Multiply"
  ],
  "shapes": [
    1,
    13,
    288230376151711744,
    128
  ],
  "sizes": [
    0
  ],
  "tops": [
    "nn_max_pool2d_NHWC>NCHW-94789164952624",
    "nn_max_pool2d_NHWC>NCHW-94789165221440",
    "nn_max_pool2d_NHWC>NCHW-94789164125312"
  ],
  "bottoms": [
    "nn_bias_add-94789160696944",
    "sigmoid-94789161235664"
  ],
  "layer": [
    "multiply-94789160568400"
  ],
  "data": [],
  "targets": [],
  "target": "cpu",
  "subgraph": null,
  "subgraph_data": [],
  "internal": false,
  "attrs": {
    "relay_id": [
      94789160568400
    ]
  }
}, {
  "name": "nn_max_pool2d_NCHW-NHWC-94789164952624",
  "type": [
    "Transpose"
  ],
  "shapes": [
    1,
    13,
    288230376151711745,
    128
  ],
  "sizes": [
    1664
  ],
  "tops": [],
  "bottoms": [
    "nn_max_pool2d-94789164952624"
  ],
  "layer": [
    "nn_max_pool2d_NCHW>NHWC-94789164952624"
  ],
  "data": [],
  "targets": [],
  "target": "cpu",
  "subgraph": null,
  "subgraph_data": [],
  "internal": true,
  "attrs": {
    "axes": [
      0,
      2,
      3,
      1
    ]
  }
}, {
  "name": "nn_max_pool2d_NCHW-NHWC-94789165221440",
  "type": [
    "Transpose"
  ],
  "shapes": [
    1,
    13,
    288230376151711745,
    128
  ],
  "sizes": [
    1664
  ],
  "tops": [],
  "bottoms": [
    "nn_max_pool2d-94789165221440"
  ],
  "layer": [
    "nn_max_pool2d_NCHW>NHWC-94789165221440"
  ],
  "data": [],
  "targets": [],
  "target": "cpu",
  "subgraph": null,
  "subgraph_data": [],
  "internal": true,
  "attrs": {
    "axes": [
      0,
      2,
      3,
      1
    ]
  }
}, {
  "name": "nn_max_pool2d_NCHW-NHWC-94789164125312",
  "type": [
    "Transpose"
  ],
  "shapes": [
    1,
    13,
    288230376151711745,
    128
  ],
  "sizes": [
    1664
  ],
  "tops": [],
  "bottoms": [
    "nn_max_pool2d-94789164125312"
  ],
  "layer": [
    "nn_max_pool2d_NCHW>NHWC-94789164125312"
  ],
  "data": [],
  "targets": [],
  "target": "cpu",
  "subgraph": null,
  "subgraph_data": [],
  "internal": true,
  "attrs": {
    "axes": [
      0,
      2,
      3,
      1
    ]
  }
}]
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/pyxir-0.3.0-py3.6-linux-x86_64.egg/pyxir/frontend/tvm/relay_tools/relay_l0_expr_and_others.py", line 117, in call
    RELAY_2_XLAYER, **kwargs)
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/pyxir-0.3.0-py3.6-linux-x86_64.egg/pyxir/frontend/tvm/relay_tools/relay_l1_basic.py", line 414, in concatenate
    X = px.ops.concat(op_name, data_layers, axis, relay_id=relay_idx)
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/pyxir-0.3.0-py3.6-linux-x86_64.egg/pyxir/graph/ops/l1_basic_nn.py", line 167, in concat
    assert i == axis or len(check) == 1, "i: {0}, axis: {1}, check: {2}".format(i, axis, check)
i: 2, axis: 3, check: {288230376151711744, 288230376151711745}
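
If I read the assertion right, it requires all inputs to agree on every dimension except the concat axis; roughly (a condensed paraphrase of the check in l1_basic_nn.py, not the verbatim source):

def check_concat_shapes(shapes, axis):
    # All input layers must share one size along every axis except `axis`.
    for i in range(len(shapes[0])):
        check = {s[i] for s in shapes}
        assert i == axis or len(check) == 1, \
            "i: {0}, axis: {1}, check: {2}".format(i, axis, check)

shapes = [
    [1, 13, 288230376151711744, 128],  # multiply output
    [1, 13, 288230376151711745, 128],  # transposed max_pool2d outputs
]
check_concat_shapes(shapes, axis=3)    # fails at i=2, exactly as in the log

The huge values look suspicious: 288230376151711744 is exactly 2^58, so these dimensions seem to be leftovers of the unresolved dynamic shapes rather than real sizes.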

Thanks for your help.

jornt-xilinx commented 3 years ago

Hi @jlamperez, I fixed the issues in the Relay conversion and the fixes are in the dev branch now. However, I had a look at the model and noticed that the nonlinear ops are sigmoids, which we can't handle on the DPUs, so this model can't really be accelerated.
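
If you want to check upfront which ops will keep a model off the DPU, a quick scan over the Relay module can help (an illustrative helper against the public Relay API; the unsupported-op set here is just an example, the authoritative list lives in the pyxir target definitions):

from tvm import relay

UNSUPPORTED = {"sigmoid", "tanh"}  # example set, not the official DPU list

def find_unsupported(mod):
    hits = []
    def visit(expr):
        # Relay Call nodes whose operator is a named op (not a sub-function)
        if isinstance(expr, relay.Call) and hasattr(expr.op, "name"):
            if expr.op.name in UNSUPPORTED:
                hits.append(expr.op.name)
    relay.analysis.post_order_visit(mod["main"], visit)
    return hits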

jlamperez commented 3 years ago

Okay @jornt-xilinx, thank you for working on this; you can close the issue.

Regards!