ENOT-AutoDL / onnx2torch

Convert ONNX models to PyTorch.
Apache License 2.0

Clip convert error: Dynamic value of min/max is not implemented #191

Open ongiaf opened 9 months ago

ongiaf commented 9 months ago

Hi, I have an onnx model. Here is one of the nodes in onnxgraph

node {
  input: "/decoder.0/layers.0/blocks.0/attn/Pow_3_output_0"
  input: "/decoder.0/layers.0/blocks.0/attn/Constant_13_output_0"
  input: ""
  output: "/decoder.0/layers.0/blocks.0/attn/Clip_1_output_0"
  name: "/decoder.0/layers.0/blocks.0/attn/Clip_1"
  op_type: "Clip"
  doc_string: "...."
}

When I tried to convert it to a torch model, the conversion failed with a NotImplementedError (raised from a KeyError on the empty input name):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/micromamba/envs/ai-models/lib/python3.10/site-packages/onnx2torch/converter.py", line 110, in convert
    torch_module, onnx_mapping = converter(onnx_node, onnx_graph)
  File "/root/micromamba/envs/ai-models/lib/python3.10/site-packages/onnx2torch/node_converters/clip.py", line 60, in _
    raise NotImplementedError('Dynamic value of min/max is not implemented') from exc
NotImplementedError: Dynamic value of min/max is not implemented

It may be caused by this line: https://github.com/ENOT-AutoDL/onnx2torch/blob/a8b060336c8c95c51a6257a8d99171f0b86b8eab/onnx2torch/node_converters/clip.py#L60

After adding these conditions:

min_val = float(get_const_value(min_name, graph)) if (min_name is not None and min_name != '') else None
max_val = float(get_const_value(max_name, graph)) if (max_name is not None and max_name != '') else None

the conversion works.
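
The logic of that fix can be sketched as a small helper. Note that `resolve_clip_bound` and the plain-dict stand-in for the graph's constant lookup are illustrative names, not part of the onnx2torch API:

```python
def resolve_clip_bound(name, constants):
    """Return a float bound for ONNX Clip, or None when the input is absent.

    ONNX marks an omitted optional input either by dropping it entirely
    (name is None) or by passing an empty string, as in the node above.
    `constants` stands in for the get_const_value/graph lookup in onnx2torch.
    """
    if name is None or name == '':
        return None
    return float(constants[name])


# The Clip node above passes an empty string for min, so only max is set.
bounds = {'clip_max_name': 6.0}  # illustrative constant table and value
min_val = resolve_clip_bound('', bounds)               # -> None
max_val = resolve_clip_bound('clip_max_name', bounds)  # -> 6.0
```

With min resolved to None, the converter can fall back to an unbounded clamp on that side instead of raising.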

ongiaf commented 9 months ago

The full onnx model can be downloaded from here:

  1. https://get.ecmwf.int/repository/test-data/ai-models/fuxi/short.onnx
  2. ONNX External Data: https://get.ecmwf.int/repository/test-data/ai-models/fuxi/short
dsuhoi commented 7 months ago

@ongiaf Have you had any success converting FuXi? (I have been fine-tuning this model for a long time.) I recommend paying attention to this solution

ongiaf commented 7 months ago

Thanks, it's excellent work. With some dirty work, FuXi can run successfully on PyTorch via onnx2torch. The problems in onnx2torch are mainly in the LayerNormalization and Clip converters.

dsuhoi commented 7 months ago

@ongiaf Did you manage to run FuXi with the current weights for the fine-tuning process? (I am currently thinking about how to finish work on the model on a 1-hour grid, and I am considering freezing all layers except the U-Transformer.)

juanqiu1 commented 7 months ago

> Thanks, it's excellent work. And with some dirty work, Fuxi can successfully run on PyTorch with Onnx2Torch. In Onnx2Torch, problems are mainly about LayerNormalization and Clip.

Thank you for posting your changes for Clip. Could you also suggest how to fix LayerNormalization? It looks like the converted model has an issue with the torch.layer_norm call.

dsuhoi commented 7 months ago

@juanqiu1 To make this work with FuXi, you will need to hard-code the normalized_shape parameter in onnx2torch/node_converters/layer_norm.py to [1536]:

@add_converter(operation_type='LayerNormalization', version=17)
def _(node: OnnxNode, graph: OnnxGraph) -> OperationConverterResult:
    node_attributes = node.attributes

    axis = node_attributes.get('axis', AXIS_DEFAULT_VALUE)
    epsilon = node_attributes.get('epsilon', EPSILON_DEFAULT_VALUE)

    if all(value_name in graph.initializers for value_name in node.input_values[1:]):
        input_value_info = graph.value_info[node.input_values[0]]
        input_shape = get_shape_from_value_info(input_value_info)
        torch_module = nn.LayerNorm(
            normalized_shape=[1536],  # was input_shape[axis:] (this is the changed line)
            eps=epsilon,
            elementwise_affine=True,
        )
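
Rather than hard-coding [1536], a less brittle option (a sketch, not existing onnx2torch code) is to derive normalized_shape from the Scale initializer, since the ONNX LayerNormalization spec defines the Scale input to match the normalized axes (input_shape[axis:]):

```python
def infer_normalized_shape(scale_shape):
    """Derive nn.LayerNorm's normalized_shape from the Scale input's shape.

    ONNX LayerNormalization's Scale input covers the normalized axes
    (input_shape[axis:]), so its static shape can replace the hard-coded
    [1536] when value_info carries no static input shape.
    """
    shape = list(scale_shape)
    if not shape:
        raise ValueError('Scale initializer must have at least one axis')
    return shape


# For FuXi's decoder blocks the Scale tensors are 1-D with 1536 elements:
assert infer_normalized_shape((1536,)) == [1536]
```

In the converter above, scale_shape would come from the shape of the graph initializer named by node.input_values[1].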
juanqiu1 commented 6 months ago

@dsuhoi Thank you for the hint; there are a couple of other easy fixes (typing, etc.). Did you manage to run FuXi with the current weights for the fine-tuning process? Do you have any progress on that? After conversion, I loaded the model into PyTorch, but even on an A100 with FSDP enabled via accelerate, I still get a CUDA out-of-memory error.

dsuhoi commented 6 months ago

@juanqiu1 Yes, I managed to start the training process by keeping only the named_parameters() belonging to the last dozen UTransformer blocks trainable (this was enough for fine-tuning).
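
That kind of partial freezing might look like the following sketch. `freeze_except_substrings` and the 'u_transformer' name filter are illustrative; the real parameter names depend on the converted FuXi module:

```python
def freeze_except_substrings(named_parameters, keep_substrings):
    """Set requires_grad=False on every parameter whose name does not
    contain one of keep_substrings; return the names left trainable.

    `named_parameters` is an iterable of (name, param) pairs, as produced
    by torch.nn.Module.named_parameters().
    """
    trainable = []
    for name, param in named_parameters:
        keep = any(sub in name for sub in keep_substrings)
        param.requires_grad = keep
        if keep:
            trainable.append(name)
    return trainable


# Hypothetical usage on a converted model, before building the optimizer:
#   freeze_except_substrings(model.named_parameters(), ['u_transformer'])
# The optimizer should then be given only parameters with requires_grad=True.
```

Freezing most of the network this way also shrinks the optimizer state, which helps with the out-of-memory errors mentioned above.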

I used Nvidia A100 (40GB).