Rikorose / DeepFilterNet

Noise suppression using deep filtering
https://huggingface.co/spaces/hshr/DeepFilterNet2

Converter to .onnx model #174

Closed JBloodless closed 1 year ago

JBloodless commented 1 year ago

Hi, I've been trying to use export.py to convert a retrained model to a single ONNX file, but there seems to be a version mismatch between the torch and ONNX operations. After some modifications I've almost made it work, but now I'm stuck with:

Traceback (most recent call last):
  File "/data/code_jb/deepfilter2_git/DeepFilterNet/df/export.py", line 327, in <module>
    main(args)
  File "/data/code_jb/deepfilter2_git/DeepFilterNet/df/export.py", line 292, in main
    export(
  File "/home/i.beskrovnyy/miniconda3/envs/df/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context                                                                                                                        
    return func(*args, **kwargs)
  File "/data/code_jb/deepfilter2_git/DeepFilterNet/df/export.py", line 160, in export
    export_impl(
  File "/data/code_jb/deepfilter2_git/DeepFilterNet/df/export.py", line 100, in export_impl
    torch.onnx.export(
  File "/home/i.beskrovnyy/miniconda3/envs/df/lib/python3.10/site-packages/torch/onnx/__init__.py", line 350, in export
    return utils.export(
  File "/home/i.beskrovnyy/miniconda3/envs/df/lib/python3.10/site-packages/torch/onnx/utils.py", line 163, in export
    _export(
  File "/home/i.beskrovnyy/miniconda3/envs/df/lib/python3.10/site-packages/torch/onnx/utils.py", line 1074, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/i.beskrovnyy/miniconda3/envs/df/lib/python3.10/site-packages/torch/onnx/utils.py", line 731, in _model_to_graph
    graph = _optimize_graph(
  File "/home/i.beskrovnyy/miniconda3/envs/df/lib/python3.10/site-packages/torch/onnx/utils.py", line 308, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/home/i.beskrovnyy/miniconda3/envs/df/lib/python3.10/site-packages/torch/onnx/__init__.py", line 416, in _run_symbolic_function                                                                                                                      
    return utils._run_symbolic_function(*args, **kwargs)
  File "/home/i.beskrovnyy/miniconda3/envs/df/lib/python3.10/site-packages/torch/onnx/utils.py", line 1421, in _run_symbolic_function                                                                                                                        
    raise symbolic_registry.UnsupportedOperatorError(
torch.onnx.symbolic_registry.UnsupportedOperatorError: Exporting the operator ::view_as_complex to ONNX opset version 14 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.

Am I doing something wrong, or were the latest DeepFilterNet2 modifications not tested with this converter? Or maybe I should use specific versions of onnxruntime/torch?

Rikorose commented 1 year ago

Hi, exporting the full model directly to ONNX is currently not supported. You would need to rewrite it so that the model does only real-valued processing, or use the three sub-models, which can be exported to ONNX.
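
To illustrate what "real-valued processing" can mean here, the following is a minimal sketch and not code from DeepFilterNet. It assumes complex values are stored as real tensors with a trailing dimension of size 2 (real, imaginary) and multiplies them with plain real arithmetic, so that no `view_as_complex` call ends up in the traced graph. The function name `complex_mul_real` is hypothetical.

```python
import torch


def complex_mul_real(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Multiply complex numbers stored as real tensors of shape [..., 2].

    Hypothetical helper: the last dimension holds (real, imag), so the
    product is computed without converting to a complex dtype.
    """
    a_re, a_im = a[..., 0], a[..., 1]
    b_re, b_im = b[..., 0], b[..., 1]
    # (a_re + i*a_im) * (b_re + i*b_im)
    re = a_re * b_re - a_im * b_im
    im = a_re * b_im + a_im * b_re
    return torch.stack((re, im), dim=-1)
```

Any place in a model that calls `torch.view_as_complex` only to multiply two spectra could, under this assumption, call such a helper instead. Whether that is enough to export the full model is not confirmed in this thread.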

JBloodless commented 1 year ago

Ok, got it, thanks

JBloodless commented 1 year ago

With --export_full=False I'm getting

Traceback (most recent call last):
  File "/data/code_jb/deepfilter2_git/DeepFilterNet/df/export.py", line 331, in <module>
    main(args)
  File "/data/code_jb/deepfilter2_git/DeepFilterNet/df/export.py", line 296, in main
    export(
  File "/home/i.beskrovnyy/miniconda3/envs/df/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/data/code_jb/deepfilter2_git/DeepFilterNet/df/export.py", line 193, in export
    e0, e1, e2, e3, emb, c0, lsnr = export_impl(
  File "/data/code_jb/deepfilter2_git/DeepFilterNet/df/export.py", line 100, in export_impl
    model = torch.jit.script(model, example_inputs=[tuple(a for a in inputs)])
  File "/home/i.beskrovnyy/miniconda3/envs/df/lib/python3.10/site-packages/torch/jit/_script.py", line 1286, in script
    return torch.jit._recursive.create_script_module(
  File "/home/i.beskrovnyy/miniconda3/envs/df/lib/python3.10/site-packages/torch/jit/_recursive.py", line 458, in create_script_module
    return create_script_module_impl(nn_module, concrete_type, stubs_fn)
  File "/home/i.beskrovnyy/miniconda3/envs/df/lib/python3.10/site-packages/torch/jit/_recursive.py", line 524, in create_script_module_impl
    create_methods_and_properties_from_stubs(concrete_type, method_stubs, property_stubs)
  File "/home/i.beskrovnyy/miniconda3/envs/df/lib/python3.10/site-packages/torch/jit/_recursive.py", line 375, in create_methods_and_properties_from_stubs
    concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
RuntimeError: Unsupported value kind: Tensor

even with default model.

Rikorose commented 1 year ago

How exactly do you call export.py? What args do you use?

JBloodless commented 1 year ago

python export.py -m DeepFilterNet2 /data/checkpoints_ivan/df2_ll_onnx --export_full False

If I change the model to the path of my retrained model, the error is the same.

JBloodless commented 1 year ago

If I set jit to False everywhere in export_impl, I get mismatched elements in all encoder outputs:

2022-11-10 14:47:12 | WARNING  | DF |   Elements not close for e0:                                                                                                                                                                                            
Not equal to tolerance rtol=1e-06, atol=1e-05                                                                                                                                                                                                                 

Mismatched elements: 136057 / 204800 (66.4%)    
Max absolute difference: 1.6712724
Max relative difference: 31671.553                                                                                                                                                                                                                            
 x: array([[[0.      , 0.      , 0.      , ..., 0.      , 0.      ,                                                                                                                                                                                           
         0.      ],                                                                                                                                                                                                                                           
        [0.      , 0.      , 0.      , ..., 0.01213 , 0.      ,...                                                                                                                                                                                            
 y: array([[[2.185955e-02, 0.000000e+00, 0.000000e+00, ..., 0.000000e+00,                                                                                                                                                                                     
         0.000000e+00, 0.000000e+00],                                                                                                                                                                                                                         
        [0.000000e+00, 0.000000e+00, 0.000000e+00, ..., 0.000000e+00,...                                                                                                                                                                                      
2022-11-10 14:47:12 | WARNING  | DF |   Elements not close for e1:                                                                                                                                                                                            
Not equal to tolerance rtol=1e-06, atol=1e-05                                                                                                                                                                                                                 

Mismatched elements: 73908 / 102400 (72.2%)                                                                                                                                                                                                                   
Max absolute difference: 1.3576229                                                                                                                                                                                                                            
Max relative difference: 38430.625                                                                                                                                                                                                                            
 x: array([[[0.056033, 0.017943, 0.017943, ..., 0.036259, 0.      ,                                                                                                                                                                                           
         0.      ],                                                                                                                                                                                                                                           
        [0.      , 0.735084, 0.      , ..., 0.11884 , 0.      ,...                                                                                                                                                                                            
 y: array([[[0.      , 0.      , 0.      , ..., 0.      , 0.      ,                                                                                                                                                                                           
         0.269067],                                                                                                                                                                                                                                           
        [0.      , 0.      , 0.00842 , ..., 0.      , 0.      ,...                                                                                                                                                                                            
2022-11-10 14:47:12 | WARNING  | DF |   Elements not close for e2:                                                                                                                                                                                            
Not equal to tolerance rtol=1e-06, atol=1e-05                                                                                                                                                                                                                 

Mismatched elements: 38405 / 51200 (75%)                                                                                                                                                                                                                      
Max absolute difference: 0.8311711                                                                                                                                                                                                                            
Max relative difference: 22806.361                                                                                                                                                                                                                            
 x: array([[[0.      , 0.071877, 0.014084, ..., 0.151685, 0.097902,                                                                                                                                                                                           
         0.085669],                                                                                                                                                                                                                                           
        [0.      , 0.      , 0.      , ..., 0.201829, 0.064229,...                                                                                                                                                                                            
 y: array([[[0.      , 0.      , 0.14058 , ..., 0.      , 0.      ,
         0.      ],
        [0.      , 0.      , 0.125281, ..., 0.      , 0.030466,...
2022-11-10 14:47:12 | WARNING  | DF |   Elements not close for e3: 
Not equal to tolerance rtol=1e-06, atol=1e-05

Mismatched elements: 38806 / 51200 (75.8%)
Max absolute difference: 4.9555507
Max relative difference: 7891.4126
 x: array([[[0.      , 0.274603, 0.610629, ..., 0.      , 0.735358,
         0.      ],
        [0.      , 0.      , 0.508879, ..., 0.      , 0.493207,...
 y: array([[[0.      , 0.      , 0.      , ..., 0.      , 0.      ,
         0.      ],
        [1.235723, 1.584068, 1.066446, ..., 2.749812, 2.478191,...
2022-11-10 14:47:12 | WARNING  | DF |   Elements not close for emb: 
Not equal to tolerance rtol=1e-06, atol=1e-05

Mismatched elements: 25501 / 25600 (99.6%)
Max absolute difference: 1.7051635
Max relative difference: 18425.525
 x: array([[-0.530415,  0.087793, -0.143854, ...,  0.054233,  0.745872,
         0.538596],
       [ 0.158454,  0.067803,  0.154594, ...,  0.080428,  0.929604,...
 y: array([[ 0.207319,  0.010258, -0.216485, ...,  0.028774,  0.081924,
        -0.667745],
       [-0.235693,  0.010745,  0.713871, ...,  0.066224,  0.5237  ,...
2022-11-10 14:47:12 | WARNING  | DF |   Elements not close for lsnr: 
Not equal to tolerance rtol=1e-06, atol=1e-05

Mismatched elements: 100 / 100 (100%)
Max absolute difference: 9.594181
Max relative difference: 0.7053432
 x: array([ -4.007965,  -9.974195, -13.913005, -13.674997, -13.577035,
       -14.126597, -14.221871, -12.456777, -13.813079, -14.568176,
       -13.951439, -13.464954, -12.111427, -13.188027, -13.694571,...
 y: array([-13.602146, -14.349119, -14.787824, -14.694757, -14.748948,
       -14.843171, -14.871448, -14.788654, -14.894756, -14.918068,
       -14.885553, -14.82298 , -14.579944, -14.797973, -14.873879,...

but strangely not in the erb_decoder, which also had jit=True by default
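
For readers unfamiliar with this report format: it matches what `numpy.testing.assert_allclose` prints, where an element counts as mismatched when `|x - y| > atol + rtol * |y|`. The sketch below reimplements that criterion in plain Python for flat sequences, purely as an illustration. The function name `closeness_report` is hypothetical and is not part of export.py.

```python
def closeness_report(actual, desired, rtol=1e-6, atol=1e-5):
    """Summarize elementwise closeness of two equal-length sequences.

    An element is mismatched when |actual - desired| > atol + rtol * |desired|.
    """
    mismatched = 0
    max_abs = 0.0
    max_rel = 0.0
    for a, d in zip(actual, desired):
        diff = abs(a - d)
        if diff > atol + rtol * abs(d):
            mismatched += 1
        max_abs = max(max_abs, diff)
        if d != 0:
            # Relative difference is taken with respect to the desired value.
            max_rel = max(max_rel, diff / abs(d))
    return {
        "mismatched": mismatched,
        "total": len(actual),
        "max_abs": max_abs,
        "max_rel": max_rel,
    }


# First three lsnr values of x and y from the log above.
x = [-4.007965, -9.974195, -13.913005]
y = [-13.602146, -14.349119, -14.787824]
print(closeness_report(x, y))
```

With these three values the first pair alone accounts for a maximum absolute difference of about 9.594 and a maximum relative difference of about 0.705, the same figures reported for lsnr. Differences of this size are far beyond rounding noise, so the two runs are producing genuinely different outputs.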

Rikorose commented 1 year ago

I cannot reproduce your issues. For me it runs fine:

$ python DeepFilterNet/df/scripts/export.py -m DeepFilterNet2 /tmp/export
Namespace(model_base_dir='DeepFilterNet2', pf=False, output_dir=None, log_level='INFO', epoch='best', version=False, export_dir='/tmp/export', check=True, simplify=False, opset=12)
2022-11-17 09:32:28 | INFO     | DF | Running on torch 1.14.0.dev20221026
2022-11-17 09:32:28 | INFO     | DF | Running on host T480s
2022-11-17 09:32:28 | INFO     | DF | Git commit: 2ae7883, branch: main
2022-11-17 09:32:28 | INFO     | DF | Loading model settings of DeepFilterNet2
2022-11-17 09:32:28 | INFO     | DF | Using DeepFilterNet2 model at /home/hendrik/.cache/DeepFilterNet/DeepFilterNet2
2022-11-17 09:32:28 | INFO     | DF | Initializing model `deepfilternet2`
2022-11-17 09:32:28 | INFO     | DF | Found checkpoint /home/hendrik/.cache/DeepFilterNet/DeepFilterNet2/checkpoints/model_96.ckpt.best with epoch 96
2022-11-17 09:32:28 | INFO     | DF | Running on device cpu
2022-11-17 09:32:28 | INFO     | DF | Model loaded
2022-11-17 09:32:29 | INFO     | DF | Exporting model 'enc' to /tmp/export
2022-11-17 09:32:29 | INFO     | DF |   Input shapes: {'feat_erb': torch.Size([1, 1, 100, 32]), 'feat_spec': torch.Size([1, 2, 100, 96])}
2022-11-17 09:32:29 | INFO     | DF |   Output shapes: {'e0': torch.Size([1, 64, 100, 32]), 'e1': torch.Size([1, 64, 100, 16]), 'e2': torch.Size([1, 64, 100, 8]), 'e3': torch.Size([1, 64, 100, 8]), 'emb': torch.Size([1, 100, 256]), 'c0': torch.Size([1, 64, 100, 96]), 'lsnr': torch.Size([1, 100, 1])}
2022-11-17 09:32:29 | INFO     | DF |   Dynamic axis: {'feat_erb': {2: 'S'}, 'feat_spec': {2: 'S'}, 'e0': {2: 'S'}, 'e1': {2: 'S'}, 'e2': {2: 'S'}, 'e3': {2: 'S'}, 'emb': {1: 'S'}, 'c0': {2: 'S'}, 'lsnr': {1: 'S'}}
/home/hendrik/mambaforge/envs/df/lib/python3.10/site-packages/torch/onnx/utils.py:823: UserWarning: no signature found for <torch.ScriptMethod object at 0x7f91f96d8450>, skipping _decide_input_format
  warnings.warn(f"{e}, skipping _decide_input_format")
/home/hendrik/mambaforge/envs/df/lib/python3.10/site-packages/torch/onnx/_internal/jit_utils.py:258: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/conda/conda-bld/pytorch_1666768144987/work/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
/home/hendrik/mambaforge/envs/df/lib/python3.10/site-packages/torch/onnx/symbolic_opset9.py:4377: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with GRU can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model.
  warnings.warn(
/home/hendrik/mambaforge/envs/df/lib/python3.10/site-packages/torch/onnx/_internal/jit_utils.py:258: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at /opt/conda/conda-bld/pytorch_1666768144987/work/torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1891.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
/home/hendrik/mambaforge/envs/df/lib/python3.10/site-packages/torch/onnx/utils.py:687: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/conda/conda-bld/pytorch_1666768144987/work/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_graph_shape_type_inference(
/home/hendrik/mambaforge/envs/df/lib/python3.10/site-packages/torch/onnx/utils.py:687: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at /opt/conda/conda-bld/pytorch_1666768144987/work/torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1891.)
  _C._jit_pass_onnx_graph_shape_type_inference(
/home/hendrik/mambaforge/envs/df/lib/python3.10/site-packages/torch/onnx/utils.py:1178: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/conda/conda-bld/pytorch_1666768144987/work/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_graph_shape_type_inference(
/home/hendrik/mambaforge/envs/df/lib/python3.10/site-packages/torch/onnx/utils.py:1178: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at /opt/conda/conda-bld/pytorch_1666768144987/work/torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1891.)
  _C._jit_pass_onnx_graph_shape_type_inference(
2022-11-17 09:32:29 | INFO     | DF | Exporting model 'erb_dec' to /tmp/export
2022-11-17 09:32:29 | INFO     | DF |   Input shapes: {'emb': torch.Size([1, 100, 256]), 'e3': torch.Size([1, 64, 100, 8]), 'e2': torch.Size([1, 64, 100, 8]), 'e1': torch.Size([1, 64, 100, 16]), 'e0': torch.Size([1, 64, 100, 32])}
2022-11-17 09:32:29 | INFO     | DF |   Output shapes: {'m': torch.Size([1, 100, 32])}
2022-11-17 09:32:30 | INFO     | DF |   Dynamic axis: {'emb': {1: 'S'}, 'e3': {2: 'S'}, 'e2': {2: 'S'}, 'e1': {2: 'S'}, 'e0': {2: 'S'}, 'm': {2: 'S'}}
/home/hendrik/mambaforge/envs/df/lib/python3.10/site-packages/torch/onnx/utils.py:823: UserWarning: no signature found for <torch.ScriptMethod object at 0x7f91fa1920c0>, skipping _decide_input_format
  warnings.warn(f"{e}, skipping _decide_input_format")
2022-11-17 09:32:30 | INFO     | DF | Exporting model 'df_dec' to /tmp/export
2022-11-17 09:32:30 | INFO     | DF |   Input shapes: {'emb': torch.Size([1, 100, 256]), 'c0': torch.Size([1, 64, 100, 96])}
2022-11-17 09:32:30 | WARNING  | DF |   Number of tensors (2) does not match provided names: ['coefs']
2022-11-17 09:32:30 | INFO     | DF |   Output shapes: {'coefs': torch.Size([1, 100, 96, 10])}
2022-11-17 09:32:30 | INFO     | DF |   Dynamic axis: {'emb': {1: 'S'}, 'c0': {2: 'S'}, 'coefs': {1: 'S'}}
JBloodless commented 1 year ago

For some reason I'm getting the same error even on a clean project without any of my modifications. I'll try to isolate it, because for now the failure point, model = torch.jit.script(model, example_inputs=[tuple(a for a in inputs)]), is too ambiguous.

JBloodless commented 1 year ago

I have now cloned the latest version of DeepFilterNet and created a fresh conda environment with only the dependencies installed from poetry.lock, and I'm still getting this error:

(/data/conda/df_exp) i.beskrovnyy@vmsdn1-hosting117:/data/code_jb/backup/DeepFilterNet/DeepFilterNet$ python df/scripts/export.py -m DeepFilterNet2 /tmp/export
Namespace(model_base_dir='DeepFilterNet2', pf=False, output_dir=None, log_level='INFO', epoch='best', version=False, export_dir='/tmp/export', check=True, simplify=False, opset=12)
2022-11-17 16:26:14 | INFO     | DF | Running on torch 1.13.0
2022-11-17 16:26:14 | INFO     | DF | Running on host vmsdn1-hosting117
2022-11-17 16:26:14 | INFO     | DF | Git commit: 2ae7883, branch: main
2022-11-17 16:26:14 | INFO     | DF | Loading model settings of DeepFilterNet2
2022-11-17 16:26:14 | INFO     | DF | Using DeepFilterNet2 model at /home/i.beskrovnyy/.cache/DeepFilterNet/DeepFilterNet2
2022-11-17 16:26:14 | INFO     | DF | Initializing model `deepfilternet2`
2022-11-17 16:26:16 | INFO     | DF | Found checkpoint /home/i.beskrovnyy/.cache/DeepFilterNet/DeepFilterNet2/checkpoints/model_96.ckpt.best with epoch 96
2022-11-17 16:26:16 | INFO     | DF | Running on device cuda:0
2022-11-17 16:26:16 | INFO     | DF | Model loaded
2022-11-17 16:26:17 | INFO     | DF | Exporting model 'enc' to /tmp/export
2022-11-17 16:26:17 | INFO     | DF |   Input shapes: {'feat_erb': torch.Size([1, 1, 100, 32]), 'feat_spec': torch.Size([1, 2, 100, 96])}
2022-11-17 16:26:17 | INFO     | DF |   Output shapes: {'e0': torch.Size([1, 64, 100, 32]), 'e1': torch.Size([1, 64, 100, 16]), 'e2': torch.Size([1, 64, 100, 8]), 'e3': torch.Size([1, 64, 100, 8]), 'emb': torch.Size([1, 100, 256]), 'c0': torch.Size([1, 64, 100, 96]), 'lsnr': torch.Size([1, 100, 1])}
/data/conda/df_exp/lib/python3.10/site-packages/torch/jit/_script.py:1280: UserWarning: Warning: monkeytype is not installed. Please install https://github.com/Instagram/MonkeyType to enable Profile-Directed Typing in TorchScript. Refer to https://github.com/Instagram/MonkeyType/blob/master/README.rst to install MonkeyType. 
  warnings.warn("Warning: monkeytype is not installed. Please install https://github.com/Instagram/MonkeyType "
Traceback (most recent call last):
  File "/data/code_jb/backup/DeepFilterNet/DeepFilterNet/df/scripts/export.py", line 336, in <module>
    main(args)
  File "/data/code_jb/backup/DeepFilterNet/DeepFilterNet/df/scripts/export.py", line 302, in main
    export(
  File "/data/conda/df_exp/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/data/code_jb/backup/DeepFilterNet/DeepFilterNet/df/scripts/export.py", line 193, in export
    e0, e1, e2, e3, emb, c0, lsnr = export_impl(
  File "/data/code_jb/backup/DeepFilterNet/DeepFilterNet/df/scripts/export.py", line 98, in export_impl
    model = torch.jit.script(model, example_inputs=[tuple(a for a in inputs)])
  File "/data/conda/df_exp/lib/python3.10/site-packages/torch/jit/_script.py", line 1286, in script
    return torch.jit._recursive.create_script_module(
  File "/data/conda/df_exp/lib/python3.10/site-packages/torch/jit/_recursive.py", line 476, in create_script_module
    return create_script_module_impl(nn_module, concrete_type, stubs_fn)
  File "/data/conda/df_exp/lib/python3.10/site-packages/torch/jit/_recursive.py", line 542, in create_script_module_impl
    create_methods_and_properties_from_stubs(concrete_type, method_stubs, property_stubs)
  File "/data/conda/df_exp/lib/python3.10/site-packages/torch/jit/_recursive.py", line 393, in create_methods_and_properties_from_stubs
    concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
RuntimeError: Unsupported value kind: Tensor
Rikorose commented 1 year ago

Now I know the issue:

UserWarning: Warning: monkeytype is not installed. Please install https://github.com/Instagram/MonkeyType to enable Profile-Directed Typing in TorchScript. Refer to https://github.com/Instagram/MonkeyType/blob/master/README.rst to install MonkeyType.

JBloodless commented 1 year ago

Oh wow, I didn't think that was a big deal. Now at least the clean install works.

JBloodless commented 1 year ago

Yeah, and now I'm able to export my own custom model. Sorry for my inattentiveness and thanks for your time.
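
Since the failure appeared when `torch.jit.script` was called with `example_inputs` while MonkeyType was missing, a small guard can make this condition explicit. The sketch below uses only the standard library. The helper names `monkeytype_available` and `script_kwargs` are hypothetical, and the idea of dropping `example_inputs` when MonkeyType is absent is an assumption that this thread does not confirm.

```python
import importlib.util


def monkeytype_available() -> bool:
    """Return True if the `monkeytype` module can be imported."""
    return importlib.util.find_spec("monkeytype") is not None


def script_kwargs(inputs):
    """Build keyword arguments for a torch.jit.script call.

    Assumption: `example_inputs` is only passed when MonkeyType is
    installed, because profile-directed typing depends on it.
    """
    if monkeytype_available():
        return {"example_inputs": [tuple(inputs)]}
    return {}
```

A call site could then look like `torch.jit.script(model, **script_kwargs(inputs))`. What resolved the problem for the original reporter appears to be installing MonkeyType itself (for example with `pip install MonkeyType`).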

wangmou21 commented 1 year ago

Hi, I have installed monkeytype. However, I still have the same problem.
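
When the error persists after installing MonkeyType, it helps to post the exact package versions of the environment that actually runs export.py, since the two logs above differ in torch version (1.14.0.dev20221026 versus 1.13.0). The following is a minimal sketch using the standard library. The function name `installed_versions` and the default package list are assumptions chosen for illustration.

```python
from importlib import metadata


def installed_versions(packages=("torch", "onnx", "onnxruntime", "MonkeyType")):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running it with the same interpreter that runs export.py shows whether MonkeyType is visible in that environment or was installed into a different one.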