apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License
4.46k stars 648 forks

PyTorch to CoreML model conversion via scripting (torch.jit.script) gives 3 different errors (missing op 'dim', IndexError: out of Range, and INTERNAL ERROR) #765

Open leovinus2001 opened 4 years ago

leovinus2001 commented 4 years ago

Title:

PyTorch to CoreML model conversion via scripting (torch.jit.script) gives error messages.

Relevance:

For certain models, torch scripting is preferred over the JIT trace as explained here https://coremltools.readme.io/docs/model-scripting

When we use torch.jit.script on this test model, the conversion to CoreML produces error messages, which is probably wrong.

Reproducible:

Yes

Testcase:

Attached testScripting.txt

We have 4 modes here, from two switches: 1) conversion with or without scripting, and 2) one of two branches in forward.

The issue is that converting the model produced by scripted_model = torch.jit.script(model, dummy_input) gives several error messages, while the JIT-traced model converts fine. The errors include:

a) RuntimeError: PyTorch convert function for op dim not implemented. Which op is that?

b) IndexError: list index out of range. This makes no sense, since, if I understand correctly, we should not need to specify the outputs in ct.convert() for PyTorch models.
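For readers without the attachment, the setup described above might look roughly like the following sketch. It is a hypothetical reconstruction, not the attached testScripting.txt: only the Linear(28, 10) layer, the input name "input1", and the ct.convert() call come from the logs; the body of forward, the function name build_and_convert, and the 30x1x28 input shape (borrowed from the second testcase's trace) are assumptions chosen to trigger a dim op under scripting.

```python
def build_and_convert(use_scripting: bool):
    """Trace or script a small model, then hand it to ct.convert (sketch)."""
    # Imports are deferred so torch and coremltools are only needed when called.
    import torch
    import coremltools as ct

    class TestModel(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = torch.nn.Linear(28, 10)

        def forward(self, x_):
            # Assumed body: under scripting, x_.dim() stays in the graph as a
            # `dim` node; under tracing it is evaluated in Python and vanishes.
            if x_.dim() != 3:
                raise Exception("expected a rank-3 input")
            return self.fc1(x_)

    model = TestModel().eval()
    dummy_input = torch.rand(30, 1, 28)  # assumed shape

    if use_scripting:
        torchscript = torch.jit.script(model)
    else:
        torchscript = torch.jit.trace(model, dummy_input)

    return ct.convert(
        torchscript,
        inputs=[ct.TensorType(name="input1", shape=dummy_input.shape)],
    )
```

Calling build_and_convert(False) exercises the tracing path and build_and_convert(True) the scripting path; whether the latter fails the same way depends on the installed torch and coremltools versions.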

Setup:

Torch version : 1.5.0 CoreML tools version : 4.0b1

Logs (2):

useScriptingFlag = False AND "if 0" in def forward(self, x_): All fine for the JIT tracing, no Log.

useScriptingFlag = False AND "if 1" in def forward(self, x_): All fine for the JIT tracing, no Log.

==================================================================
useScriptingFlag = True AND "if 0" in def forward(self, x_):

Torch version : 1.5.0
CoreML tools version : 4.0b1
TestModel(
  (fc1): Linear(in_features=28, out_features=10, bias=True)
)
~/Library/Python/3.7/lib/python/site-packages/torch/jit/__init__.py:1256: UserWarning: optimize is deprecated and has no effect. Use with torch.jit.optimized_execution() instead
  warnings.warn("optimize is deprecated and has no effect. Use with torch.jit.optimized_execution() instead")
Converting Frontend ==> MIL Ops: 40%|███████████████████████████████████████████████████████████████████████████████████████████████████▏ | 2/5 [00:00<00:00, 4957.81 ops/s]
Traceback (most recent call last):
  File "testScripting.py", line 44, in <module>
    inputs= [ ct.TensorType(name="input1", shape=dummy_input.shape) ]
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/_converters_entry.py", line 299, in convert
    **kwargs
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/converter.py", line 120, in _convert
    prog = frontend_converter(model, **kwargs)
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/converter.py", line 62, in __call__
    return load(*args, **kwargs)
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 84, in load
    raise e
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 76, in load
    prog = converter.convert()
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 302, in convert
    convert_nodes(self.context, self.graph)
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 52, in convert_nodes
    "PyTorch convert function for op {} not implemented".format(node.kind)
RuntimeError: PyTorch convert function for op dim not implemented

==================================================================
useScriptingFlag = True AND "if 1" in def forward(self, x_):

Torch version : 1.5.0
CoreML tools version : 4.0b1
TestModel(
  (fc1): Linear(in_features=28, out_features=10, bias=True)
)
testScripting.py:16: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  x = x[: mySeqLen, :]
~/Library/Python/3.7/lib/python/site-packages/torch/jit/__init__.py:1256: UserWarning: optimize is deprecated and has no effect. Use with torch.jit.optimized_execution() instead
  warnings.warn("optimize is deprecated and has no effect. Use with torch.jit.optimized_execution() instead")
Traceback (most recent call last):
  File "testScripting.py", line 44, in <module>
    inputs= [ ct.TensorType(name="input1", shape=dummy_input.shape) ]
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/_converters_entry.py", line 299, in convert
    **kwargs
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/converter.py", line 120, in _convert
    prog = frontend_converter(model, **kwargs)
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/converter.py", line 62, in __call__
    return load(*args, **kwargs)
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 73, in load
    converter = TorchConverter(torchscript, inputs, outputs, cut_at_symbols)
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 142, in __init__
    raw_graph, params_dict, self.inputs, cut_at_symbols
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/frontend/torch/internal_graph.py", line 176, in __init__
    self.nodes.append(InternalTorchIRNode(raw_node))
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/frontend/torch/internal_graph.py", line 97, in __init__
    self.name = self.outputs[0]
IndexError: list index out of range

leovinus2001 commented 4 years ago

Relevance:

The PyTorch scripting at https://coremltools.readme.io/docs/model-scripting is essential for dynamic aspects of models.

If the conversion to CoreML fails, we might not be able to use models with dynamic behavior in CoreML. This is relevant for some Transformer models, see GitHub question #766.

Today we update this scripting issue with an extra testcase, so that two testcases now demonstrate three ways in which converting a scripted PyTorch model to CoreML fails.

Issue:

Here is an additional testcase testScripting3.txt which follows https://coremltools.readme.io/docs/model-scripting where the conversion to CoreML fails with

  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 329, in _expand_and_optimize_ir
    torchscript.forward.graph, torchscript._c
RuntimeError: isTensor() INTERNAL ASSERT FAILED at ../aten/src/ATen/core/ivalue_inl.h:111, please report a bug to PyTorch. Expected Tensor but got Int

Seems like the coremltools v4.0b1 TOT produces IR that cannot be processed?

I see that the message says "please report a bug to PyTorch", but the coremltools converter seems to be the cause, which is why the bug report is here.

Reproducible?

yes, here is the testcase.

This testcase testScripting3.py has 2 boolean flags and therefore 4 possible control flows. Three of them fail with the error above. Here is the log from useScriptedNarrowClass = True AND useScriptedModelFlag = True

Log -----------

Torch version : 1.5.1 CoreML tools version : 4.0b1

TestModel uses the scripted NarrowModule

TestModel( (fc1): Linear(in_features=28, out_features=10, bias=True) (scripted_narrow_module): RecursiveScriptModule(original_name=NarrowModule) )

Output shape after forward = torch.Size([5, 1, 10])

a1) Graph from JIT traced model

graph(%self.1 : __torch__.TestModel,
      %x_ : Float(30, 1, 28)):
  %37 : __torch__.torch.nn.modules.linear.Linear = prim::GetAttr[name="fc1"](%self.1)
  %34 : __torch__.NarrowModule = prim::GetAttr[name="scripted_narrow_module"](%self.1)
  %input : Tensor = prim::CallMethod[name="forward"](%34, %x_)
  %39 : Tensor = prim::CallMethod[name="forward"](%37, %input)
  return (%39)



a2) Code from JIT traced model

def forward(self,
    x_: Tensor) -> Tensor:
  _0 = self.fc1
  input = (self.scripted_narrow_module).forward(x_, )
  return (_0).forward(input, )



b1) Graph from scripted model

graph(%self : __torch__.___torch_mangle_3.TestModel,
      %x_.1 : Tensor):
  %11 : str = prim::Constant[value="Exception"]() # testScripting3.py:78:8
  %9 : int = prim::Constant[value=3]() # testScripting3.py:78:32
  %7 : int[] = prim::shape(%x_.1)
  %8 : int = aten::len(%7) # testScripting3.py:78:15
  %10 : bool = aten::eq(%8, %9) # testScripting3.py:78:15
   = prim::If(%10) # testScripting3.py:78:8
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # testScripting3.py:78:8
      -> ()
  %13 : __torch__.NarrowModule = prim::GetAttr[name="scripted_narrow_module"](%self)
  %y.1 : Tensor = prim::CallMethod[name="forward"](%13, %x_.1) # testScripting3.py:80:12
  %16 : __torch__.torch.nn.modules.linear.___torch_mangle_2.Linear = prim::GetAttr[name="fc1"](%self)
  %y.3 : Tensor = prim::CallMethod[name="forward"](%16, %y.1) # testScripting3.py:81:12
  return (%y.3)



b2) Code from scripted model

def forward(self,
    x_: Tensor) -> Tensor:
  _0 = torch.eq(torch.len(ops.prim.shape(x_)), 3)
  if _0:
    pass
  else:
    ops.prim.RaiseException("Exception")
  y = (self.scripted_narrow_module).forward(x_, )
  return (self.fc1).forward(y, )


Start conversion to CoreML, useScriptedModelFlag = True

Traceback (most recent call last):
  File "testScripting3.py", line 123, in <module>
    inputs= [ ct.TensorType(name="input1", shape=dummy_input.shape) ]
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/_converters_entry.py", line 299, in convert
    **kwargs
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/converter.py", line 120, in _convert
    prog = frontend_converter(model, **kwargs)
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/converter.py", line 62, in __call__
    return load(*args, **kwargs)
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 73, in load
    converter = TorchConverter(torchscript, inputs, outputs, cut_at_symbols)
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 140, in __init__
    raw_graph, params_dict = self._expand_and_optimize_ir(self.torchscript)
  File "~/Library/Python/3.7/lib/python/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 329, in _expand_and_optimize_ir
    torchscript.forward.graph, torchscript._c
RuntimeError: isTensor() INTERNAL ASSERT FAILED at ../aten/src/ATen/core/ivalue_inl.h:111, please report a bug to PyTorch. Expected Tensor but got Int
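The four control flows given by the two boolean flags can be exercised in one run with a small harness like the sketch below, which records the outcome per combination instead of stopping at the first failure. The names run_flag_matrix and fake_convert are hypothetical; fake_convert is only a stand-in for the real build-and-convert step, and its failure pattern is an arbitrary assumption, since the thread does not say which three combinations fail.

```python
import itertools


def run_flag_matrix(convert_fn):
    """Call convert_fn for every combination of two boolean flags.

    Returns a dict mapping (use_scripted_narrow_class, use_scripted_model)
    to "ok" or to a short description of the exception that was raised.
    """
    results = {}
    for flags in itertools.product((False, True), repeat=2):
        try:
            convert_fn(*flags)
            outcome = "ok"
        except Exception as exc:  # keep going so all four modes are reported
            outcome = f"{type(exc).__name__}: {exc}"
        results[flags] = outcome
    return results


def fake_convert(use_scripted_narrow_class, use_scripted_model):
    # Hypothetical stand-in for the real conversion; the failure pattern
    # below is an assumption made only so the harness has something to report.
    if use_scripted_narrow_class or use_scripted_model:
        raise RuntimeError("Expected Tensor but got Int")


if __name__ == "__main__":
    for flags, outcome in run_flag_matrix(fake_convert).items():
        print(flags, "->", outcome)
```

Replacing fake_convert with a function that builds the model for the given flags and calls ct.convert() would give a per-mode summary suitable for pasting into a report like this one.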

leovinus2001 commented 4 years ago

Related to #766 and #816, I have used composite operators and @register_torch_op to implement dim(), shape(), and __getitem__(). With those, the 4 modes in the test case above seem to run through fine, with no errors, unless I have prints or asserts active. Then 'IndexError: out of range' shows up again.

While I can post the code here, that would be pointless if the engineers are planning to augment coremltools/converters/mil/frontend/torch/ops.py with these operators dim/shape/getitem anyway. In particular, I think that my solution can be simplified and needs a check for correctness. Suggestions?
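For orientation, a registration of a missing op via the composite-operator mechanism could look roughly like the sketch below for dim. This is not the code referred to above and has not been checked against the testcases; it assumes the input rank is known at conversion time, relies on the internal helper _get_inputs from the torch frontend, and wraps everything in a hypothetical function register_dim_op so that coremltools is only imported when it is called. Later coremltools releases may already ship a dim converter, in which case registering it again would conflict.

```python
def register_dim_op():
    """Register a converter for the TorchScript `dim` op (sketch).

    Intended to be called once before ct.convert().
    """
    # Deferred imports: coremltools is only required when this is called.
    from coremltools.converters.mil import Builder as mb
    from coremltools.converters.mil.frontend.torch.ops import _get_inputs
    from coremltools.converters.mil.frontend.torch.torch_op_registry import (
        register_torch_op,
    )

    @register_torch_op
    def dim(context, node):
        # The op takes a single tensor and yields its rank as an integer.
        x = _get_inputs(context, node, expected=1)[0]
        # Assumption: the rank is static, so it can be emitted as a constant
        # under the node's output name for downstream ops to consume.
        context.add(mb.const(val=x.rank, name=node.name))
```

Emitting a constant keeps the sketch short; a variant that works on the shape at runtime would need additional MIL ops, and the ops that follow dim in a scripted graph (comparisons, prim::If, prim::RaiseException) may need their own handling.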

HyeonjeongHa commented 3 years ago

I have the same error you mentioned. Can you share the code for 'dim/shape/getitem'? Is it right that the error is gone once the three operators are implemented?

HashedViking commented 3 years ago

@leovinus2001 Could you please share the code for "dim/shape/getitem/inverse"? Those are still not implemented in version 4.1.