PyTorch tensor mul API bug?

I implemented a network in Java like this:

public Tensor[] forward(Tensor... inputs) {
    Tensor x = inputs[0];
    Tensor hidden_state = inputs[1];
    // [r, u] = sigmoid(A[x, h]W + b)
    // [r, u] (batch_size, num_nodes * (2 * num_gru_units))
    Tensor concatenation = torch.sigmoid(this.graph_conv1.forward(x, hidden_state)[0]);
    // r (batch_size, num_nodes * num_gru_units)
    // u (batch_size, num_nodes * num_gru_units)
    TensorVector tensors = torch.chunk(concatenation, 2, 1);
    Tensor r = tensors.get(0);
    Tensor u = tensors.get(1);
    // c = tanh(A[x, (r * h)W + b])
    // c (batch_size, num_nodes * num_gru_units)
    Tensor c = torch.tanh(this.graph_conv2.forward(x, r.mul(hidden_state))[0]);
    // h := u * h + (1 - u) * c
    // h (batch_size, num_nodes * num_gru_units)
    Tensor tmp = this.constantOne.sub(u);
    Tensor new_hidden_state = u.mul(hidden_state).add(c.mul(tmp));
    return new Tensor[] { new_hidden_state, new_hidden_state };
}
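
For reference, the update in the last two statements is element-wise, so the arithmetic can be checked outside libtorch. Below is a minimal sketch in plain Java, assuming flat arrays of equal length. GruUpdateSketch and update are hypothetical names used only for illustration; they are not part of the network above or of the JavaCPP PyTorch presets.

```java
// Hypothetical helper, not part of the original network: applies the GRU
// state update h := u * h + (1 - u) * c element-wise on plain arrays.
final class GruUpdateSketch {
    static double[] update(double[] u, double[] h, double[] c) {
        if (u.length != h.length || h.length != c.length) {
            throw new IllegalArgumentException("u, h and c must have the same length");
        }
        double[] next = new double[h.length];
        for (int i = 0; i < h.length; i++) {
            // Mirrors u.mul(hidden_state).add(c.mul(tmp)) with tmp = 1 - u.
            next[i] = u[i] * h[i] + (1.0 - u[i]) * c[i];
        }
        return next;
    }
}
```

With u = 0 the result equals the candidate c, and with u = 1 it equals the previous state h, which is what the comment on the update rule describes.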

The problem is that after training for thousands of steps, the program crashes. The log shows:

Caused by: java.lang.RuntimeException: Could not run 'aten::mul.Tensor' with arguments from the 'FuncTorchGradWrapper' 
backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build 
process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes 
for possible resolutions. 'aten::mul.Tensor' is only available for these backends: [Dense, FPGA, ORT, Vulkan, Metal, Meta, 
Quantized, CustomRNGKeyId, MkldnnCPU, Sparse, SparseCsrCPU, SparseCsrCUDA, NestedTensor, BackendSelect, Python, Fake, 
Named, Conjugate, Negative, ZeroTensor, FuncTorchDynamicLayerBackMode, ADInplaceOrView, AutogradOther, AutogradFunctionality, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, Autocast, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, Functionalize, DeferredInit, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, TESTING_ONLY_GenericWrapper, TESTING_ONLY_GenericMode, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, CPU, CUDA, HIP, XLA, MPS, IPU, XPU, HPU, UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseXPU, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

Undefined: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
CPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCPU.cpp:37386 [kernel]
CUDA: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64-gpu/pytorch/build/aten/src/ATen/RegisterCUDA.cpp:51977 [kernel]
HIP: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
XLA: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
MPS: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
IPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
XPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
HPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
VE: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
Lazy: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
PrivateUse1: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
PrivateUse2: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
PrivateUse3: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
FPGA: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
ORT: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
Vulkan: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
Metal: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
Meta: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterMeta.cpp:31637 [kernel]
QuantizedCPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
QuantizedCUDA: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
QuantizedXPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
CustomRNGKeyId: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
MkldnnCPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterMkldnnCPU.cpp:690 [kernel]
SparseCPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterSparseCPU.cpp:1858 [kernel]
SparseCUDA: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64-gpu/pytorch/build/aten/src/ATen/RegisterSparseCUDA.cpp:2018 [kernel]
SparseHIP: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
SparseXPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
SparseVE: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:29545 [default backend kernel]
SparseCsrCPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterSparseCsrCPU.cpp:1507 [kernel]
SparseCsrCUDA: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64-gpu/pytorch/build/aten/src/ATen/RegisterSparseCsrCUDA.cpp:1657 [kernel]
NestedTensorCPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterNestedTensorCPU.cpp:290 [kernel]
NestedTensorCUDA: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64-gpu/pytorch/build/aten/src/ATen/RegisterNestedTensorCUDA.cpp:334 [kernel]
BackendSelect: fallthrough registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:133 [backend fallback]
Named: fallthrough registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:11 [kernel]
Conjugate: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/build/aten/src/ATen/RegisterZeroTensor.cpp:198 [kernel]
ADInplaceOrView: fallthrough registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:11935 [autograd kernel]
AutogradCPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:11935 [autograd kernel]
AutogradCUDA: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:11935 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:11935 [autograd kernel]
AutogradXLA: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:11935 [autograd kernel]
AutogradMPS: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:11935 [autograd kernel]
AutogradIPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:11935 [autograd kernel]
AutogradXPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:11935 [autograd kernel]
AutogradHPU: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:11935 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:11935 [autograd kernel]
AutogradLazy: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:11935 [autograd kernel]
AutogradPrivateUse1: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:11935 [autograd kernel]
AutogradPrivateUse2: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:11935 [autograd kernel]
AutogradPrivateUse3: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:11935 [autograd kernel]
Tracer: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/torch/csrc/autograd/generated/TraceType_0.cpp:13506 [kernel]
AutocastCPU: fallthrough registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/aten/src/ATen/autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
Batched: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/aten/src/ATen/BatchingRegistrations.cpp:1068 [kernel]
VmapMode: fallthrough registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Functionalize: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:89 [backend fallback]
PythonTLSSnapshot: registered at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:137 [backend fallback]

Exception raised from reportError at /__w/javacpp-presets/javacpp-presets/pytorch/cppbuild/linux-x86_64/pytorch/aten/src/ATen/core/dispatch/OperatorEntry.cpp:440 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f29c8081477 in /home/lzm/.javacpp/cache/pytorch-1.12.1-1.5.8-linux-x86_64-gpu.jar/org/bytedeco/pytorch/linux-x86_64-gpu/libc10.so)
frame #1: <unknown function> + 0xa7947a (0x7f285ace747a in /home/lzm/.javacpp/cache/pytorch-1.12.1-1.5.8-linux-x86_64.jar/org/bytedeco/pytorch/linux-x86_64/libtorch_cpu.so)
frame #2: <unknown function> + 0x19ce8b3 (0x7f285bc3c8b3 in /home/lzm/.javacpp/cache/pytorch-1.12.1-1.5.8-linux-x86_64.jar/org/bytedeco/pytorch/linux-x86_64/libtorch_cpu.so)
frame #3: at::_ops::mul_Tensor::call(at::Tensor const&, at::Tensor const&) + 0x87 (0x7f285bbdb2e7 in /home/lzm/.javacpp/cache/pytorch-1.12.1-1.5.8-linux-x86_64.jar/org/bytedeco/pytorch/linux-x86_64/libtorch_cpu.so)
frame #4: Java_org_bytedeco_pytorch_Tensor_mul__Lorg_bytedeco_pytorch_Tensor_2 + 0xa6 (0x7f2848e17bb6 in /home/lzm/.javacpp/cache/pytorch-1.12.1-1.5.8-linux-x86_64-gpu.jar/org/bytedeco/pytorch/linux-x86_64-gpu/libjnitorch.so)
frame #5: [0x7f2a0476a497]

        at org.bytedeco.pytorch.Tensor.mul(Native Method)

The Java stack trace shows that the crash occurs on this line: Tensor new_hidden_state = u.mul(hidden_state).add(c.mul(tmp));

So, what's wrong with it? Thanks!

bytedeco / javacpp-presets

pytorch tensor mul api bug? #1271