apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Bidirectional LSTM conversion issue from PyTorch to CoreML #824

Open francescojalvo opened 4 years ago

francescojalvo commented 4 years ago

🐞Describe the bug

Converting a bidirectional (bidirectional=True) LSTM from PyTorch to Core ML produces outputs that do not match PyTorch; the unidirectional case converts correctly.

Trace

No

To Reproduce

import torch
import torch.nn as nn
import numpy as np
from six import string_types as _string_types

from coremltools import TensorType
from coremltools.converters.mil.testing_reqs import _converter
from coremltools.models import MLModel
from coremltools._deps import _IS_MACOS

def flatten_and_detach_torch_results(torch_results):
    if isinstance(torch_results, (list, tuple)):
        return [x.detach().numpy() for x in _flatten(torch_results)]
    # Do not need to flatten
    return [torch_results.detach().numpy()]

def _flatten(obj):
    flattened_list = []
    for item in obj:
        if isinstance(item, (list, tuple)):
            flattened_list.extend(_flatten(item))
        else:
            flattened_list.append(item)
    return flattened_list

def generate_input_data(input_size):
    if isinstance(input_size, list):
        return [torch.rand(_size) for _size in input_size]
    else:
        return torch.rand(input_size)

def trace_model(model, input_data):
    model.eval()
    if isinstance(input_data, list):
        input_data = tuple(input_data)
    torch_model = torch.jit.trace(model, input_data)
    return torch_model

def convert_to_coreml_inputs(input_description, inputs):
    """Convenience function to combine a CoreML model's input description and
    set of raw inputs into the format expected by the model's predict function.
    """
    flattened_inputs = _flatten(inputs)
    coreml_inputs = {
        str(x): inp.numpy() for x, inp in zip(input_description, flattened_inputs)
    }
    return coreml_inputs

def convert_to_mlmodel(model_spec, tensor_inputs, backend="nn_proto"):
    def _convert_to_inputtype(inputs):
        if isinstance(inputs, list):
            return [_convert_to_inputtype(x) for x in inputs]
        elif isinstance(inputs, tuple):
            return tuple([_convert_to_inputtype(x) for x in inputs])
        elif isinstance(inputs, torch.Tensor):
            return TensorType(shape=inputs.shape)
        else:
            raise ValueError(
                "Unable to parse type {} into InputType.".format(type(inputs))
            )

    inputs = list(_convert_to_inputtype(tensor_inputs))
    proto = _converter._convert(model_spec, inputs=inputs, convert_to=backend, convert_from="torch")
    return MLModel(proto, useCPUOnly=True)

def convert_and_compare(input_data, model_spec, expected_results=None, atol=1e-5, backend="nn_proto"):
    """
        If expected results is not set, it will by default
        be set to the flattened output of the torch model.
    """
    if isinstance(model_spec, _string_types):
        torch_model = torch.jit.load(model_spec)
    else:
        torch_model = model_spec

    if not isinstance(input_data, (list, tuple)):
        input_data = [input_data]

    if not expected_results:
        expected_results = torch_model(*input_data)
    expected_results = flatten_and_detach_torch_results(expected_results)
    mlmodel = convert_to_mlmodel(model_spec, input_data, backend=backend)
    coreml_inputs = convert_to_coreml_inputs(mlmodel.input_description, input_data)
    if _IS_MACOS:
        coreml_results = mlmodel.predict(coreml_inputs)
        sorted_coreml_results = [
            coreml_results[key] for key in sorted(coreml_results.keys())
        ]

        for torch_result, coreml_result in zip(expected_results, sorted_coreml_results):
            np.testing.assert_equal(coreml_result.shape, torch_result.shape)
            np.testing.assert_allclose(coreml_result, torch_result, atol=atol)

def run_compare_torch(
    input_data, model, expected_results=None, places=5, input_as_shape=True, backend="nn_proto"
):
    model.eval()
    if input_as_shape:
        input_data = generate_input_data(input_data)
    model_spec = trace_model(model, input_data)
    convert_and_compare(
        input_data, model_spec, expected_results=expected_results, atol=10.0 ** -places, backend=backend
    )

def _pytorch_hidden_to_coreml(x):
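    # PyTorch stacks the two directions on dim 0: (num_directions, batch, hidden_size);
    # CoreML concatenates them on the feature axis instead: (1, batch, 2 * hidden_size).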
    f, b = torch.split(x, [1] * x.shape[0], dim=0)
    x = torch.cat((f, b), dim=2)
    return x

def test_lstm(
        input_size,
        hidden_size,
        num_layers,
        bias,
        batch_first,
        dropout,
        bidirectional,
        backend,
):
    model = nn.LSTM(
        input_size=input_size,
        hidden_size=hidden_size,
        num_layers=num_layers,
        bias=bias,
        batch_first=batch_first,
        dropout=dropout,
        bidirectional=bidirectional,
    )
    SEQUENCE_LENGTH = 1
    BATCH_SIZE = 1

    num_directions = int(bidirectional) + 1

    # batch_first=True: (batch, seq_len, input_size); otherwise (seq_len, batch, input_size)
    if batch_first:
        _input = torch.rand(BATCH_SIZE, SEQUENCE_LENGTH, input_size)
    else:
        _input = torch.randn(SEQUENCE_LENGTH, BATCH_SIZE, input_size)

    h0 = torch.randn(num_layers * num_directions, BATCH_SIZE, hidden_size)
    c0 = torch.randn(num_layers * num_directions, BATCH_SIZE, hidden_size)

    inputs = (_input, (h0, c0))
    expected_results = model(*inputs)
    # Need to do some output reshaping if bidirectional
    if bidirectional:
        ex_hn = _pytorch_hidden_to_coreml(expected_results[1][0])
        ex_cn = _pytorch_hidden_to_coreml(expected_results[1][1])
        expected_results = (expected_results[0], (ex_hn, ex_cn))
    run_compare_torch(inputs, model, expected_results, input_as_shape=False, backend=backend)

test_lstm(2, 128, 1, True, False, 0, True, 'nn_proto')

Results:

AssertionError: 
Not equal to tolerance rtol=1e-07, atol=1e-05

Mismatch: 50%
Max absolute difference: 0.9107815
Max relative difference: 36.60195
 x: array([[[ 0.159638, -0.153935,  0.07789 ,  0.005968,  0.144887,
         -0.186706,  0.484189, -0.341685, -0.005737, -0.285721,
          0.446474,  0.461001, -0.303098, -0.127876,  0.119229,...
 y: array([[[ 0.159638, -0.153935,  0.07789 ,  0.005968,  0.144887,
         -0.186706,  0.484189, -0.341685, -0.005737, -0.285721,
          0.446474,  0.461001, -0.303098, -0.127876,  0.119229,...


leovinus2001 commented 4 years ago

Setting bidirectional to False, i.e. unidirectional, makes the conversion work. I believe this is your clue.

Mismatch: 50%

Seems to me this is not a converter issue but a PyTorch setup issue. There is a mismatch between the shape of the initial h0 (2, 1, 128) and the resulting ex_hn (1, 1, 256); just print their shapes. I do not remember off the top of my head how to present h0/c0 for a bidirectional network, but the PyTorch documentation should have an example: h0 = torch.randn(num_layers * num_directions, BATCH_SIZE, hidden_size) or h0 = torch.randn(num_layers, BATCH_SIZE, hidden_size * num_directions). What do you think?
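For reference, the PyTorch documentation gives the first of these: h_0 has shape (num_layers * num_directions, batch, hidden_size). A quick sanity check (a sketch mirroring the sizes above):

import torch
import torch.nn as nn

num_layers, num_directions, batch, hidden_size = 1, 2, 1, 128
blstm = nn.LSTM(input_size=2, hidden_size=hidden_size, num_layers=num_layers, bidirectional=True)

h0 = torch.randn(num_layers * num_directions, batch, hidden_size)
c0 = torch.randn(num_layers * num_directions, batch, hidden_size)
y, (hn, cn) = blstm(torch.rand(1, batch, 2), (h0, c0))
print(hn.shape)  # torch.Size([2, 1, 128]) -- same shape as h0, not (1, 1, 256)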

francescojalvo commented 4 years ago

We had the same issue in our codebase, and in the end we found that this is a reproducible case from the coremltools unit tests. That is why I'm really puzzled about why this passes in the coremltools pipelines. Is it possible that the pipeline is not running on macOS, so the output validation is not being executed? Unidirectional LSTM works.

francescojalvo commented 4 years ago

About the sizes: there is specific code to reshape the results of a bidirectional LSTM in Torch to make them compatible with CoreML, and the problem should be around there, because the assert always fails on 50% of the values. The first part of the output (128 values) is OK and the rest is different.

leovinus2001 commented 4 years ago

Hmm. How about you use torch.manual_seed(0) for deterministic output, reduce hidden_size to 2, and just print out the values of h0/c0 and ex_hn/ex_cn for clues?
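Something like this, presumably (a sketch of the suggested setup, reusing the test_lstm defined in the report above):

import torch

torch.manual_seed(0)  # deterministic weights and inputs across runs
test_lstm(2, 2, 1, True, False, 0, True, 'nn_proto')  # hidden_size=2 keeps the printouts small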

leovinus2001 commented 4 years ago

Out of curiosity, I was digging a bit deeper. The issue is still unsolved, but it is more intriguing now, and I hope the little extra insight helps. Hope you do not mind.

Attached is the more verbose version of your code snippet that I used to produce "result.txt", also attached: modified.py > result.txt modified.txt result.txt

and summary.txt with the analysis: summary.txt

In a nutshell, the summary from summary.txt:

Thoughts?

francescojalvo commented 4 years ago

Thanks for the insights... I think all the issues come from an indexing error; I'm not sure whether it's in this code or in the converter. I'm preparing a simpler test case to verify this.

francescojalvo commented 4 years ago

This is the simplest test case I could think of:

import torch
import torch.nn as nn
import numpy as np

from coremltools import TensorType
from coremltools import convert

def test_lstm(
        input_size,
        hidden_size,
        bidirectional
):
    model = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=1,
            bias=True,
            batch_first=False,
            dropout=0.0,
            bidirectional=bidirectional
        )

    SEQUENCE_LENGTH = 1
    BATCH_SIZE = 1

    input = torch.rand(SEQUENCE_LENGTH, BATCH_SIZE, input_size)

    model.eval()

    torch_model = torch.jit.trace(model, input)

    mlmodel = convert(torch_model, inputs=[TensorType(name="x", shape=input.shape)])

    torch_results = torch_model(input)
    coreml_results = mlmodel.predict({"x": input.numpy()}, useCPUOnly=True)

    torch_output = torch_results[0].detach().numpy().squeeze()
    coreml_output = coreml_results[sorted(coreml_results.keys())[0]].squeeze()

    print("Input:")
    print(input)
    print("Torch output:")
    print(torch_output)
    print("CoreML output:")
    print(coreml_output)

    # Testing last hidden state and last cell state
    torch_hn = torch_results[1][0].detach().numpy().squeeze().flatten()
    torch_cn = torch_results[1][1].detach().numpy().squeeze().flatten()
    coreml_hn = coreml_results[sorted(coreml_results.keys())[1]].squeeze()
    coreml_cn = coreml_results[sorted(coreml_results.keys())[2]].squeeze()

    print("Torch hn:")
    print(torch_hn)
    print("CoreML hn:")
    print(coreml_hn)

    print("Torch cn:")
    print(torch_cn)
    print("CoreML cn:")
    print(coreml_cn)

    # Testing output (differs if atol <= 1e-5)
    np.testing.assert_allclose(torch_output, coreml_output, atol=1e-4)

    # Testing last hidden state and last cell state (different)
    np.testing.assert_allclose(torch_hn, coreml_hn, atol=1e-4)
    np.testing.assert_allclose(torch_cn, coreml_cn, atol=1e-4)

test_lstm(2, 2, True)

As you can see, the output of the LSTM is almost equal (atol=1e-4), but the last hidden state and last cell state are completely different; it seems that CoreML only returns the states of the backward pass. Maybe there is an indexing issue, but as you can see the code is fairly simple, and I couldn't find the error.
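For reference, a minimal sketch (my illustration, not part of the original test) of how PyTorch lays out hn for a single-layer bidirectional LSTM, which is the order the flattened torch_hn above follows:

import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=2, hidden_size=2, bidirectional=True)
y, (hn, cn) = lstm(torch.rand(1, 1, 2))
print(hn.shape)      # torch.Size([2, 1, 2]): (num_layers * num_directions, batch, hidden_size)
print(hn.flatten())  # [fwd_0, fwd_1, bwd_0, bwd_1]: forward final state first, then backward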

Unidirectional LSTM works perfectly: test_lstm(2, 2, False)

leovinus2001 commented 4 years ago

TL;DR: I suspect the converter's ObjC/C++ code makes a mistake when reformatting h_n/c_n before you see the predict result in Python, but I cannot confirm it at the moment.

That is a nice, concise testcase, appreciated!

Here is more detailed analysis so that someone can help nail this down.

1) Attached is a more verbose version that I used to get new insights.

testBLSTM.jul30.txt

The full log file is in 

testBLSTM.jul30.output.txt

NOTE the WARNING

WARNING:root:Tuple detected at graph output. This will be flattened in the converted model.

and I do not know whether that explains the bug or not.
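As an aside, one can return the tuple members individually before tracing; a sketch (it is my assumption that this merely makes the flattening explicit and does not change the bug):

import torch.nn as nn

class FlatLSTM(nn.Module):
    # wraps nn.LSTM so the traced graph returns three tensors instead of a nested tuple
    def __init__(self, lstm):
        super().__init__()
        self.lstm = lstm

    def forward(self, x):
        y, (hn, cn) = self.lstm(x)
        return y, hn, cn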

2) Using the test case, we see the following output of the bidirectional LSTM. I limited the print precision to four decimals to emphasize the result :)

Input, shape torch.Size([1, 1, 2]): tensor([[[0.7497, 0.6047]]])
Torch output,  shape (4,): [-0.1497 -0.0001  0.053  -0.0807]
CoreML output, shape (4,): [-0.1497 -0.0001  0.053  -0.0807]
Torch hn,  shape (4,): [-0.1497 -0.0001  0.053  -0.0807]    # <--------------
CoreML hn, shape (4,): [ 0.053  -0.0807  0.      0.    ]    # <--------------
Torch cn,  shape (4,): [-0.3136 -0.0002  0.1331 -0.302 ]    # <--------------
CoreML cn, shape (4,): [ 0.1331 -0.302   0.      0.    ]    # <--------------

Now notice that CoreML hn, [ 0.053 -0.0807 0. 0. ], is IDENTICAL to the last two values of Torch hn, [ ... 0.053 -0.0807 ], and that CoreML cn, [ 0.1331 -0.302 0. 0. ], is IDENTICAL to the last two values of Torch cn, [ ... 0.1331 -0.302 ].

That is no coincidence. Someone forgot (or added) a size factor of 2 (for bidir) when producing the final tensor/array. In other words, it seems that internally in the CoreML (converter) the predict result might be OK, but we as end users only see a subslice as CoreML hn/cn, produced in the "wrong" way.
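To make that hypothesis concrete, here is a toy illustration (mine, not the converter's actual code) of how copying only a hidden-sized chunk into a (1, 1, 2*hidden) buffer at the wrong offset reproduces exactly the values above:

import numpy as np

hidden = 2
torch_hn = np.array([-0.1497, -0.0001, 0.053, -0.0807])  # [fwd_0, fwd_1, bwd_0, bwd_1]

buf = np.zeros(2 * hidden)        # the (1, 1, 4) CoreML output buffer, flattened
buf[:hidden] = torch_hn[hidden:]  # only the backward chunk, written at offset 0
print(buf)                        # [ 0.053  -0.0807  0.      0.    ] -- matches CoreML hn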

The question is of course - can that be fixed (yes) and where?

While I produced a debug version of libcoremlpython.so and checked with lldb breakpoints on the LSTM handling, I have not pinned it down. Does anyone have better luck?

In even more detail, the backtrace for

coreml_results = mlmodel.predict({"x": input.numpy()}, useCPUOnly=True)

is roughly:

We call

a) coremltools/models/model.py, line 302: def predict(self, data, useCPUOnly=False, **kwargs):

b) ./coremlpython/CoreMLPython.mm

PYBIND11_PLUGIN(libcoremlpython) {
    py::module m("libcoremlpython", "CoreML.Framework Python bindings");
    py::class_<Model>(m, "_MLModelProxy")
        .def(py::init<const std::string&, bool>())
        .def("predict", &Model::predict)   // <------

c) ./coremlpython/CoreMLPython.h

class Model {
public:
    py::dict predict(const py::dict& input, bool useCPUOnly);

And then the trail goes cold, as I do not see the ObjC function [predictionFromFeatures]. I know it is produced when you compile a CoreML model in Xcode, and we probably do something similar here, but I cannot 'step' into it.

Also, I could not confirm the 'shape' of the final predict output via lldb; some symbols are missing.
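For what it's worth, the shapes can also be checked from the Python side without lldb; a sketch against the test case above (the output names "28"/"29" are whatever the converter happened to assign):

coreml_results = mlmodel.predict({"x": input.numpy()}, useCPUOnly=True)
for name in sorted(coreml_results.keys()):
    arr = coreml_results[name]
    print(name, arr.shape, arr.ravel())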

3) Based on the Torch code and graph produced, we see in the log file testBLSTM.jul30.output.txt:

code:

hx = torch.zeros([2, int(max_batch_size), 2], dtype=6, layout=0, device=torch.device("cpu"), pin_memory=False)
_8, _9, _10 = torch.lstm(input, [hx, hx], [_7, _6, _5, _4, _3, _2, _1, _0], True, 1, 0., False, True, False)
return (_8, (_9, _10))

This seems correct; Torch uses hx twice to feed in the zero h0/c0.

graph:

%67 : Float(1, 1, 4), %68 : Float(2, 1, 2), %69 : Float(2, 1, 2) = aten::lstm(%input, %59, %60, %61, %62, %63, %64, %65, %66) # /Users/hans-web/Library/Python/3.7/lib/python/site-packages/torch/nn/modules/rnn.py:570:0
%70 : (Float(2, 1, 2), Float(2, 1, 2)) = prim::TupleConstruct(%68, %69)
%71 : (Float(1, 1, 4), (Float(2, 1, 2), Float(2, 1, 2))) = prim::TupleConstruct(%67, %70)

While that looks reasonable, I am not sure whether the order of %67/%68/%69 in the output is significant.

4) Also, look at the Torch Python code (not the coremltools converter), torch/nn/modules/rnn.py, which has the following comment in 1.5.1 and 1.6.0. Does that ring any bells?

# XXX: LSTM and GRU implementation is different from RNNBase, this is because:
# 1. we want to support nn.LSTM and nn.GRU in TorchScript and TorchScript in
#    its current state could not support the python Union Type or Any Type
# 2. TorchScript static typing does not allow a Function or Callable type in
#    Dict values, so we have to separately call _VF instead of using _rnn_impls
# 3. This is temporary only and in the transition state that we want to make it
#    on time for the release
#
# More discussion details in https://github.com/pytorch/pytorch/pull/23266
#
# TODO: remove the overriding implementations for LSTM and GRU when TorchScript
# support expressing these two modules generally.

class LSTM(RNNBase):

leovinus2001 commented 4 years ago

The story continues based on the previous message:

In even more detail, the backtrace for

coreml_results = mlmodel.predict({"x": input.numpy()}, useCPUOnly=True)

is roughly:

We call

a) coremltools/models/model.py, line 302: def predict(self, data, useCPUOnly=False, **kwargs):

  • which takes the proxy-route 'if' branch

b) ./coremlpython/CoreMLPython.mm

PYBIND11_PLUGIN(libcoremlpython) {
    py::module m("libcoremlpython", "CoreML.Framework Python bindings");
    py::class_<Model>(m, "_MLModelProxy")
        .def(py::init<const std::string&, bool>())
        .def("predict", &Model::predict)   // <------

c) ./coremlpython/CoreMLPython.h

class Model {
public:
    py::dict predict(const py::dict& input, bool useCPUOnly);

We now look at the ObjC / C++ code in ./coremlpython/CoreMLPython.mm

Specifically, roughly line 72

py::dict Model::predict(const py::dict& input, bool useCPUOnly) {
    @autoreleasepool {
        NSError *error = nil;
        MLDictionaryFeatureProvider *inFeatures = Utils::dictToFeatures(input, &error);
        Utils::handleError(error);
        MLPredictionOptions *options = [[MLPredictionOptions alloc] init];
        options.usesCPUOnly = useCPUOnly;

    id<MLFeatureProvider> outFeatures = [m_model predictionFromFeatures:static_cast<MLDictionaryFeatureProvider * _Nonnull>(inFeatures)
                                                                options:options
                                                                  error:&error];

I was wondering whether the error is in the converter code (here), or whether the result is already wrong when we get it back from [m_model predictionFromFeatures] in 'outFeatures'.

The answer is the latter. With a bit of C++ surgery, we can get a handle on the h_n/c_n output features (roughly, in pseudocode):

MLMultiArray *h_n = outFeatures -> "28"
MLMultiArray *c_n = outFeatures -> "29"

and ask for the shape of those buffers (all three are [1, 1, 4]) and their values. I see:

h_n[0] = 0.052964 h_n[1] = -0.080748 h_n[2] = 0.000000 h_n[3] = 0.000000

c_n[0] = 0.133108 c_n[1] = -0.302025 c_n[2] = 0.000000 c_n[3] = 0.000000

At least on my machine, the values of h_n and c_n are wrong (but identical to the Python output in the earlier comments) when we get them from the CoreML bidirectional model.

leovinus2001 commented 4 years ago

Let's summarize what this bug seems to mean for bidirectional LSTMs in CoreML and your apps. Is this about correct?

1) y = blstm(x), without input h_0/c_0 and without use of the output h_n/c_n, works fine and delivers the correct result 'y', as in PyTorch. This was OK in the previous version (last year) of CoreML(tools) as well. Any one-shot recognizer with BLSTMs will use this mode, so OK.

2) y, (h_n, c_n) = blstm(x). Same as (1), but we look at the returned h_n/c_n, which are both wrong (this test case, testBLSTM.jul30.py, https://github.com/apple/coremltools/issues/824#issuecomment-667196355, derived from the really nice test case https://github.com/apple/coremltools/issues/824#issuecomment-666202297). No idea whether that worked last year or not.

3) y, (h_n, c_n) = blstm(x, (zero, zero)), WITH input h_0/c_0; all of the returned y, h_n, c_n are wrong (test case in the first message of this thread, https://github.com/apple/coremltools/issues/824#issue-667198642).

4) y, (h_n, c_n) = blstm(x, (h_0, c_0)), like (3); all of the returned y, h_n, c_n are wrong again (test case in the first message of this thread, https://github.com/apple/coremltools/issues/824#issue-667198642).

In other words, ANY use of the BLSTM input values h_0/c_0 or output values h_n/c_n seems to lead to bugs and wrong values. When you provide an input h_0/c_0, the output 'y' is wrong as well, since the backward pass of the LSTM is affected.

Likewise, any streaming applications with BLSTMs, which is admittedly not the typical case, are all affected.

Finally, since we presume that converted TF models are directed to the same CoreML code, they might be affected in the same way. I haven't tried it, but that is what it looks like.
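A compact way to exercise the four patterns above on the PyTorch side before conversion (a sketch; note that PyTorch returns the states in (h_n, c_n) order):

import torch
import torch.nn as nn

blstm = nn.LSTM(input_size=2, hidden_size=2, bidirectional=True)
x = torch.rand(1, 1, 2)
zero = torch.zeros(2, 1, 2)  # (num_layers * num_directions, batch, hidden_size)
h0, c0 = torch.randn(2, 1, 2), torch.randn(2, 1, 2)

y1 = blstm(x)[0]                       # (1) output only
y2, (h2, c2) = blstm(x)                # (2) output plus h_n/c_n
y3, (h3, c3) = blstm(x, (zero, zero))  # (3) explicit zero initial state
y4, (h4, c4) = blstm(x, (h0, c0))      # (4) arbitrary initial state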

It would be nice to have a fix for such an issue.

Anyway, moving on.

francescojalvo commented 4 years ago

Thanks @leovinus2001. I'll leave this open until someone from CoreML can have a look.

leovinus2001 commented 4 years ago

PS: This issue sounds an awful lot like issue #263, which is two years old, ugh. In #263 the suggested workaround is OK, but this mangling of the hn/cn outputs of BLSTMs also affects your TestLSTM test case and the test case suggested in PR #843, as far as I can see.
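For anyone blocked on this, one possible workaround (my own sketch, not necessarily the one suggested in #263) is to split the BLSTM into two unidirectional LSTMs that share its weights, since the unidirectional path converts correctly:

import torch
import torch.nn as nn

def split_bidirectional_lstm(blstm):
    # build forward/backward unidirectional LSTMs reusing the BLSTM's weights
    assert blstm.bidirectional and blstm.num_layers == 1
    fwd = nn.LSTM(blstm.input_size, blstm.hidden_size, bias=blstm.bias)
    bwd = nn.LSTM(blstm.input_size, blstm.hidden_size, bias=blstm.bias)
    names = ["weight_ih_l0", "weight_hh_l0"] + (["bias_ih_l0", "bias_hh_l0"] if blstm.bias else [])
    with torch.no_grad():
        for n in names:
            getattr(fwd, n).copy_(getattr(blstm, n))
            getattr(bwd, n).copy_(getattr(blstm, n + "_reverse"))
    return fwd, bwd

def run_split(fwd, bwd, x):
    # x: (seq_len, batch, input_size); reproduces y, (h_n, c_n) of the BLSTM
    y_f, (h_f, c_f) = fwd(x)
    y_b, (h_b, c_b) = bwd(torch.flip(x, dims=[0]))  # backward pass sees the reversed sequence
    y = torch.cat([y_f, torch.flip(y_b, dims=[0])], dim=2)
    return y, (torch.cat([h_f, h_b], dim=0), torch.cat([c_f, c_b], dim=0))

Each unidirectional half can then be traced and converted separately, at the cost of running the sequence twice.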

leovinus2001 commented 4 years ago

ping

srikris commented 4 years ago

@leovinus2001 Marked as a P1. We will take a look at this before we promote 4.0 out of beta. Thanks for your patience.