Image normalization causing test_model test failure

veegalinova commented 1 year ago

Hello! I ran into the issue while trying to run the test_model function on the exported model.

I want to discuss if the following behavior is a conscious choice or more of an oversight.

Sorry in advance for the long explanation 😄

I am trying to export and test the model from an external library (N2V). I have a sample input image with a large mean and std. I normalize it, run the model and then denormalize the model output. The processing portion of my model rdf looks like this:

inputs:
- axes: byxc
  data_range: [-.inf, .inf]
  data_type: float32
  name: input0
  preprocessing:
  - kwargs:
      axes: yx
      mean: [32473.281]
      mode: fixed
      std: [18754.428]
    name: zero_mean_unit_variance
  shape: [1, 64, 64, 1]
outputs:
- axes: byxc
  data_range: [-.inf, .inf]
  data_type: float32
  name: output0
  postprocessing:
  - kwargs:
      axes: yx
      gain: [18754.428]
      offset: [32473.281]
    name: scale_linear
  shape: [1, 64, 64, 1]

After running test_model function on the exported model, the last test fails with the following error:

{
  'name': 'reproduce test outputs from test inputs (bioimageio.core 0.5.9)',
  'status': 'failed',
  'error': 'Output and expected output disagree:\n \nArrays are not almost equal to 4 decimals\n\nMismatched elements: 416 / 4096 (10.2%)\nMax absolute difference: 0.00048828\nMax relative difference: 2.5800531e-07\n x: array([[[[3405.0159],\n         [3877.9998],\n         [1644.0879],...\n y: array([[[[3405.0159],\n         [3877.9995],\n         [1644.0879],...',
  'traceback': None
}

After some investigation, I believe I was able to localize the issue - there is a slight difference between how the image is being handled in zero_mean_unit_variance vs my code.

When the mean and std values from the model rdf are validated, it is done essentially like this: mean = np.array(float(mean_string)), which yields a float64 value. Applying zero_mean_unit_variance with this value to a float32 input then produces a float64 result.
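The promotion can be seen in isolation with a minimal sketch. It assumes plain NumPy broadcasting rather than the actual bioimageio.core code path, the image values are made up, and the parsed values are wrapped in one-element arrays (matching the list form in the rdf) because 0-d arrays may be promoted differently depending on the NumPy version.

```python
import numpy as np

# Stand-in for a float32 input image; the values are made up.
image = np.full((1, 64, 64, 1), 40000.0, dtype=np.float32)

# Parsing the rdf values via Python floats gives float64 arrays.
mean = np.array([float("32473.281")])
std = np.array([float("18754.428")])

# Mixing a float32 array with a float64 array promotes the result.
normalized = (image - mean) / std
print(image.dtype, "->", normalized.dtype)
```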

The problem with this behavior is that in my code, and probably in many other users' code, the data stays float32 from the beginning to the end of the inference pipeline (although images with such a large mean are probably rare). If the mean and std values are large, the difference between the output of my pipeline and that of bioimageio.core after normalization and denormalization becomes big enough to fail the test.
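To get a feeling for the size of the effect, here is a rough sketch that leaves the network out entirely: an identity "model" is assumed, so only normalization and denormalization contribute. The helper name round_trip and the fixed seed are illustrative choices, not part of bioimageio.core.

```python
import numpy as np

rng = np.random.default_rng(0)
x32 = (rng.random((1, 64, 64, 1)) * 5000).astype(np.float32)

# Python floats, as in the reproduction script
mean = x32.mean().item()
std = x32.std().item()
eps = 1e-6


def round_trip(x):
    # Identity "model": normalize, then denormalize, in the dtype of x.
    normalized = (x - mean) / (std + eps)
    return normalized * std + mean


out32 = round_trip(x32)  # float32 from start to finish
out64 = round_trip(x32.astype(np.float64)).astype(np.float32)  # float64 inside

max_abs_diff = float(np.abs(out32.astype(np.float64) - out64).max())
print(out32.dtype, out64.dtype, max_abs_diff)
```

The two results differ only by float32 rounding, but for values in the thousands that rounding is already of the same order as the 4-decimal tolerance.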

So potential solutions could be one of the following:

- ensuring in bioimageio.core that the mean and std values are cast to the same dtype as the data_type in the model rdf
- making it clear somewhere in the documentation that users should be careful with how they handle normalization in their own code
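The first option could look roughly like the sketch below. normalize_keep_dtype is a hypothetical helper written for illustration, not an existing bioimageio.core function; it simply casts the parameters to the tensor's dtype before applying them.

```python
import numpy as np


def normalize_keep_dtype(tensor, mean, std, eps=1e-6):
    """Hypothetical helper: zero-mean/unit-variance without changing dtype."""
    # Cast the parameters to the tensor's dtype so no promotion happens.
    mean = np.asarray(mean, dtype=tensor.dtype)
    std = np.asarray(std, dtype=tensor.dtype)
    eps = tensor.dtype.type(eps)
    return (tensor - mean) / (std + eps)


rng = np.random.default_rng(0)
x = (rng.random((1, 64, 64, 1)) * 5000).astype(np.float32)
y = normalize_keep_dtype(x, [32473.281], [18754.428])
print(x.dtype, "->", y.dtype)
```

Whether casting the parameters or the result is preferable is a design decision for the library; this only shows that the dtype can be preserved.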

Here is the sample code to reproduce the issue. If you pass input and output tensors as type np.float32 the test will fail. If you uncomment the conversion of input and output to np.float64 the test will pass.

import os 

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

from bioimageio.core.build_spec.build_model import build_model
from bioimageio.core import load_resource_description
from bioimageio.core.resource_tests import test_model

os.makedirs("my-model", exist_ok=True)

model = keras.Sequential([
    layers.Conv2D(1, kernel_size=1, activation=None, input_shape=(64, 64, 1), padding='same')
])
model.save('my-model/weights.hd5', save_format='h5')

X = np.random.rand(1, 64, 64, 1).astype(np.float32)
# X = np.random.rand(1, 64, 64, 1).astype(np.float64)

# multiplying by 5000 to make X mean and std values large 
X *= 5000 
np.save("my-model/test-input.npy", X)

mean = X.mean().item()
std = X.std().item()
eps = 1e-6
X = (X - mean) / (std + eps)

output = model.predict(X)
# output = output.astype(np.float64)
output = output * std + mean
np.save("my-model/test-output.npy", output)

with open("my-model/doc.md", "w") as f:
    f.write("# My First Model\n")

build_model(
    weight_uri="my-model/weights.hd5",
    weight_type="keras_hdf5",
    test_inputs=["my-model/test-input.npy"],
    test_outputs=["my-model/test-output.npy"],
    input_axes=["byxc"],
    output_axes=["byxc"],
    output_path="my-model/model.zip",
    name="MyFirstModel",
    description="a fancy new model",
    authors=[{"name": "Gizmo"}],
    license="CC-BY-4.0",
    documentation="my-model/doc.md",
    tags=["nucleus-segmentation"],  
    cite=[{"text": "Gizmo et al.", "doi": "doi:10.1002/xyzacab123"}],
    preprocessing=[[
        {
            'kwargs': {
                'axes': 'yx',
                'mean': [mean],
                'mode': 'fixed',
                'std': [std]
            },
            'name': 'zero_mean_unit_variance'
        }
    ]],
    postprocessing=[[
        {
            'kwargs': {
                'axes': 'yx',
                'offset': [mean],
                'gain': [std]
            },
            'name': 'scale_linear'
        }
    ]]
)

my_model = load_resource_description("my-model/model.zip") 
test_model(my_model)

Versions:

bioimageio_spec_version: 0.4.9
bioimageio_core_version: 0.5.9

constantinpape commented 1 year ago

Hi @veegalinova, thanks for the clear description of the issue! I don't have time right now to think about how to cleanly solve this, but just wanted to let you know that we have a workaround: it is possible to set the precision for the test in test_model via the --decimal parameter. Setting this e.g. to 2 (default is 4), will probably fix the test. You can also add something to the config in the rdf so that this is respected by the CI. (I am not quite sure about the details anymore, @FynnBe should know more).

(That being said, it would def. be nice to find a cleaner solution, but in case you want to upload the model now you can use the workaround above.)
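A small sketch of why lowering the precision helps, assuming the test compares arrays with numpy.testing.assert_array_almost_equal (the wording of the error message suggests this, but it is not confirmed here). The helper name agrees is made up. The two values are neighbouring float32 numbers just above 4096, which are 2**-11 = 0.00048828125 apart, the same size as the reported maximum absolute difference.

```python
import numpy as np


def agrees(actual, expected, decimal):
    # True if the arrays pass NumPy's "almost equal" check at this precision.
    try:
        np.testing.assert_array_almost_equal(actual, expected, decimal=decimal)
    except AssertionError:
        return False
    return True


expected = np.array([4096.0], dtype=np.float32)
# The next representable float32 above 4096.0
actual = np.nextafter(expected, np.float32(np.inf))

print(float(actual[0] - expected[0]))
print(agrees(actual, expected, decimal=4), agrees(actual, expected, decimal=2))
```

So a single float32 rounding step at this magnitude is enough to fail the check at 4 decimals, while 2 decimals leaves plenty of room.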

jdeschamps commented 1 year ago

Are the --decimal and config (CI-wise) documented somewhere on https://bioimage.io/docs/#/? I couldn't find it there, but also might have missed it.

FynnBe commented 1 year ago

Before the normalized data is passed to the NN it should be converted to float32 again, see https://github.com/bioimage-io/core-bioimage-io-python/blob/53dfc45cf23351da61e8b22d100d77fb54c540e6/bioimageio/core/prediction_pipeline/_combined_processing.py#L70 (of course this might still be the issue somehow...)

Setting the test precision may be undocumented, here is what you'd have to include for your example:

  config:
    bioimageio:
      test_kwargs:
        keras_hdf5:
          decimal: 2

The --decimal flag is listed when inspecting the help:

$ bioimageio test-model -h
bioimageio.spec 0.4.9
implementing:
        collection RDF 0.2.3
        general RDF 0.2.3
        model RDF 0.4.9
+
bioimageio.spec.partner 0.4.9
implementing:
        partner collection RDF 0.2.3
bioimageio.core 0.5.9

 Usage: bioimageio test-model [OPTIONS] MODEL_RDF

 Test whether the test output(s) of a model can be reproduced.

╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    model_rdf      TEXT  Path or URL to the model resource description file (rdf.yaml) or zipped model. [default: None] [required]                       │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --weight-format           [pytorch_state_dict|torchscript|keras_hdf5|tensorflow_js|tenso  The weight format to use. [default: None]                       │
│                           rflow_saved_model_bundle|onnx]                                                                                                  │
│ --devices                 TEXT                                                            Devices for running the model. [default: None]                  │
│ --decimal                 INTEGER                                                         The test precision. [default: 4]                                │
│ --help,--version  -h                                                                      Show this message and exit.                                     │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

bioimage-io / core-bioimage-io-python

Image normalization causing test_model test failure #342
