Enet4 / nifti-rs

Rust implementation of the NIfTI-1 format
Apache License 2.0
40 stars 11 forks source link

Inverse Linear Transform on writing #28

Open nilgoyette opened 5 years ago

nilgoyette commented 5 years ago

The current implementation of slope/inter is too simple and produces wrong results if T is an integer type and slope/inter are not exactly equal to an integer.

let slope = T::from_f32(slope).unwrap();
let inter = T::from_f32(header.scl_inter).unwrap();
for arr_data in data.axis_iter(Axis(0)) {
    write_slice(writer, arr_data.sub(inter).div(slope))?;

For example, a slope of 0.5 on a u16 image would produce a black image (filled with 0).

We should use a concept similar to data::element. See this comment from @Enet4 for more information.

nilgoyette commented 5 years ago

And now we have a bug because of this problem! We tried saving a u16 image with a header from a reference anatomy with a complex signal&inter (SI), like S=0.182 and I = 491.54. It doesn't produce a black image, it crashes with a 'attempt to divide by zero' error.

We discussed at Imeka and we have reached some conclusion. This problem doesn't affect float and rgb images, so lets forget about them. For all integer images (i8, u8, i16, ..., u128), zero out SI because they are always irrelevant. Why?

So we could simply add Element::is_integer() -> bool or something like that. If true, we set S=1.0, I=0.0 and we write the image as we already do. But read the next paragraph before thinking about it :)

Now that we understand that integer images don't need SI (if you agreed ^^), we can apply the same thought process to float image. Why would we (or anyone) want to apply a SI to a float image? It would simply cast it to the same type. The fact that we decide the on-disk datatype from the templated type makes SI totally useless for writing all image types. Of course, it's required and important for reading images.

Enet4 commented 5 years ago

I think all this amounts to one of two choices when writing a file:

  1. We disregard the header's scl_slope and scl_inter, as we already do for datatype, so that saving a volume from an ndarray to a file requires no more conversions than an element-wise cast.
  2. We honour the given scl_slope and scl_inter if they are not zeroed out, potentially using a number type of higher precision in this conversion to prevent precision loss and anomalous arithmetic operations.

To be honest, I was hoping to see the latter happen when I posted that comment at #26. The reasoning be that:

Afaik, the point of SI is to save disk space. It's not my place to say if it's a good idea or not, but I know that we won't save any disk space by a "better" SI management because we are simply substracting and dividing numbers from a type to the same type.

I would call this a bug, rather than an issue with SI management. In practice, we would have 3 data types in this conversion: (1) the original data from the ndarray or in-mem volume T; (2) the requested file data type specified in code by the type parameter U; and potentially what we're missing here, (3) an intermediate element type to ensure that the inverse affine transformation doesn't blow up. :) The way I see it, we can cast from an integer type T to f32, inversely scale the values, and then convert to U, thus avoiding the bad arithmetic operation.

However, considering that making this work is not very important right now, and that the current state of the library is a bit problematic, it's best that we impose the former approach to fix the "divide by zero" error, and perhaps support scaling at a later date. Can you write a PR for this? All I ask is that this behaviour is well document in the writer functions.

nilgoyette commented 5 years ago

Of course, everything I wrote was taking into account that T == U, thus making SI useless. We don't plan to use any conversion feature, as our data is already in the right type when writing the image. I understand that this could be an interesting feature for other people though. But where does U come from, in your story?

From the reference header, like NiBabel? If so, I don't agree that it's "requested". In our case, we use a reference image only to ensure that we are in the right space (it's mostly an affine cloner), it's never to request a datatype. In fact, I (personally) always hated NiBabel behavior.

Or directly from the user when calling write, like write_nifti(path, &data, Datatype::U16, Some(header));, or by adding a new function, like write_nifti_with_datatype with a new datatype parameter or something else?

Enet4 commented 5 years ago

Oh, somehow I had the impression that our write functions had a type parameter for the target data type. But never mind, T == U makes sense in our "write from ndarray" functions. This would probably be different if we were to save volumes from the base in-mem volume API, in which the original data type is only known at run time. In this case it would make sense to specify a target data type in some way.

To sum up: let's override scl_slope and scl_inter for no scaling, for the time being. It ought to be a safe choice, although supporting this in the future will result in a breaking change once users start actively relying on the library overriding these fields.

nilgoyette commented 5 years ago

Perfect. Thanks for the discussion. I'll write a PR, probably next week.