QIICR / dcmqi

dcmqi (DICOM for Quantitative Imaging) is a free, open source C++ library for conversion between imaging research formats and the standard DICOM representation for image analysis results
https://qiicr.gitbook.io/dcmqi-guide/
BSD 3-Clause "New" or "Revised" License
232 stars 62 forks source link

BUG: serialize floats with maximum precision #416

Closed kislinsk closed 3 years ago

kislinsk commented 3 years ago

API-breaking change: Helper::floatToStrScientific() was renamed to Helper::floatToStr().

Signed-off-by: Stefan Dinkelacker s.dinkelacker@dkfz-heidelberg.de

kislinsk commented 3 years ago

The CircleCI GCC build fails because it explicitly uses -std=c++98. It that intentional? std::numeric_limits<T>::max_digits10 requires C++11. If C++98 compatibility is a requirement of DCMQI, the constant can be replaced by the literal constant value 9 as documented.

fedorov commented 3 years ago

@kislinsk do you have concerns about using the approach with scientific notation as in #415 ? That approach will always result in the known maximum length of the string (assuming exponent width does not exceed 3 digits).

The issue I have with the approach in this PR is that I do not understand the expected behavior. As you can see from the CI testing results, precision is specified to be 9, and from the discussions with @rfloca this should be the total length of the string, but in practice it is not.

image

I am also not sure why you refer to this approach as "maximum precision".

kislinsk commented 3 years ago

Wasn't the scientific notation problematic in the fist place? I'm not sure but even the first proposed changes didn't work out in that case. I didn't dive deeper into scientific notation but figured out that there's maximum possible precision for float to string conversion and that it may even show up slightly different values as string but when read back in produce the exact same float before serialization. The standard library specifically introduced the limits digits10 constant for serialization purposes. Long story short, all I can say is that my approach should be guaranteed to work according to both the IEEE yada yada floating point standard and the standard library without relying on any obscure standard details found in foot notes. :-)

I then didn't go back to scientific since the actual purpose of that function was to generate precise serializable float strings. DICOM is able to use 16 characters and the precision is 9 but there are extra characters like the decimal point and a minus that you have to take into account, so that's why there is sometimes 11 characters in total.

fedorov commented 3 years ago

Wasn't the scientific notation problematic in the fist place?

Problematic in the first place was the use of the default precision value for scientific, which was less than could have been afforded with the 16 character limit.

the precision is 9 but there are extra characters like the decimal point and a minus that you have to take into account, so that's why there is sometimes 11 characters in total.

If you take a look at this CI run: https://ci.appveyor.com/project/fedorov/dcmqi/builds/37577819, which corresponds to the prescribed precision of 14, the total width of the string can go all the way up to 18 for numbers that do not even have minus. Can you explain this?

image

But, precision 9 does not cause CI failures, is preferred by MITK, and you guys are the C++ experts, so I am fine going with your advice. I am going to remove the print statement, and merge.

kislinsk commented 3 years ago

If you take a look at this CI run: https://ci.appveyor.com/project/fedorov/dcmqi/builds/37577819, which corresponds to the prescribed precision of 14, the total width of the string can go all the way up to 18 for numbers that do not even have minus. Can you explain this?

AFAIK 17 happens to be the digits10 constant for double (plus one for the decimal point = 18 in the screenshot). I can imagine that this is an artifact of an implementation detail of the conversion that may internally uses double which in case of conversions from float then potentially contain artificial additional decimal numbers. Something like that, sorry, cannot give a definite answer but as long as we stay in the explicitly specified precision range of the standards I'm pretty sure that we're fine. :)

Thank you!