QIICR / dcmqi

dcmqi (DICOM for Quantitative Imaging) is a free, open source C++ library for conversion between imaging research formats and the standard DICOM representation for image analysis results
https://qiicr.gitbook.io/dcmqi-guide/
BSD 3-Clause "New" or "Revised" License
232 stars 62 forks source link

DCM Seg: Rounding of orientation matrix when storing seg #414

Closed rfloca closed 3 years ago

rfloca commented 3 years ago

It seems that DCMQI rounds values at the 5th decimal position, if storing DCM Segs.

We had this issues now with some data sets (see https://phabricator.mitk.org/T28127), where the rounding became obvious. E.g. Stored as nrrd: space directions: (0.68359375,0,0) (0,0.68359375,0) (0,0,1) Stored as DCM Seg (via DCMQI): space directions: (0.68359380000000003,0,0) (0,0.68359380000000003,0) (0,0,1)

Before we just lower the precision of consistency checks on our side, we thought that it might make sense to check that DCMQI provides the highest possible fidelity when storing the data. That's why we wanted to point out that finding here and discuss it.

What could/is the reason for that rounding? Any idea?

fedorov commented 3 years ago

@rfloca thanks for bringing this up! I believe the issue is due to the default value of 6 for sstream.precision(). While writing SEG, float are converted into str scientific notation, and I did not think to change (or, in fact, to even check!) the default precision value. Do you have any recommendation what value is sensible? Should it be more than 8?

rfloca commented 3 years ago

Do you have any recommendation what value is sensible? Should it be more than 8? @fedorov That is a very good question. And we also had a lot of internal discussion about that in the past numerous times. We currently have a precision check epsilon of 1e-6. But in the end it is the product of heuristics and experience and not of some hard proof or so. For us it was a pareto optimum (as strict as possible, as soft as necessary) between pragmatism (because numerous codes out there do rounding and this was an epsilon that was ok with all data sets we came in touch so far) and precision (the error seemed neglectable/acceptable for normal medical imaging use cases; spacing differences of 0,000001 mm and orientation errors of 0,00005°). Hope that helps. Regarding your question. I think sstream.precision(8) is therefore (see above) OK, if you combine it with std::fixed to ensure that its about the decimal digits. Otherwise you could also run into problems with e.g. extreme origin world coordinates where you have multiple digits before the decimal point. Another option, if you don’t use fixed (to avoid trailing 0), is to at least use a precession of e.g. 12 to be on the save side. (dicom’s Decimal String allows up to 16 bytes).

pieper commented 3 years ago

@fedorov can you use the same library as ITK? https://vtk.org/pipermail/vtk-developers/2018-March/035983.html

fedorov commented 3 years ago

Thanks Ralf and Steve. Steve - I will look into using that library.

fedorov commented 3 years ago

I looked into itkNumberToString, and I did not see a way to restrict the length of the resulting number (and for DICOM we need to fit in 16 characters).

I am planning to use precision of 9 combined with scientific notation (9 (digits after the dot) + 1 (dot) + 2 (leading decimal number and optional sign) + 4 (exponent)).

rfloca commented 3 years ago

I am planning to use precision of 9 combined with scientific notation (9 (digits after the dot) + 1 (dot) + 2 (leading decimal number and optional sign) + 4 (exponent)). Hi @fedorov , thanks for looking into it! May I pose a question? Why do you use scientific notation for storing into DICOM decimal string. May be I overlook something. Then it would be cool, if you help me out. But thinking about it, I would assume that the most robust and controlable way to convert floats into a legal DICOM decimal string is just: sstream << setprecision(15) << f; This way you are sure that the string is never longer then 16 bytes and that the stream will try to cover it with the highest possible precision. Using scientific notation as you have proposed, you can only guarantee the number of digits after the dot, but not befor the dot. We e.g. have regulary origins that are more than 1 meter away and therefore could result into more then 2 leading digits. With your proposal, this would result in an invalid decimal string as it will have more than 16 bytes. What do you think?

fedorov commented 3 years ago

We e.g. have regulary origins that are more than 1 meter away and therefore could result into more then 2 leading digits. With your proposal, this would result in an invalid decimal string as it will have more than 16 bytes.

@rfloca my understanding is that with scientific notation there is always 1 and only 1 digit before the decimal point, so the invalid scenario you mention about should never happen: http://www.cplusplus.com/reference/ios/scientific/. Am I missing something?

rfloca commented 3 years ago

@rfloca my understanding is that with scientific notation there is always 1 and only 1 digit before the decimal point, so the invalid scenario you mention about should never happen: http://www.cplusplus.com/reference/ios/scientific/. Am I missing something?

Ooops. My bad. Thanks for pointing that out. Totaly forgot about this. Yes you are right with "only 1 digit", so there is no risk for an invalid decimal string. For as it should be enough precision (we need at least 10 digits; so that we cover values representing <10 meters with a decimal precision of at least 6). Don't know if others need more.

fedorov commented 3 years ago

TODO:

rfloca commented 3 years ago
* [ ]  manually form `scientific` format, since (apparently) the width of the exponent is platform specific (as clear from the Appveyor CI failed tests), and the appears to be no API to specify the desired width!

If you realy have to do the scientific notation by hand, I would personaly realy think of skipping the scientific notation for the serialization into DICOM decimal string (at least for all geometry related tags). At leat in the typical value ranges, we are confronted with when storing geometric information, we do not benefit from the extended expressive range that is offered by the scientific notation. Here just having normal decimal notation and set the percision to 15 (meaning that tops 15 digits plus dot will be used; guarented by standard) would suffice, wouldn't it.

But what ever you decide to choose. Thank you very much for taking care of that issue so swiftly. 👍 ❤️

fedorov commented 3 years ago

Here just having normal decimal notation and set the precision to 15 (meaning that tops 15 digits plus dot will be used; guaranteed by standard) would suffice, wouldn't it.

I do not think it would, since this way you have no control over how many digits will be used before the dot. At least that's how I understand precision - it defines the number of digits after the dot only. What scientific notation gives you is the guarantee on the digits before the dot.

Thank you very much for taking care of that issue so swiftly.

:-/ I would definitely not call this "swiftly"... Sorry it takes so long.

rfloca commented 3 years ago

I do not think it would, since this way you have no control over how many digits will be used before the dot.

If I have understood it correctly, the standard guarantees exactly that control in the default locale with just precision.

Quote from https://www.cplusplus.com/reference/ios/ios_base/precision/

For the default locale:

  • Using the default floating-point notation, the precision field specifies the maximum number of meaningful digits to display in total counting both those before and those after the decimal point. Notice that it is not a minimum, and therefore it does not pad the displayed number with trailing zeros if the number can be displayed with less digits than the precision.

So as read it, it should work.

:-/ I would definitely not call this "swiftly"... Sorry it takes so long. Ah, you know, In the land of the blind the one eyed is king. ;) May response times to a lot of tickets are corrently a lot worse,

fedorov commented 3 years ago

Quote from https://www.cplusplus.com/reference/ios/ios_base/precision/

Indeed, my turn to say I missed it! But I have to say I do not understand what is the meaning of "default locale" ("default notation, which is not necessarily equivalent to either fixed nor scientific" - doesn't it mean that the default behavior is undefined?), and the actual behavior then does not match what the documentation says - unsetting floatfield results in precision controlling the width of the characters after the point (on my mac):

cout.unsetf(std::ios_base::floatfield);
cout << "Default precision 8: " << setprecision(8) << 1./3. << endl;
cout << "Scientific precision 8: " << scientific << setprecision(8) << 1./3. << endl;
cout << "Fixed precision 8: " << fixed << setprecision(8) << 1./3. << endl;

Result:

Default precision 8: 0.33333333
Scientific precision 8: 3.33333333e-01
Fixed precision 8: 0.33333333

I switched to fixed notation, and added checks to truncate the string if it exceeds 16 characters. Let me know if you have any concerns.

https://github.com/QIICR/dcmqi/pull/415/files#diff-084a8a9c78ef1c2485f93151e065eaf54729ec58c6dd5504b80601e69d822473R121-R131

There are MANY open issues in dcmqi, and I failed to keep the tracker under control, but since this issue affects the use of dcmqi in MITK, it is important for me to get it fixed in a way that satisfies you guys.

pieper commented 3 years ago

I do not understand what is the meaning of "default locale"

For this I you want to set the C locale and not use the users locale influence the format of the numbers (or you could get commas instead of periods in your notation).

rfloca commented 3 years ago

@pieper alread covered it. It is the locale named "C" or also std::locale::classic().

to ensure your stringstream uses it you would do sstream.imbue(std::locale::classic()); That should do the trick. See comment in PR.

fedorov commented 3 years ago

Thanks for the feedback guys. I have no idea. But if I follow @rfloca advice and replace fixed with

    sstream.imbue(std::locale::classic());
    sstream << setprecision(15) << f;

I get output strings that exceed 16 characters.

rfloca commented 3 years ago

Hm, that's strange. I have tested the follwoing snippet with gcc, clang and VC and it always generates a string with 16 characters.

#include <iostream>
#include <sstream>
#include <iomanip>
int main()
{
    std::ostringstream sstream;
    sstream.imbue(std::locale::classic());
    sstream << std::setprecision(15) << 10/3.;

    std::cout << "String: " << sstream.str() << std::endl;
    std::cout << "Size: " << sstream.str().size() << std::endl;
}

See also: https://rextester.com/PGB35345

But I must admit, if it realy does not work on your setup, I have, by heart, no clue. So fixed with a precision of somthing between 7-9 would also be ok.

fedorov commented 3 years ago

I really don't know what to make out of it, but this code below reliably fails with dcmqi regression tests, including on the Windows Appveyor

image

https://github.com/QIICR/dcmqi/pull/415/files#diff-084a8a9c78ef1c2485f93151e065eaf54729ec58c6dd5504b80601e69d822473R123-R128

https://ci.appveyor.com/project/fedorov/dcmqi/builds/37577819 image

Switching back to scientific with precision 8.

rfloca commented 3 years ago

Don't mind. I cannot give you a good answer either.

And please for give me for being a pain in the neck. But as statet befor, scientific + precision of 8 is potentially not save. Can we also trigger the CI with a pull request? Then might have a look on that too.

kislinsk commented 3 years ago

I think I figured it out after reading the documentation of std::numeric_limits::max_digits10. I will create a pull request for review.

fedorov commented 3 years ago

resolved by #416