QIICR / dcmqi

dcmqi (DICOM for Quantitative Imaging) is a free, open source C++ library for conversion between imaging research formats and the standard DICOM representation for image analysis results
https://qiicr.gitbook.io/dcmqi-guide/
BSD 3-Clause "New" or "Revised" License
237 stars 62 forks source link

Update seg-schema.json with cardinalities #400

Closed razorx89 closed 4 years ago

razorx89 commented 4 years ago

In https://github.com/razorx89/pydicom-seg/issues/14 the author relied on the schema validation, which succeeded, but in the end produced the wrong result. According to the examples in this repository, e.g. example_multiple_segments_reordered.json, a segment object is encapsulated into an array. Thus the outermost array, the segmentAttributes key, contains arrays, each having a single object. The schema, however, does not explicitly state this structure and thus a user could write the json with having the outermost array containing only a single array, which contains multiple objects.

https://github.com/QIICR/dcmqi/blob/9c0c7bf75bb89c92f5d4aa120c5f6d13404f0fc8/doc/schemas/seg-schema.json#L94-L99

Since the schema is written for jsonschema v4, it is easy to define the required cardinalities: https://json-schema.org/understanding-json-schema/reference/array.html#length

 "segmentAttributes": { 
   "type": "array", 
   "minItems": 1,
   "items": { 
     "type": "array", 
     "minItems": 1,
     "maxItems": 1,
     "items": { 
       "type": "object", 

This change would require the outer array to have 1..* items and the innermost to have exactly 1 item.

fedorov commented 4 years ago

@razorx89 as I tried to explain in https://github.com/razorx89/pydicom-seg/issues/14#issuecomment-631055948, it is NOT the case that the inner array should contain at most 1 item! Each inner array corresponds to a single input segmentation file (of which there can be many passed to the converter as input), and each item in the inner array corresponds to a label within that segmentation file.

There is also an attempt to explain this in the documentation, see https://qiicr.gitbooks.io/dcmqi-guide/opening/cmd_tools/seg/itkimage2segimage.html:

image

razorx89 commented 4 years ago

Yes, thank you for explaining the details. I didn't realize it was intended for the multi-input use-case.

Nevertheless, including the minItems: 1 cardinality would at least raise an error on validation with missing segmentation objects or missing inner arrays. In my opinion, at least one segment has to be defined anyways.

fedorov commented 4 years ago

Yes, I agree with you. I should add this. And thank you for creating the issue! It happens so often that users are confused, but I never learn about this confusion until I meet and talk to them...

razorx89 commented 4 years ago

Regarding this case I think the main confusion comes from the provided doc/examples/seg-*.json.

So the non-overlapping use-case, single file with multiple segments, isn't available in the examples.

fedorov commented 4 years ago

I agree, this is super confusing .... I think the reason those files are structured that way is because that's what dcmqi segimage2itkimage created for the input data used to generate those examples, and we didn't think to revise them to make them more relevant. I will revise those examples to not have one inner list per label.

fedorov commented 4 years ago

@razorx89 can you take a look at #401 and let me know if it addresses your concerns sufficiently?

In a nutshell, I added a new example instead of replacing the existing ones, since they are not just examples, but also input to the regression testing, and the converter will always generate one output file per label. I also added a readme to explain what's going on with those examples.

razorx89 commented 4 years ago

Seems good to me :+1:

Good addition to the documentation, this will definitively help preventing confusion about this array structure.