Improve Segmentation Creation Efficiency

Segmentation creation is slow when the number of frames is large. This PR contains several changes to improve this:

Do not append directly to the PerFramesFunctionalGroupsSequence during construction. Surprisingly, this append operation was taking up a significant proportion of the time in each loop iteration. I changed this to append the Dataset to a Python list (rather than a pydicom.Sequence) and then cast the list to a pydicom.Sequence after the loop. The effect was a very dramatic speed up.
There are several unnecessary dtype cast operations in the previous implementation. These are very expensive with large arrays. In this PR they are removed, and the original dtype is retained for longer, with the final cast to uint8 at the end.
There are several checks on the input array, such as that the segment numbers match and that the entire array is not empty, that become very slow when the input array is large. I was able to speed some of these up by using alternative numpy operations, and combining intermediate results for some of the checks.
The previous implementation of _omit_empty_frames actually altered the input array. While this makes things conceptually a bit simpler, it is an unnecessary operation on a potentially very large array. I change this to leave the input array unaltered and changed the later indexing logic to select the correct frame such that empty plane positions are ignored.

In the course of working on this, I discovered two bugs that I fixed:

The previous implementation would not correctly apply the Maximum Fractional Value scaling in some situations. It seems that the tests had been fudged a little to let this through but I'm sure it's wrong. The tests have now been changed to enforce the correct behaviour.
In the previous implementation, it was possible that a user would pass two or more distinct source images with the same plane position (according to the relevant dimension indices). In this situation, the segmentation frames would only be recorded for the first of the frames. In other words the implementation would silently do the wrong thing. While I suppose in principle it may be possible to allow multiple segmentation frames with the same plane position but different SourceImageSequences, I imagine this use case is not common and not a high priority to support correctly. Therefore, in this PR I simply added a check that will raise an exception when this situation is encountered such that the library no longer silently fails as before.

I also factored out several parts of the complicated constructor to make it more readable and modular.

A further change I made was to demote some of the logger messages about frame omission etc from INFO to DEBUG. For large segmentations these messages become overwhelming and in my opinion do not belong at the INFO level.

I tested this new implementation on the large mulit-segment CT from #202 (around 650 frames with 98 segments), and saw a speed up of around 10x in creation time(!). Furthermore, I ensured that each individual change I made improved the efficiency (and not simply that the net effect of all changes improved efficiency).

Note that when creating FRACTIONAL segmentations with a transfer syntax that requires frame compression, the efficiency gains found above are much less significant relative to the time required to compress the frames. I intend to add an option to parallelise frame compression in a future PR.

ImagingDataCommons / highdicom

Improve Segmentation Creation Efficiency #227