Open nicoloesch opened 1 year ago
Hello
I am curious about the cases where this happens. I work with full-brain MRI segmentation, where the probability maps are often very large (i.e. the number of possible patch centers is much larger than NUM_SAMPLES).
But let's imagine I want to focus on a small region with only five (connected) voxels. If I choose a large enough patch size, it seems to me that even the five possible distinct patches (each centered on one of the five voxels of my cdf) will already be very similar. So taking only five patches will not solve the issue of having almost identical patches, will it?
Maybe your proposal makes sense if the five voxels are spatially distinct, but it looks odd to me to have single-voxel regions.
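To put a number on the "very similar patches" argument, here is a minimal sketch that computes the fraction of voxels shared by two equally sized, axis-aligned patches whose centers are shifted by a few voxels. The function name `patch_overlap_fraction` and the patch size of 64 are assumptions chosen for illustration; neither comes from torchio.

```python
def patch_overlap_fraction(patch_size, offset):
    """Fraction of voxels shared by two equally sized, axis-aligned patches.

    patch_size: edge length per axis in voxels, e.g. (64, 64, 64)
    offset: shift between the two patch centers per axis, in voxels
    """
    shared = 1
    total = 1
    for size, shift in zip(patch_size, offset):
        # Along each axis the two patches share (size - |shift|) voxels,
        # or none at all once the shift reaches the patch size.
        shared *= max(size - abs(shift), 0)
        total *= size
    return shared / total


# Hypothetical 64^3 patches centered on neighbouring voxels.
adjacent = patch_overlap_fraction((64, 64, 64), (1, 0, 0))
diagonal = patch_overlap_fraction((64, 64, 64), (1, 1, 1))
disjoint = patch_overlap_fraction((64, 64, 64), (64, 0, 0))

print(f"adjacent centers: {adjacent:.4f}")
print(f"diagonal neighbours: {diagonal:.4f}")
print(f"shifted by a full patch: {disjoint:.4f}")
```

Under these assumptions, patches centered on directly adjacent voxels share 63/64 of their content, so five connected voxels do give five distinct but nearly identical patches.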
Hi,
The reason this came up is that I am using torchio for 2D samples, with each slice representing a patch in the classical sense. As a result, I sometimes have fewer available slices than the patches_per_subj set upfront. I am aware that this case is not usually encountered, but it raises the question: why have that mechanism in place if it is not checked anyway? Why not remove the NUM_SAMPLES attribute altogether if it is either not set at all or set at the wrong time? For my application it makes sense to set the attribute, but I understand this is not always the case. However, if the check and the system are already there, why not use them the way they were intended? I am happy to use my own classes that override this functionality, but the question remains why the mechanism is there but not being used.
I was just questioning the use case, but I now better understand yours, so it makes sense.
🚀 Feature

If a `torchio.Sampler` is used in combination with a `torchio.Queue`, the Queue requests the `NUM_SAMPLES` attribute of each `torchio.Subject` in `_fill`. However, the maximum number of samples/patches per subject usually depends on the augmentations performed and is reflected by the number of non-zero entries in the probability map, which is processed by the cdf to yield patches in `_generate_patches` of the respective sampler. As a result, the number of samples is only known AFTER calculating the probability_map, whereas the current implementation of the Queue requests the attribute BEFORE the number of samples is known. If one rewrote `__call__` (`sampler`), `_generate_patches` (`sampler`) and `_get_subject_num_samples` (`Queue`) (shown in the following), one could obtain the probability_map before creating the generator in `_generate_patches` and therefore set the `num_samples` before the Queue requests the attribute.

Motivation
Allowing the user to set the number of samples retrieved from each subject, or setting it automatically, makes the `Queue` more robust and functional, and alleviates the sampling of duplicates (e.g. the probability_map only has 5 allowed patches but the user requested 10, so each one is sampled approximately twice).

Pitch
Rewrite `__call__` of `torchio.Sampler` to the following:

Rewrite `_generate_patches` of the samplers to the following (in my example it is the `weighted.py` sampler, but it needs to be done accordingly if the method is overridden in other samplers):

And finally adapt the method `_get_subject_num_samples` of `torchio.Queue` to:

Alternatives
The highlighted section in `__call__` should be kept to prevent an endless loop in the case of `num_patches=None` in `_generate_patches`. As an alternative, one could require num_patches to always be an integer (I can't think of a scenario that needs endless sampling), i.e. remove the `Optional` and test for `is not None`.

Remarks

The layout of the `Queue` might change depending on the outcome of #1096. Furthermore, there needs to be a way to handle a sample that has zero available patches (for whatever reason). Currently, I ensure that this does not happen in `get_probability_map`, but the entire thing might break down if a subject has zero patches (not tested by me as of now).
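As a rough illustration of the idea in this issue (not the code proposed above, and not torchio's actual API), the following standard-library sketch counts the non-zero entries of a flattened probability map, caps the requested number of patches at that count, and fails loudly when a subject has no candidate patch at all. All function names here (`count_candidate_centers`, `effective_num_samples`, `sample_centers`) are hypothetical, and the probability map is simplified to a flat list of weights.

```python
import random


def count_candidate_centers(probability_map):
    """Count the non-zero entries of a flattened probability map."""
    return sum(1 for p in probability_map if p > 0)


def effective_num_samples(probability_map, requested):
    """Cap the requested number of patches at the number of candidates.

    Raising on an empty map is an assumption made for this sketch; a real
    implementation could skip the subject instead.
    """
    available = count_candidate_centers(probability_map)
    if available == 0:
        raise ValueError("probability map has no non-zero entries")
    return min(requested, available)


def sample_centers(probability_map, num_samples, rng):
    """Draw flat indices with replacement, weighted by the probability map.

    random.choices accumulates the weights internally, which plays the
    role of the cdf mentioned in the issue.
    """
    indices = range(len(probability_map))
    return rng.choices(indices, weights=probability_map, k=num_samples)


# Toy map with 5 allowed centers, while the user requests 10 patches.
probability_map = [0.0, 0.2, 0.0, 0.2, 0.2, 0.0, 0.2, 0.2]
requested = 10
rng = random.Random(0)

drawn = sample_centers(probability_map, requested, rng)
capped = effective_num_samples(probability_map, requested)

print("requested:", requested, "distinct centers drawn:", len(set(drawn)))
print("capped number of samples:", capped)
```

With 10 draws from 5 candidates, duplicates are unavoidable. Capping the count only removes that guarantee: because the draw is still made with replacement, duplicates remain possible, so avoiding them entirely would additionally require sampling without replacement.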