Unexpected behaviour of weighted sampler

fepegar commented 5 years ago

Let my sampling map be sampling_map = [1, 1, 2, 2, 1, 1], where 2 is the tumour segmentation ground truth and 1 is everything else in my MRI (brain, head, background). I chose the values so that 50% of the samples come from the tumour, and 50% from everywhere else, since 1 * 4 = 2 * 2.

The sorted values are [1, 1, 1, 1, 2, 2] and the cumulative histogram is [1, 2, 3, 4, 6, 8].

However, in line 76 the minimum of the sampling map is subtracted, so the values become [0, 0, 0, 0, 1, 1] and the cumulative histogram is [0, 0, 0, 0, 1, 2]. This means that the voxels labelled with 1 in my sampling map will never be sampled, which is not what I expect from the sampler.

Sampling 100000 times using this sampling map, the sampled indices are {2: 45055, 3: 54945}. We can see that only voxels labelled as 2 have been sampled, i.e. sampling_map[2] and sampling_map[3].

If I remove line 76, the result is {0: 3703, 1: 13749, 2: 27542, 3: 27459, 4: 13728, 5: 13819}. All the voxels have been sampled, and indices 2 and 3 have double probability than (most of) the rest: the first index is sampled only 3700 times, whereas the others 1s are sampled ~13700 times.

I think this last issue is due to line 87, where the denominator should be n_samples instead of n_samples + 1. After applying that change, the results are what I expect: {0: 12525, 1: 12482, 2: 24945, 3: 25055, 4: 12423, 5: 12570}.

In summary, there are two issues:

Voxels with lowest value in sampling map are never sampled, even if larger than 0
The inverse transform sampling from the cumulative distribution is biased towards a less frequent sampling of the first voxels of the sampling map

I can create a PR if you want, but I wouldn't know how to add proper unit tests.

fepegar commented 5 years ago

@tomvars

wyli commented 5 years ago

thanks @fepegar a PR would be great, you can append a test here: https://github.com/NifTK/NiftyNet/blob/dev/tests/sampler_weighted_v2_test.py#L224

with assertAllClose you set the tolerance, so you can check something like 24945/12482 is close to 2

NifTK / NiftyNet

Unexpected behaviour of weighted sampler #432