Open andeElliott opened 6 years ago
This is more of an issue than I realised. When dealing with a single bin with noise, the variance scaling can make this problem quite large (NetEMDs of 0.3 between nominally constant sequences). We should add a rounding step to address this.
In a test graph, I am getting the following locations vector in a dhist:
[-0.707107, -0.000000, -0.000000, 0.000000, 0.500000, 0.500000]
Notice that the final two points are the same (in fact they differ by about 10^(-16), so they are the same to machine precision), and the middle points are likely the same too (but I didn't check). There seem to be cases where the values are close enough to be merged and cases where they are not (i.e. I can see places where two 0.5s are placed in the same bin and others where they are not).
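To illustrate the effect outside the package, here is a minimal Python sketch (not the package's own code). The values `a` and `b` and the choice of 8 decimal places are assumptions made only for this example; it shows two doubles that print identically at six decimals but are still counted as separate locations under exact comparison.

```python
from collections import Counter

a = 0.5
b = 0.5 + 2**-53  # the next representable double above 0.5

# Both values print identically at six decimal places.
printed = [f"{x:.6f}" for x in (a, b)]

# Exact comparison keeps them as two separate locations.
exact_bins = Counter([a, b])

# Rounding first (8 decimals is an arbitrary illustrative choice) merges them.
rounded_bins = Counter(round(x, 8) for x in (a, b))

# A tiny negative value prints as "-0.000000", which may explain the
# signed zeros in the vector above.
tiny_negative = f"{-1e-17:.6f}"

print(printed, len(exact_bins), len(rounded_bins), tiny_negative)
```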
Note that this is very unlikely to make a large difference to the actual answer, as we would be adding one additional segment of width 10^(-16) and height around 1/n; even if this happens n/2 times, the total is still small. But it would affect the speed of the algorithm, since merging the duplicates would leave fewer points to process.
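As a rough sanity check on the size of the effect, the Python sketch below multiplies out the numbers quoted above. It assumes each spurious segment contributes about width times height to the result, which is a simplification for illustration; `extra_contribution` is a hypothetical helper, not a function from the package.

```python
def extra_contribution(n, width=1e-16):
    """Rough total from n/2 spurious segments of the given width and height 1/n."""
    segments = n / 2      # worst case suggested above
    height = 1.0 / n      # approximate height of each segment
    return segments * width * height


# The n cancels, so the total stays near width / 2 regardless of sample size.
for n in (10, 1000, 100000):
    print(n, extra_contribution(n))
```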
I think we just need to add a binning step to the end of counts to dhist.
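One possible shape for such a step, sketched in Python rather than in the package's own language: round each location to a fixed number of decimals and sum the masses of locations that collapse to the same value. The function name `merge_close_locations`, the `digits` parameter, and the example masses are all hypothetical and chosen only for illustration.

```python
def merge_close_locations(locations, masses, digits=8):
    """Merge locations that agree after rounding, summing their masses."""
    merged = {}
    for loc, mass in zip(locations, masses):
        # Adding 0.0 turns a rounded -0.0 into 0.0 so both zeros share a key.
        key = round(loc, digits) + 0.0
        merged[key] = merged.get(key, 0.0) + mass
    keys = sorted(merged)
    return keys, [merged[k] for k in keys]


# Values that print like the vector above; the tiny offsets are assumed.
locations = [-0.707107, -1e-17, -2e-17, 1e-17, 0.5, 0.5 + 2**-53]
masses = [1.0] * len(locations)

locs, merged_masses = merge_close_locations(locations, masses)
print(locs)           # [-0.707107, 0.0, 0.5]
print(merged_masses)  # [1.0, 3.0, 2.0]
```

A caveat with plain rounding is that two values straddling a rounding boundary can still land in different bins, so a tolerance-based merge of sorted neighbours may be more robust if that matters in practice.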