alan-turing-institute / network-comparison

An R package implementing the NetEMD and NetDis network comparison measures
MIT License

Counting with floating point numbers #111

Open andeElliott opened 6 years ago

andeElliott commented 6 years ago

In a test graph, I am getting the following locations vector in a dhist:

[-0.707107, -0.000000, -0.000000, 0.000000, 0.500000, 0.500000]

Notice that the final two points are the same (in fact they differ by about 10^(-16), so they are equal to within machine precision), and the middle points are likely the same too (but I didn't check). There seem to be cases where two values are close enough to be merged and cases where they are not (i.e. I can see places where two 0.5s are placed in the same bin and others where they are not).
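To illustrate the effect outside the package, here is a minimal Python sketch (not the package's R code). It assumes that counting is done by exact equality on the floating point values; the values themselves are made up for the demonstration.

```python
from collections import Counter

# Two values that are "the same" mathematically but differ by rounding error.
a = 0.3
b = 0.1 + 0.2  # stored as 0.30000000000000004

# Printed to six decimals they are indistinguishable ...
as_text = ["%.6f" % a, "%.6f" % b]

# ... but exact counting treats them as two separate locations.
counts = Counter([a, b])
gap = b - a  # on the order of 10^(-16)

print(as_text, len(counts), gap)
```

Whether two such values end up merged depends on whether the arithmetic that produced them happened to round the same way, which would explain why some pairs of 0.5s share a bin and others do not.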

Note that this is very unlikely to make a large difference to the actual answer: we would be adding one additional segment of width 10^(-16) and height around 1/n, so even if this happens n/2 times the total contribution is still small. But it would affect the speed of the algorithm, since merging the duplicates leaves fewer points to process.

I think we just need to add a binning step at the end of counts to dhist.
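A rough sketch of what such a binning step could look like, written in Python for illustration rather than as the package's R code. The function name `merge_close_locations`, the `tol` parameter, and its default of 1e-9 are all hypothetical; the tiny offsets in the example stand in for the rounding noise in the vector above.

```python
def merge_close_locations(locations, masses, tol=1e-9):
    """Merge locations that lie within `tol` of the first location in their group.

    Hypothetical helper: returns sorted locations with the masses of merged
    points summed. Each group is anchored at its first (smallest) location,
    so a long chain of nearby points cannot drift further than `tol`.
    """
    merged_locs, merged_masses = [], []
    for loc, mass in sorted(zip(locations, masses)):
        if merged_locs and loc - merged_locs[-1] <= tol:
            # Close enough to the current group: add the mass to it.
            merged_masses[-1] += mass
        else:
            # Start a new group at this location.
            merged_locs.append(loc)
            merged_masses.append(mass)
    return merged_locs, merged_masses


# Locations shaped like the vector in this issue, with made-up noise.
locations = [-0.707107, -1e-17, -2e-17, 1e-17, 0.5, 0.5 + 1e-16]
masses = [1, 1, 1, 1, 1, 1]
locs, ms = merge_close_locations(locations, masses)
print(locs, ms)
```

An absolute tolerance is the simplest choice here; a tolerance relative to the spread of the locations might be more appropriate if the histograms can live on very different scales.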

andeElliott commented 6 years ago

So this is more of an issue than I realised. When dealing with a single bin plus noise, the variance scaling can make this problem quite large (NetEMDs of 0.3 between nominally constant sequences). We should add a rounding step to address this.
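The amplification can be sketched in a few lines of Python (again an illustration, not the package's R code). It assumes that variance scaling means dividing by the standard deviation; the data and the choice of rounding to 9 decimal places are made up.

```python
import statistics

# A nominally constant sequence whose entries differ only by rounding error.
values = [0.3, 0.1 + 0.2, 0.3, 0.1 + 0.2]

raw_spread = max(values) - min(values)  # on the order of 10^(-16)
std = statistics.pstdev(values)         # tiny, but not zero

# Assumed variance scaling: dividing by the standard deviation turns the
# noise into a spread of order 1.
scaled_spread = raw_spread / std

# Rounding first collapses the sequence to a true constant.
rounded = [round(v, 9) for v in values]
rounded_std = statistics.pstdev(rounded)

print(raw_spread, std, scaled_spread, rounded_std)
```

After rounding, the standard deviation is exactly zero, so the sequence can be recognised as constant instead of being rescaled into a spurious two-point distribution.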