Total time is the time spent in `min_emd_exhaustive`.
Run | Total | Create ECMF | ECMF area | All dist. | Positive dist. |
---|---|---|---|---|---|
1 | 19,220ms | 6,930ms (36.1%) | 3,020ms (15.7%) | 4,560ms (23.7%) | 4,110ms (21.4%) |
2 | 12,280ms | 4,480ms (36.5%) | 1,970ms (16.0%) | 2,920ms (23.7%) | 2,420ms (19.7%) |
3 | 15,740ms | 5,710ms (36.3%) | 2,610ms (16.6%) | 3,670ms (23.3%) | 3,210ms (20.4%) |
4 | 18,450ms | 6,690ms (36.2%) | 3,170ms (17.1%) | 4,230ms (22.9%) | 3,690ms (20.0%) |
5 | 10,580ms | 3,640ms (34.4%) | 1,890ms (17.9%) | 2,290ms (21.6%) | 2,400ms (22.7%) |
Amended code to keep a copy of the matrix of distances between all pairs of locations and the last shift. This allows us to update the distance matrix for the next shift simply by subtracting the change in shift from all elements [commit 0d6c008].
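As a stand-alone sketch of the idea (hypothetical helper names, not the code in commit 0d6c008):

```r
# Sketch of the caching idea: keep the signed location-difference matrix and the
# last shift, and update by subtracting only the change in shift.
make_shift_dist_updater <- function(locs1, locs2) {
  diff_matrix <- outer(locs1, locs2, "-")  # all pairwise differences at zero shift
  last_shift <- 0
  function(shift) {
    # Shifting histogram 2 by `shift` changes every difference by the same amount,
    # so only the change since the last shift needs subtracting
    diff_matrix <<- diff_matrix - (shift - last_shift)
    last_shift <<- shift
    abs(diff_matrix)  # pairwise distances at the current shift
  }
}

# Usage: successive shifts reuse the cached matrix instead of recomputing outer()
update_dists <- make_shift_dist_updater(c(0, 1.5, 3), c(0.2, 2.7))
d1 <- update_dists(0.5)
d2 <- update_dists(1.0)
```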
Total time is the time spent in `min_emd_exhaustive`.
Run | Total | Create ECMF | ECMF area | All dist. | Positive dist. |
---|---|---|---|---|---|
1 | 1,310ms | 670ms (51.1%) | 300ms (22.9%) | 160ms (12.2%) | 130ms (9.9%) |
The above timings look suspicious. The code changes were restricted to potentially speeding up the "All dist." computation. It is conceivable that there may have been some effect on the "Positive dist." computation, but the apparent speedup in the "Create ECMF" and "ECMF area" computations is basically impossible. The code to calculate these has not changed, nor has the number of times these calculations need to be performed.
Reverted to the slow code [commit de1100d] to validate my initial pre-speedup timings and found an odd impact on execution speed depending on whether I call `net_emd` (around 15s) vs calling the underlying `min_emd` function directly (around 0.6s). Something funny is going on here.
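A rough sketch of the comparison (the exact `min_emd()` argument list is an assumption here; `dhist1`/`dhist2` are a single pair of "dhist" histograms):

```r
# Timing the two call paths on the same pair of histograms
system.time(net_emd(dhist1, dhist2))  # observed at roughly 15s
system.time(min_emd(dhist1, dhist2))  # observed at roughly 0.6s
```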
Capturing the `net_emd` function here for reference.
```r
net_emd <- function(dhists1, dhists2, method = "optimise",
                    return_details = FALSE, smoothing_window_width = 0) {
  # Require either a pair of "dhist" discrete histograms or two lists of "dhist"
  # discrete histograms
  pair_of_dhist_lists <- all(purrr::map_lgl(dhists1, is_dhist)) && all(purrr::map_lgl(dhists2, is_dhist))
  # If input is two lists of "dhist" discrete histograms, determine the minimum
  # EMD and associated offset for pairs of histograms taken from the same
  # position in each list
  if (pair_of_dhist_lists) {
    details <- purrr::map2(dhists1, dhists2, function(dhist1, dhist2) {
      net_emd_single_pair(dhist1, dhist2, method = method,
                          smoothing_window_width = smoothing_window_width)
    })
    # Collect the minimum EMDs and associated offsets for all histogram pairs
    min_emds <- purrr::simplify(purrr::transpose(details)$min_emd)
    min_offsets <- purrr::simplify(purrr::transpose(details)$min_offset)
    # The NetEMD is the arithmetic mean of the minimum EMDs for each pair of
    # histograms
    arithmetic_mean <- sum(min_emds) / length(min_emds)
    net_emd <- arithmetic_mean
    # Return just the NetEMD or a list including the NetEMD plus the details of
    # the minimum EMD and associated offsets for the individual histograms
    # Note that the offsets represent shifts after the histograms have been
    # scaled to unit variance
    if (return_details) {
      return(list(net_emd = net_emd, min_emds = min_emds, min_offsets = min_offsets))
    } else {
      return(arithmetic_mean)
    }
  } else {
    # Wrap each member of a single pair of histograms in a list and recursively
    # call this net_emd function. This ensures they are treated the same.
    return(net_emd(list(dhists1), list(dhists2), method = method,
                   return_details = return_details,
                   smoothing_window_width = smoothing_window_width))
  }
}
```
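For reference, a minimal usage sketch of the two dispatch paths; the `dhist()` constructor arguments shown are assumptions, not confirmed against the package documentation:

```r
# Two toy discrete histograms (the dhist() locations/masses arguments are assumed)
dhist1 <- dhist(locations = c(0, 1, 2), masses = c(1, 2, 1))
dhist2 <- dhist(locations = c(0.5, 1.5, 2.5), masses = c(1, 2, 1))

# Single pair: wrapped in single-element lists and dispatched recursively
net_emd(dhist1, dhist2)

# Two lists of histograms: returns the mean of the per-pair minimum EMDs;
# return_details = TRUE also gives the per-pair EMDs and offsets
net_emd(list(dhist1, dhist1), list(dhist2, dhist2), return_details = TRUE)
```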
All time spent in `net_emd` appears to be fully accounted for by the time spent calling `min_emd_exhaustive` in the underlying call to `min_emd`. See profiler screenshots below.
`net_emd` spends 17,520ms calling `net_emd_single_pair` ...

... which spends 17,520ms calling `min_emd` ...

... which spends 17,520ms calling `min_emd_exhaustive`.
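For reference, the same breakdown can be reproduced with base R's profiler (a sketch; inputs as above):

```r
# Profile the slow call path and summarise by total time per function
Rprof("net_emd_profile.out", interval = 0.01)
net_emd(dhist1, dhist2)
Rprof(NULL)
head(summaryRprof("net_emd_profile.out")$by.total)
```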
Hmmm. The difference between calling `net_emd` and `min_emd` has now basically disappeared, without me doing anything explicit. This coincides with the speed-up code changes showing up in git as unstaged changes. I suspect something funny is going on when switching branches while the changed files are open in RStudio.
Edit: That was not the issue. The odd speed difference between calling `net_emd` and calling `min_emd` persists even after confirming that the code executed is the old "slow" code. The same speed difference persists between calling `net_emd_single_pair` and `min_emd`, which completely rules out anything odd occurring with wrapping a single pair of histograms into two single-member lists of histograms in `net_emd`.
Isolated the speed difference to the fact that, when called through `net_emd_single_pair`, the discrete histograms are normalised to unit mass and unit variance via `normalise_dhist_variance(normalise_dhist_mass(dhist))`. This operation itself does not take any appreciable time. However, exhaustively computing the minimal EMD appears to be an order of magnitude slower for histograms that have been normalised.
This can't be an integer-versus-floating-point speed difference, as the histograms are created with random floating-point locations and masses.
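A hedged sketch of the isolating experiment; the `dhist()` constructor and the `min_emd()` call signature are assumptions, while the normalisation helpers are those named above:

```r
set.seed(42)
# Histograms with random floating-point locations and masses (constructor assumed)
dhist1 <- dhist(locations = runif(50, 0, 100), masses = runif(50))
dhist2 <- dhist(locations = runif(50, 0, 100), masses = runif(50))

normalise <- function(dh) normalise_dhist_variance(normalise_dhist_mass(dh))

system.time(min_emd(dhist1, dhist2))                        # raw histograms
system.time(min_emd(normalise(dhist1), normalise(dhist2)))  # ~10x slower observed
```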
Even with the speed-up, the exhaustive calculation is prohibitively slow. Running the 276 NetEMD comparisons for our random-graph test subset did not complete even after running for at least 4 solid days on a late 2015 Retina iMac (4-core 4GHz Intel Core i7, 32GB RAM).
All pairings of the 5,000-node / density-20 graphs in the R package's `inst/extdata/random` folder at this R-code commit (i.e. matching filename pattern `.5000.`). These are the graphs from the RG1-3 dataset with filename pattern `Type_5000_20_n` (with n in range 1-3), and they have approximately 5,000 nodes and 50,000 edges.
The new exhaustive EMD minimisation method is much slower than both the optimise-based method and the previous fixed-step method. Identify bottlenecks and any potential speed-ups.
Likely potential candidates are:
- Creating a new `dhist_ecmf` (`approxfun`) object after each shift (see the micro-benchmark sketch below)
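To illustrate why this is a plausible bottleneck, a small stand-alone micro-benchmark (not package code) comparing re-building an `approxfun` object for every shift against building it once and evaluating it at shifted points:

```r
# Illustrative only: a step-function "ECMF" over 10,000 knots
x <- sort(runif(1e4))
y <- cumsum(runif(1e4)); y <- y / max(y)
shifts <- runif(200)

rebuild_each_shift <- function(shifts) {
  for (s in shifts) {
    ecmf <- stats::approxfun(x + s, y, method = "constant", rule = 2)
    ecmf(0.5)
  }
}

shift_evaluation_only <- function(shifts) {
  ecmf <- stats::approxfun(x, y, method = "constant", rule = 2)
  for (s in shifts) {
    ecmf(0.5 - s)  # shifting the function right by s == evaluating at (t - s)
  }
}

system.time(rebuild_each_shift(shifts))     # pays approxfun() construction 200 times
system.time(shift_evaluation_only(shifts))  # constructs once, evaluates 200 times
```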