Closed Anirban166 closed 4 months ago
looks good can you please add dot(s) for the max speedup / min time? (is that the same?)
also please use facet_grid instead of grid.arrange (which repeats axes, potentially confusing)
also please commit this code instead of just pasting it in the issue
> looks good can you please add dot(s) for the max speedup / min time? (is that the same?)
> also please use facet_grid instead of grid.arrange (which repeats axes, potentially confusing)
Done! (please check and yup, the points representing maximum speedup and minimum runtime would be the same here)
```r
library(ggplot2)
library(data.table)
library(microbenchmark)

# Benchmark a set of data.table routines using a fixed number of threads.
run_benchmarks <- function(rowCount, colCount, threadCount) {
  setDTthreads(threadCount)
  dt <- data.table(matrix(runif(rowCount * colCount), nrow = rowCount, ncol = colCount))
  threadLabel <- ifelse(threadCount == 1, "thread", "threads")
  cat(sprintf("Running benchmarks with %d %s, %d rows, and %d columns.\n", getDTthreads(), threadLabel, rowCount, colCount))
  benchmarks <- microbenchmark(
    forder = setorder(dt, V1),
    GForce_sum = dt[, .(sum(V1))],
    subsetting = dt[dt[[1]] > 0.5, ],
    frollmean = frollmean(dt[[1]], 10),
    fcoalesce = fcoalesce(dt[[1]], dt[[2]]),
    between = dt[dt[[1]] %between% c(0.4, 0.6)],
    fifelse = fifelse(dt[[1]] > 0.5, dt[[1]], 0),
    nafill = nafill(dt[[1]], type = "const", fill = 0),
    CJ = CJ(sample(rowCount, size = min(rowCount, 5)), sample(colCount, size = min(colCount, 5))),
    times = 100
  )
  # Keep only the mean time per expression.
  benchmark_summary <- summary(benchmarks)
  meanTime <- benchmark_summary$mean
  names(meanTime) <- benchmark_summary$expr
  return(data.frame(threadCount = threadCount, expr = names(meanTime), meanTime = meanTime))
}

# Run the benchmarks for every thread count from 1 up to the maximum available.
find_optimal_threads <- function(rowCount, colCount) {
  setDTthreads(0)
  maxThreads <- getDTthreads()
  results <- list()
  for (threadCount in 1:maxThreads) {
    results[[threadCount]] <- run_benchmarks(rowCount, colCount, threadCount)
  }
  return(do.call(rbind, results))
}

benchmarkData <- find_optimal_threads(10000000, 10)
rownames(benchmarkData) <- NULL
# The single-thread baseline is recycled across thread counts; this relies on
# the rows being grouped by threadCount with the same expr order in each group.
benchmarkData$speedup <- benchmarkData$meanTime[benchmarkData$threadCount == 1] / benchmarkData$meanTime
idealSpeedup <- seq(1, getDTthreads())
setDT(benchmarkData)
maxSpeedup <- benchmarkData[, .(threadCount = threadCount[which.max(speedup)], speedup = max(speedup)), by = expr]
# Alternatively, I could also calculate the minimum runtime (mean value across the hundred runs) for each routine:
# minMeanTime <- benchmarkData[, .(threadCount = threadCount[which.min(meanTime)], speedup = min(meanTime)), by = expr]
# (Using this instead of maxSpeedup below gives me the same points)

ggplot(benchmarkData, aes(x = threadCount, y = speedup, color = expr)) +
  geom_line() +
  geom_line(data = data.frame(threadCount = 1:getDTthreads(), speedup = idealSpeedup), aes(x = threadCount, y = speedup), linetype = "dashed", color = "red") +
  geom_point(data = maxSpeedup, aes(x = threadCount, y = speedup), color = "black", size = 2) +
  facet_grid(. ~ expr, scales = "free_y") +
  labs(x = "Threads", y = "Speedup", title = "data.table functions") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_x_continuous(breaks = 1:getDTthreads(), labels = 1:getDTthreads())
```
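The speedup column in the script above divides by a single-thread baseline that gets recycled, which gives the right answer only while the rows stay grouped by thread count in the same expression order. As a minimal sketch (not the code that was committed), the baseline can instead be looked up by expression name. The four timings are copied from the benchmark table posted later in this thread and deliberately shuffled; the variable names `timings` and `baseline` are made up for the illustration.

```r
# Mean times for two routines at 1 and 2 threads, in shuffled row order.
timings <- data.frame(
  threadCount = c(2, 1, 2, 1),
  expr        = c("forder", "nafill", "nafill", "forder"),
  meanTime    = c(149.965999, 8.625862, 9.650479, 240.357339)
)

# Single-thread rows serve as the baseline for each expression.
baseline <- timings[timings$threadCount == 1, ]

# Match every row to its own expression's baseline, independent of row order.
timings$speedup <- baseline$meanTime[match(timings$expr, baseline$expr)] / timings$meanTime

print(timings)
```

With this lookup, reordering or filtering the rows does not change the computed speedups.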
> also please commit this code instead of just pasting it in the issue
Done as well, but please note that it's not ready as a package yet. I was holding off on committing until I fix the issues that pop up when using devtools::load_all(), but for now I've updated the code (which works fine standalone) and uploaded some essential R package files.
great improvement. color legend is redundant with panel, right? please remove because redundantly encoded information (data variable with more than one different visual property) is potentially confusing. please add linetype legend so that there are two values, ideal and measured. also please try coord_equal maybe try facet_wrap
looks like speedups are far from ideal so picking the max speedup (black dot) is not very efficient. in addition to black dot, can you please plot a geom_text next to it which tells us how many threads achieved the max?
maybe take a line of slope 0.5 (or user-defined), add that to the plot as a third linetype, and then add another point/text which is the max speedup that is above that line, which we could recommend as a number of threads to use for efficient speedups
> great improvement. color legend is redundant with panel, right? please remove because redundantly encoded information (data variable with more than one different visual property) is potentially confusing. please add linetype legend so that there are two values, ideal and measured. also please try coord_equal maybe try facet_wrap
> looks like speedups are far from ideal so picking the max speedup (black dot) is not very efficient. in addition to black dot, can you please plot a geom_text next to it which tells us how many threads achieved the max?
You're right, and done!
> maybe take a line of slope 0.5 (or user-defined), add that to the plot as a third linetype, and then add another point/text which is the max speedup that is above that line, which we could recommend as a number of threads to use for efficient speedups
I'm a bit confused here so let's follow up on this tomorrow morning
can you please have the dot be the max subject to the constraint that it is above the 0.5 slope line?
> can you please have the dot be the max subject to the constraint that it is above the 0.5 slope line?
Done! (please check)
It took some time to code the logic for that accurately. I went for extracting the points where the 'Measured Speedup' and 'Sub-optimal Speedup' lines are closest (least deviation on the y-axis, i.e. in speedup) or intersect (I previously used their absolute values, since the numbers don't match exactly), and then I took the ones where speedup is maximized among them.
Note that the 'Sub-optimal Speedup' line that I'm using is indeed a 0.5 slope line, but it stretches from (1, 1) to (threadCount, threadCount/2) to match the plot (just like how thread values always start from 1 and not 0).
It is still 0.5 in the sense that for every 1 unit increase on the x-axis the y-axis increases by 0.5 units (I got the values through interpolation after extending the geometry to have threadCount points instead of just drawing a line between two points).
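One possible reading of the requested constraint ("the max subject to being above the 0.5 slope line") is a filter followed by a maximum. The sketch below is only an illustration of that reading and may differ from the closest-point logic described above. It assumes the reference line passes through (1, 1) and rises by `slope` per additional thread; `recommend_threads` is a hypothetical helper, not a function from the package. The speedups are the `forder` values from the benchmark table posted in this thread.

```r
# Measured forder speedups for 1 to 10 threads (from the posted benchmark table).
forder <- data.frame(
  threadCount = 1:10,
  speedup = c(1.0000000, 1.6027456, 1.9477230, 2.2569262, 2.4653912,
              2.5368537, 2.6303658, 2.6290585, 2.7515763, 2.6357228)
)

# Hypothetical helper: the point with the largest speedup among those lying on
# or above a reference line.
# Assumption: the line starts at (1, 1) and has the given slope.
recommend_threads <- function(d, slope = 0.5) {
  reference <- 1 + slope * (d$threadCount - 1)
  eligible <- d[d$speedup >= reference, ]
  eligible[which.max(eligible$speedup), ]
}

recommended <- recommend_threads(forder)              # default slope of 0.5
relaxed     <- recommend_threads(forder, slope = 0.25) # a more lenient line
print(recommended)
print(relaxed)
```

Exposing `slope` as an argument matches the "or user-defined" part of the suggestion: a flatter line tolerates less efficient scaling and therefore recommends more threads.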
Here's what the plot looks like from an example run:
```r
benchmarkData <- findOptimalThreadCount(1e7, 10)
plot(benchmarkData)
```
> and then add another point/text which is the max speedup that is above that line, which we could recommend as a number of threads to use for efficient speedups
And I'm keeping two dots for what you said above (added a legend to distinguish between them)
One thing I've been pondering since yesterday is whether it would be practical to have the user enter the total size of their data (the product of the number of rows and columns) instead of separate arguments for both. It would be less flexible in terms of input, but the benefit is that I could then allocate more columns to the functions that scale better in parallel across columns, and likewise more rows to the ones that benefit from more rows in the data.
For instance, functions such as forder do better with increased thread count when the data has more rows (as we are observing), but functions such as frollmean would do better with a higher thread count when the data has more columns (an observation stemming from my past benchmarks).
I'm thinking of a simple manual allocation like the one I'm currently testing with: 10 for one parameter, and the total data size divided by 10 for the other (e.g. for a data size of 1e7, we can run benchmarks on 10 rows with 1e6 columns and on 1e6 rows with 10 columns for the functions that benefit from each distribution).
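To make the proposed allocation concrete, here is a rough sketch of how a single total size could be split into the two shapes. `allocate_dims` and its `fixedDim` argument are hypothetical names used only for this illustration, and the sketch assumes the total size is an exact multiple of the fixed dimension.

```r
# Hypothetical helper: turn a total cell count into a row-heavy and a
# column-heavy shape, keeping the smaller dimension fixed at `fixedDim`.
allocate_dims <- function(totalSize, fixedDim = 10) {
  # Assumption: totalSize divides evenly by fixedDim.
  stopifnot(totalSize >= fixedDim, totalSize %% fixedDim == 0)
  longDim <- totalSize / fixedDim
  list(
    rowHeavy = c(rows = longDim, cols = fixedDim),  # e.g. for forder
    colHeavy = c(rows = fixedDim, cols = longDim)   # e.g. for frollmean
  )
}

shapes <- allocate_dims(1e7)
print(shapes)
```

Each benchmarked function would then be run on whichever shape it is expected to scale better with.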
@tdhock what do you think? (also, any thoughts on the allocation of rows and columns if this sounds good? I'm wondering if it would make sense for the user to enter a set of row and column values too)
does ribbon show mean+/-SD? need a better name for "sub-optimal speedup" maybe "recommended speedup"
My code so far:
Benchmarked data used to generate the above
```r
> benchmarkData
   threadCount       expr   meanTime   speedup
1            1     forder 240.357339 1.0000000
2            1 GForce_sum  15.796816 1.0000000
3            1 subsetting  79.248770 1.0000000
4            1  frollmean  24.982181 1.0000000
5            1  fcoalesce  14.946403 1.0000000
6            1    between  47.963492 1.0000000
7            1    fifelse  33.290711 1.0000000
8            1     nafill   8.625862 1.0000000
9            1         CJ   4.210141 1.0000000
10           2     forder 149.965999 1.6027456
11           2 GForce_sum  15.837577 0.9974263
12           2 subsetting  72.494060 1.0931760
13           2  frollmean  25.058196 0.9969665
14           2  fcoalesce   9.449926 1.5816424
15           2    between  37.051850 1.2944966
16           2    fifelse  26.154752 1.2728361
17           2     nafill   9.650479 0.8938274
18           2         CJ   5.201596 0.8093941
19           3     forder 123.404270 1.9477230
20           3 GForce_sum  15.772685 1.0015299
21           3 subsetting  63.440701 1.2491787
22           3  frollmean  25.129612 0.9941332
23           3  fcoalesce  10.366038 1.4418626
24           3    between  31.575813 1.5189947
25           3    fifelse  23.750762 1.4016691
26           3     nafill   8.520967 1.0123103
27           3         CJ   4.402384 0.9563319
28           4     forder 106.497652 2.2569262
29           4 GForce_sum  15.738023 1.0037357
30           4 subsetting  60.926726 1.3007226
31           4  frollmean  24.438927 1.0222290
32           4  fcoalesce   9.059327 1.6498359
33           4    between  30.030850 1.5971407
34           4    fifelse  23.168890 1.4368712
35           4     nafill   8.271142 1.0428866
36           4         CJ   3.902546 1.0788191
37           5     forder  97.492574 2.4653912
38           5 GForce_sum  15.749254 1.0030200
39           5 subsetting  59.008015 1.3430171
40           5  frollmean  24.428953 1.0226464
41           5  fcoalesce   9.449219 1.5817606
42           5    between  26.733613 1.7941268
43           5    fifelse  22.250689 1.4961654
44           5     nafill   8.682627 0.9934623
45           5         CJ   4.796245 0.8777994
46           6     forder  94.746236 2.5368537
47           6 GForce_sum  15.791068 1.0003640
48           6 subsetting  56.711824 1.3973941
49           6  frollmean  24.748496 1.0094424
50           6  fcoalesce   9.625439 1.5528022
51           6    between  27.277986 1.7583224
52           6    fifelse  22.193039 1.5000519
53           6     nafill   9.374746 0.9201169
54           6         CJ   4.361114 0.9653819
55           7     forder  91.377914 2.6303658
56           7 GForce_sum  15.774722 1.0014006
57           7 subsetting  58.802314 1.3477152
58           7  frollmean  24.836142 1.0058801
59           7  fcoalesce   9.519909 1.5700153
60           7    between  25.251394 1.8994394
61           7    fifelse  21.636933 1.5386058
62           7     nafill   9.810404 0.8792566
63           7         CJ   4.564388 0.9223889
64           8     forder  91.423351 2.6290585
65           8 GForce_sum  15.797678 0.9999454
66           8 subsetting  57.267878 1.3838258
67           8  frollmean  24.998616 0.9993426
68           8  fcoalesce   9.373948 1.5944619
69           8    between  27.136203 1.7675093
70           8    fifelse  21.363143 1.5583246
71           8     nafill  10.077464 0.8559557
72           8         CJ   5.699883 0.7386363
73           9     forder  87.352598 2.7515763
74           9 GForce_sum  15.815176 0.9988391
75           9 subsetting  62.297943 1.2720929
76           9  frollmean  24.898455 1.0033627
77           9  fcoalesce   9.577203 1.5606230
78           9    between  26.234959 1.8282282
79           9    fifelse  22.128786 1.5044075
80           9     nafill   9.492020 0.9087489
81           9         CJ   4.052693 1.0388501
82          10     forder  91.192193 2.6357228
83          10 GForce_sum  15.878428 0.9948602
84          10 subsetting  64.146062 1.2354425
85          10  frollmean  25.933211 0.9633277
86          10  fcoalesce   9.496062 1.5739580
87          10    between  27.534979 1.7419113
88          10    fifelse  22.214027 1.4986347
89          10     nafill   9.303581 0.9271551
90          10         CJ   4.239453 0.9930859
```