DorisAmoakohene / Codes-and-Test-for-my-paper


Comparative Benchmarking for R packages performing similar tasks #2

Open DorisAmoakohene opened 1 month ago

DorisAmoakohene commented 1 month ago

@tdhock I have decided to show the graphs below for comparative benchmarking of R functions and packages performing similar tasks.

gg read

gg write

ml gg

tdhock commented 1 month ago

please use only one of these.

the first one (fread) is the best, because there is one method (utils::read.csv) which has a larger slope than others.
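On a log-log plot, the slope of a timing curve estimates the exponent of its asymptotic growth, which is why a method with a larger slope stands out. A minimal sketch of that idea, using synthetic timings (not measurements from this issue) and a plain `lm` fit rather than anything from atime:

```r
# Synthetic timings, NOT measurements: one linear and one quadratic method.
N <- 10^seq(2, 5, by = 0.5)
timings <- rbind(
  data.frame(method = "linear", N = N, seconds = 1e-6 * N),
  data.frame(method = "quadratic", N = N, seconds = 1e-9 * N^2))

# The slope of log10(seconds) versus log10(N) estimates the exponent.
slopes <- sapply(split(timings, timings$method), function(d) {
  unname(coef(lm(log10(seconds) ~ log10(N), data = d))[2])
})
slopes
```

With these made-up curves the fitted slopes come out close to 1 and 2, so the quadratic method is the one that would visibly separate from the others as N grows.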

please remove all but 3 algos

tdhock commented 1 month ago

please increase limit line text size, and direct label text size, so that both are about the same size as axes/tick size. For direct labels use method=list(cex=1.2, "top.polygons") where the cex number controls text size
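A minimal sketch of the direct label suggestion, assuming the ggplot2 and directlabels packages are installed. The data frame `toy` and the base text size of 14 are made up for illustration and are not taken from the benchmark:

```r
library(ggplot2)
library(directlabels)

# Made-up data purely for illustration: two methods, time growing with N.
toy <- data.frame(
  N = rep(10^(2:5), times = 2),
  seconds = c(10^(2:5) * 1e-5, 10^(2:5) * 1e-4),
  method = rep(c("fast", "slow"), each = 4))

gg <- ggplot(toy, aes(N, seconds, color = method)) +
  geom_line() +
  scale_x_log10() +
  scale_y_log10() +
  theme(text = element_text(size = 14))  # axis and tick text size (points)

# cex scales the direct label text; "top.polygons" is the positioning method.
gg.labeled <- direct.label(gg, list(cex = 1.2, "top.polygons"))
```

Raising or lowering `cex` until the labels look about as large as the tick labels is the simplest way to tune this by eye.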

DorisAmoakohene commented 1 month ago

gg read 3

tdhock commented 1 month ago

great improvement but please make sure text size is similar for all text on figure (seconds=1 is too small)

DorisAmoakohene commented 1 month ago

gg read 3

tdhock commented 1 month ago

you increased the time limit to 1.5 seconds but the size of the text is still the same. To increase text size, use geom_text(size=5) etc

DorisAmoakohene commented 1 month ago

I have added geom_text(size=5).

gg read 3

tdhock commented 1 month ago

seconds=1.5 is still too small please revise and/or share code so I can see what is wrong

DorisAmoakohene commented 1 month ago

read.colors <- c(
  "readr::read_csv\n(lazy=TRUE)"="#9970AB",
  "data.table::fread"="#D6604D",
  "utils::read.csv" = "deepskyblue")

n.rows <- 100
seconds.limit <- 5

atime.read.vary.cols <- atime::atime(
  N=as.integer(10^seq(2, 6, by=0.5)),
  setup={
    set.seed(1)
    input.vec <- rnorm(n.rows*N)
    input.mat <- matrix(input.vec, n.rows, N)
    input.df <- data.frame(input.mat)
    input.csv <- tempfile()
    data.table::fwrite(input.df, input.csv)
  },
  seconds.limit = seconds.limit,
  "data.table::fread"={
    data.table::fread(input.csv, showProgress = FALSE)
  },
  "readr::read_csv\n(lazy=TRUE)"={
    readr::read_csv(input.csv, progress = FALSE, show_col_types = FALSE, lazy=TRUE)
  },
  "utils::read.csv"=utils::read.csv(input.csv))
refs.read.vary.cols <- atime::references_best(atime.read.vary.cols)
pred.read.vary.cols <- predict(refs.read.vary.cols)

png("gg.read.3.png", res = 600, width = 18, height = 12, unit = "in")
gg.read.3 <- plot(pred.read.vary.cols)+
  geom_text(text = 5)+
  theme(
    text=element_text(size=35),
    axis.text = element_text(size = 20),
    axis.title = element_text(size = 20)
    )+
  scale_x_log10("N = number of columns to read")+
  scale_y_log10("Computation time (seconds)
median line, min/max band
over 10 timings")+
  facet_null()+
  scale_fill_manual(values=read.colors)+
  scale_color_manual(values=read.colors)
directlabels::direct.label(gg.read.3, list(cex = 1.2, "top.polygons"))
dev.off()

gg read 3

tdhock commented 1 month ago

your problem is here

gg.read.3 <- plot(pred.read.vary.cols)+
  geom_text(text = 5)+

geom_text(text=5) does not draw anything if you do not give it any data set to draw, and the argument is written text=5 rather than size=5, so it does nothing for text size here. you need to write your own ggplot code, instead of using plot(pred.read.vary.cols)
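As a rough sketch of what "give it a data set" could look like, the code below draws a time limit line and its label from a one-row data frame. The names `limit.df` and `gg.limit` and the label position `N = 100` are hypothetical choices for illustration, not part of atime:

```r
library(ggplot2)

seconds.limit <- 5  # same value as in the code above

# A one-row data set gives geom_text something to draw.
limit.df <- data.frame(
  N = 100,  # hypothetical x position for the label
  seconds = seconds.limit,
  label = paste0("seconds=", seconds.limit))

gg.limit <- ggplot() +
  geom_hline(aes(yintercept = seconds), data = limit.df, color = "grey50") +
  geom_text(
    aes(N, seconds, label = label),
    data = limit.df,
    size = 5,  # geom text size is in mm, not points
    hjust = 0, vjust = -0.5) +
  scale_x_log10() +
  scale_y_log10()

# Size in mm that matches 20 point theme text.
size.for.20pt <- 20 / ggplot2::.pt
```

Note that `size` in `geom_text` is measured in millimetres while `element_text(size=...)` in `theme()` is in points, so `size = 5` is roughly 14 pt. Matching 20 pt axis text needs a `size` of about 7, which is what `20 / ggplot2::.pt` computes.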

DorisAmoakohene commented 1 month ago

gg read 3

Does this give the plot you want?

This is the ggplot code I am using:


png("gg.read.3.png", res = 600, width = 15, height = 10, unit = "in")
gg.read.3 <- ggplot()+
  # median line per expression (geom_line has no fill aesthetic, so it is not mapped here)
  geom_line(aes(x = N, y = median, color = expr.name, group = expr.name), data = atime.read.vary.cols$measurements)+
  # min/max band per expression
  geom_ribbon(aes(x = N, ymin = min, ymax = max, fill = expr.name), data = atime.read.vary.cols$measurements, alpha = 0.5)+
  theme(
    text = element_text(size = 35),
    axis.text = element_text(size = 20),
    axis.title = element_text(size = 20)
  )+
  scale_x_log10("N = number of columns to read")+
  scale_y_log10("Computation time (seconds)")+
  scale_fill_manual(values = read.colors)+
  scale_color_manual(values = read.colors)

directlabels::direct.label(gg.read.3, list(cex = 2, "top.polygons"))
dev.off()

tdhock commented 1 month ago

the original issue was that some text on the figure (seconds=5) was smaller than other text, making it difficult to read. that issue persists. for example the readr direct label is much smaller than others, please fix by either removing the (lazy=TRUE) or giving more vertical space so that label is not reduced in size, relative to the others.
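One way to try the first option, sketched with base R only: strip the second line from the expression name so every direct label is a single line. The vector `short.names` is a hypothetical helper, and the names of `read.colors` would need the same renaming to keep the colors matched:

```r
# Expression names as used in the atime call above.
expr.names <- c(
  "readr::read_csv\n(lazy=TRUE)",
  "data.table::fread",
  "utils::read.csv")

# Drop the "\n(lazy=TRUE)" suffix; fixed = TRUE treats the pattern literally.
short.names <- sub("\n(lazy=TRUE)", "", expr.names, fixed = TRUE)
short.names
```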

also currently width=15 and height=10 which will probably result in text that is too small to read in the context of a paper. please fix by reducing the overall figure size, which results in text which looks bigger when the figure is scaled to page width.
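The arithmetic behind this can be sketched as follows. The page width of 6.5 inches is an assumption (a common single-column text width), not a value from this thread:

```r
# Hypothetical text width of the page the figure will be scaled to.
page.width.in <- 6.5
fig.width.in <- 15  # width passed to png() above
theme.pt <- 20      # axis text size set in theme() above

# Text shrinks by the same factor as the figure when it is scaled to the page.
scale.factor <- page.width.in / fig.width.in
printed.pt <- theme.pt * scale.factor
round(printed.pt, 1)
```

Under that assumption the 20 pt axis text prints at under 9 pt. Drawing the figure at or near the page width instead keeps the text close to the size given in `theme()`.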

also in the new figure seconds=5 is gone, and so are the N= numbers in the direct labels, so I wonder if you need those to prove the point you are trying to make with this figure?