fstpackage / synthetic

R package for dataset generation and benchmarking
GNU Affero General Public License v3.0

Use dplyr syntax to separate synthetic_bench() arguments #23

Closed MarcusKlik closed 4 years ago

MarcusKlik commented 4 years ago

For example:

# define generators and streamers here

synthetic_bench(nr_of_runs = 10, progress = TRUE, "results") %>%
  bench_tables(single_column_table, random_table, text_only_table) %>%
  bench_streamers(fst, arrow, feather, datatable) %>%
  bench_rows(c(1e6, 5e6, 1e7)) %>%
  bench_threads(1:8) %>%
  bench_compression(c(0, 50, 80)) %>%
  compute()

The result should be identical to:

synthetic_bench(
  list(single_column_table, random_table, text_only_table),
  list(fst, arrow, feather, datatable),
  c(1e6, 5e6, 1e7),  # number of rows
  c(0, 50, 80),      # compression settings
  1:8,               # number of threads, not implemented yet
  result_folder = "results")

but more readable.
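
To illustrate why the piped form can produce the same definition as the single call, here is a minimal sketch of verbs that each add one setting to a benchmark object and return it. All names (sketch_bench, sketch_rows, sketch_threads, sketch_compression) are hypothetical and are not the package's functions; the sketch also assumes R >= 4.1 for the native pipe, although magrittr's %>% would behave the same way here.

```r
# Hypothetical sketch, not the synthetic package's implementation.
# Each verb takes the benchmark definition as its first argument,
# stores one setting in it, and returns the updated definition.

sketch_bench <- function(nr_of_runs = 10, result_folder = NULL) {
  structure(
    list(nr_of_runs = nr_of_runs, result_folder = result_folder),
    class = "sketch_bench"
  )
}

sketch_rows <- function(bench, rows) {
  bench$rows <- rows
  bench
}

sketch_threads <- function(bench, threads) {
  bench$threads <- threads
  bench
}

sketch_compression <- function(bench, compression) {
  bench$compression <- compression
  bench
}

# Native pipe (R >= 4.1); the left-hand side becomes the first argument
bench_def <- sketch_bench(nr_of_runs = 10, result_folder = "results") |>
  sketch_rows(c(1e6, 5e6, 1e7)) |>
  sketch_threads(1:8) |>
  sketch_compression(c(0, 50, 80))

# Inspect the accumulated settings
str(unclass(bench_def))
```

Because every verb only writes its own field, the order of the verbs in the pipeline does not matter in this sketch, which is one reason the piped form is easier to read than a long positional argument list.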

MarcusKlik commented 4 years ago

Evaluating the pipeline without calling compute() should return a summary of the defined benchmark, so:

generators %>%
  bench_streamers(fst, arrow, feather, datatable) %>%
  bench_rows(c(1e6, 5e6, 1e7))

might return a description of the benchmark as having 4 streamers and 3 sizes, so a total of 12 measurements will be performed. Each measurement is run 10 times, for a total of 120 runs. Then perhaps a description of the generator and its columns.

The actual benchmark is only run when compute() is called.
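
A rough sketch of this deferred behaviour, using an S3 print method for the summary and a separate function that stands in for compute(). The class name lazy_bench, the helpers nr_of_combinations and run_bench, and the wording of the summary are all assumptions made for illustration; no real timings are taken.

```r
# Hypothetical sketch: names and output format are assumptions.
bench_def <- structure(
  list(
    streamers = c("fst", "arrow", "feather", "datatable"),
    rows = c(1e6, 5e6, 1e7),
    nr_of_runs = 10
  ),
  class = "lazy_bench"
)

# Number of distinct streamer/size combinations
nr_of_combinations <- function(bench) {
  length(bench$streamers) * length(bench$rows)
}

# Printing only describes the benchmark; nothing is measured here
print.lazy_bench <- function(x, ...) {
  combos <- nr_of_combinations(x)
  cat("Benchmark with", length(x$streamers), "streamers and",
      length(x$rows), "sizes:", combos, "combinations\n")
  cat("Each combination is run", x$nr_of_runs, "times:",
      combos * x$nr_of_runs, "runs in total\n")
  invisible(x)
}

# Stand-in for compute(): the only place where work would happen
run_bench <- function(bench) {
  grid <- expand.grid(
    streamer = bench$streamers,
    rows = bench$rows,
    run = seq_len(bench$nr_of_runs),
    stringsAsFactors = FALSE
  )
  grid$time <- NA_real_  # real timings would be filled in here
  grid
}

print(bench_def)                  # summary only
results <- run_bench(bench_def)   # one row per individual run
nrow(results)
```

With 4 streamers, 3 sizes and 10 runs, the result grid has one row for each of the 120 runs mentioned above.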

MarcusKlik commented 4 years ago

The dplyr syntax is now the only available syntax.