duckdblabs / db-benchmark

reproducible benchmark of database-like ops
https://duckdblabs.github.io/db-benchmark/
Mozilla Public License 2.0

CI: when there are no errors, print the last line of each script's output #27

Closed jangorecki closed 1 year ago

jangorecki commented 1 year ago

This gives something like

$ tail -n 1 out/*.out
==> out/run_datatable_rollfun_R1_1e6_NA_0_1.out <==
rolling finished, took 1s

==> out/run_datatable_rollfun_R1_1e8_NA_0_1.out <==
loading dataset R1_1e8_NA_0_1

==> out/run_dplyr_rollfun_R1_1e6_NA_0_1.out <==
rolling finished, took 4s

...

which is useful. It can happen that a script gets stuck or is terminated without the process ever writing the word "error" to the console, in which case the failure would go undetected. With this change, the user can easily see whether each script ran to completion without needing to download and unzip the artifacts. I check CI logs from my mobile, and downloading, unzipping, and searching for that information is not very convenient. After this change I can just browse the GH checks page and see it.
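
The check described above could look roughly like the sketch below. It assumes the logs live in `out/*.out` and that errors are detected by grepping for the word "error"; the function name `report_last_lines` and the exact grep pattern are illustrative and not taken from the repository's CI workflow.

```shell
#!/bin/sh
# Hypothetical CI helper (not the repository's actual workflow step):
# fail when any log mentions "error"; otherwise print the last line of
# every log so that scripts which stopped early are visible in the CI output.
report_last_lines() {
  dir="${1:-out}"
  # -i: case-insensitive match, -l: print only the names of matching files
  if grep -il "error" "$dir"/*.out; then
    echo "errors found in the logs listed above" >&2
    return 1
  fi
  # With several files, tail prefixes each one with a "==> file <==" header
  tail -n 1 "$dir"/*.out
}
```

Calling `report_last_lines out` from the CI step would then either fail with the list of offending logs or print the same kind of listing as `tail -n 1 out/*.out`.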

jangorecki commented 1 year ago

So we are now seeing the following:

==> out/rmarkdown_history.out <==
Execution halted

==> out/rmarkdown_index.out <==
Execution halted

==> out/rmarkdown_tech.out <==
Execution halted

==> out/run_arrow_groupby_G1_1e7_1e2_0_0.out <==
grouping finished, took 49s

==> out/run_arrow_join_J1_1e7_NA_0_0.out <==
3    11  2062 5483091 id11  id2062 id5483091  10.0     7  2062 id7    58.3

==> out/run_datafusion_groupby_G1_1e7_1e2_0_0.out <==
(10000, 3)

==> out/run_datafusion_join_J1_1e7_NA_0_0.out <==
(10000000, 11)

==> out/run_datatable_groupby_G1_1e7_1e2_0_0.out <==
grouping finished, took 22s

==> out/run_datatable_join_J1_1e7_NA_0_0.out <==
joining finished, took 136s

==> out/run_dplyr_groupby_G1_1e7_1e2_0_0.out <==
3 id100   100 0.000332    

==> out/run_dplyr_join_J1_1e7_NA_0_0.out <==
joining finished, took 154s

==> out/run_duckdb-latest_groupby_G1_1e7_1e2_0_0.out <==
grouping finished, took 19s

==> out/run_duckdb-latest_join_J1_1e7_NA_0_0.out <==
joining finished, took 21s

==> out/run_duckdb_groupby_G1_1e7_1e2_0_0.out <==
grouping finished, took 18s

==> out/run_duckdb_join_J1_1e7_NA_0_0.out <==
joining finished, took 21s

==> out/run_juliadf_groupby_G1_1e7_1e2_0_0.out <==
grouping finished, took 30s

==> out/run_juliadf_join_J1_1e7_NA_0_0.out <==
joining finished, took 23s

==> out/run_juliads_groupby_G1_1e7_1e2_0_0.out <==
grouping finished, took 105s

==> out/run_juliads_join_J1_1e7_NA_0_0.out <==
joining finished, took 77s

==> out/run_pandas_groupby_G1_1e7_1e2_0_0.out <==
grouping finished, took 59s

==> out/run_pandas_join_J1_1e7_NA_0_0.out <==
joining finished, took 52s

==> out/run_polars_groupby_G1_1e7_1e2_0_0.out <==
grouping finished, took 13.611s

==> out/run_polars_join_J1_1e7_NA_0_0.out <==
joining finished, took 8s

==> out/run_pydatatable_groupby_G1_1e7_1e2_0_0.out <==
grouping finished, took 71s

==> out/run_pydatatable_join_J1_1e7_NA_0_0.out <==
joining finished, took 53s

==> out/run_spark_groupby_G1_1e7_1e2_0_0.out <==
grouping finished, took 213s

==> out/run_spark_join_J1_1e7_NA_0_0.out <==
joining finished, took 351s

jangorecki commented 1 year ago

The following scripts terminated abnormally:

==> out/run_arrow_join_J1_1e7_NA_0_0.out <==
3    11  2062 5483091 id11  id2062 id5483091  10.0     7  2062 id7    58.3

==> out/run_datafusion_groupby_G1_1e7_1e2_0_0.out <==
(10000, 3)

==> out/run_datafusion_join_J1_1e7_NA_0_0.out <==
(10000000, 11)

==> out/run_dplyr_groupby_G1_1e7_1e2_0_0.out <==
3 id100   100 0.000332    
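
A list like the one above can be derived from the logs mechanically. The sketch below assumes that a script that ran to completion ends its log with a line containing "finished, took" (inferred from the output shown in this thread) and that only `run_*.out` files are of interest; the function name `list_unfinished` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the last line of each benchmark log that does
# not end with a "finished, took" line, i.e. the script did not reach its end.
list_unfinished() {
  dir="${1:-out}"
  for f in "$dir"/run_*.out; do
    [ -e "$f" ] || continue          # the glob matched nothing
    last=$(tail -n 1 "$f")
    case "$last" in
      *"finished, took"*) ;;         # normal completion, skip this log
      *) printf '==> %s <==\n%s\n\n' "$f" "$last" ;;
    esac
  done
}
```

Restricting the glob to `run_*.out` keeps the `rmarkdown_*.out` logs out of the result, matching the list above.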

jangorecki commented 1 year ago

@Dandandan two of those scripts are DataFusion ones. If query exceptions are not defined (in _benchplot/benchplot-dict.R) when submitting DataFusion, the failures will be shown as an undefined error on the plot. It may be that this only affects the small CI machines and the scripts complete successfully on a normal machine, in which case no exception definition is needed.