alibaba / ai-matrix

To make it easy to benchmark AI accelerators

Missing dimensions of this benchmark suite #2

Closed twang15 closed 5 years ago

twang15 commented 6 years ago

It is nice to have these AI benchmarks. From an academic perspective, this benchmark can be improved as follows:

  1. Input datasets. Input-sensitivity studies require many datasets. Since this benchmark suite originates from industry, collecting representative datasets should be easier for Alibaba than for anyone else.

  2. Correctness/accuracy criteria. With a compiler involved in the optimization process, it is easy to end up with an incorrectly compiled binary, so it is extremely important for a successful benchmark suite to have a built-in correctness check. For example, SPEC CPU 2006/2017 includes correctness checking as part of its scripted tool chain, and many HPC benchmarks, such as CloverLeaf/Cleverleaf, offer similar features. For approximate computation, especially in machine learning, bit-exact numerical correctness may not be applicable; accuracy may be a better criterion (see the sketch after this list). Again, this domain-specific criterion is easy for Alibaba to provide and critical for researchers from other domains.

  3. Automated installation and reporting. Installing large programs on a mainstream Linux distribution, especially without root privileges, can be very challenging. Reporting the benchmark results would also be an interesting feature to include. So far, SPEC seems more successful in this respect than any other benchmark suite I have tried. User-space package management tools such as Linuxbrew and Spack (LLNL) are very useful for automating installation. As another example, the ongoing exascale computing proxy-app suite (https://proxyapps.exascaleproject.org/ecp-proxy-apps-suite/) uses Spack (https://spack.readthedocs.io/en/latest/package_list.html) for automatic installation of each package and all of its dependencies, entirely in user space.
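As a concrete illustration of point 2, below is a minimal sketch of what an accuracy-based pass/fail check could look like for an inference benchmark. The reference numbers, model names, and tolerance are hypothetical placeholders, not values shipped with ai-matrix.

```python
# Minimal sketch of an accuracy-based correctness check for an inference
# benchmark. All reference values below are hypothetical placeholders.

# Reference top-1 accuracies recorded from a validated baseline run.
REFERENCE_TOP1 = {"resnet50": 0.749, "googlenet": 0.698}
TOLERANCE = 0.005  # allow a 0.5% absolute drop before flagging a failure


def check_accuracy(model_name, measured_top1):
    """Raise if the measured accuracy falls below reference minus tolerance."""
    expected = REFERENCE_TOP1[model_name]
    if measured_top1 < expected - TOLERANCE:
        raise RuntimeError(
            f"{model_name}: measured top-1 {measured_top1:.4f} is below the "
            f"reference {expected:.4f} minus tolerance; run rejected."
        )
    print(f"{model_name}: accuracy check passed "
          f"({measured_top1:.4f} vs reference {expected:.4f})")


if __name__ == "__main__":
    # In a real harness the measured value would come from the benchmark run.
    check_accuracy("resnet50", measured_top1=0.748)
```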

frank-wei commented 6 years ago

@twang15 Thanks for your comments. This benchmark is still at an early stage, and it is great when users provide us with suggestions.

> It is nice to have these AI benchmarks. From an academic perspective, this benchmark can be improved as follows:
>
> 1. Input datasets. Input-sensitivity studies require many datasets. [...]

We have thought about this issue and had some internal discussion. One of our tasks for a future release is to provide datasets for the available benchmarks, especially from Alibaba applications. We are working on it.
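Once such datasets are available, an input-sensitivity study can be as simple as re-running one benchmark over each of them, as in the rough sketch below. The script name, command-line flags, output format, and dataset paths are all assumptions used only to illustrate the loop.

```python
import json
import subprocess

# Hypothetical dataset locations; in practice these would be the datasets
# released with the benchmark suite.
DATASETS = ["data/set_a", "data/set_b", "data/set_c"]

results = {}
for dataset in DATASETS:
    # Assumed interface: a benchmark script that accepts --data_dir and
    # prints a final line such as "images/sec: 1234.5".
    proc = subprocess.run(
        ["python", "train.py", "--data_dir", dataset, "--num_batches", "100"],
        capture_output=True, text=True, check=True,
    )
    last_line = proc.stdout.strip().splitlines()[-1]
    results[dataset] = float(last_line.split(":")[-1])

print(json.dumps(results, indent=2))
```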

> 2. Correctness/accuracy criteria. With a compiler involved in the optimization process, it is easy to end up with an incorrectly compiled binary, so it is extremely important for a successful benchmark suite to have a built-in correctness check. [...]

Good suggestion. We are aware of this issue and of the need from other users. The work is ongoing; we are trying to make it happen for the layer-based benchmarks first.

> 3. Automated installation and reporting. Installing large programs on a mainstream Linux distribution, especially without root privileges, can be very challenging. [...]

The applications are collected from other open-source projects. Different applications have different dependencies, and it takes time to set them all up in an automated installation process. Our solution will leverage Docker images to spare users these annoying installation issues.
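To show how that might look from the user's side, here is a small sketch of a launcher that runs one benchmark script inside a prebuilt image, so the host only needs Docker and the NVIDIA container runtime. The image tag, runtime flag, and script path are assumptions, not the published ai-matrix image.

```python
import os
import subprocess

# Hypothetical image tag; the image actually published for ai-matrix may differ.
IMAGE = "aimatrix/benchmark:latest"


def run_in_container(script, extra_args=()):
    """Run one benchmark script inside the prebuilt Docker image so the host
    does not need the deep-learning frameworks installed natively."""
    cmd = [
        "docker", "run", "--rm",
        "--runtime=nvidia",                  # assumes nvidia-docker2 is set up
        "-v", f"{os.getcwd()}:/workspace",   # write results back to the host
        "-w", "/workspace",
        IMAGE,
        "python", script, *extra_args,
    ]
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical entry point inside the repository checkout.
    run_in_container("macro_benchmark/CNN_Tensorflow/train.py")
```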