TechEmpower / FrameworkBenchmarks

Source for the TechEmpower Framework Benchmarks project
https://www.techempower.com/benchmarks/

New Toolset for TechEmpower Framework Benchmarks #6064

Open msmith-techempower opened 3 years ago

msmith-techempower commented 3 years ago

Note: October 14, 2020 - testing on Citrine has concluded and performance seems on par with the legacy toolset (as expected).

I am very excited to be writing this issue (finally)!

Over the past several months, I have been rewriting the toolset (the software which actually orchestrates the entire verification/benchmarking/docker-communication/etc process) from scratch in Rust. At present, I would say that the new toolset is nearing MVP status, and as such I need to take down Citrine so that I can test actual results of running benchmarks with the new toolset. I hope that Citrine will not be down long, but I cannot promise a timeline today.

What's wrong with the current toolset?

The current toolset is the result of organic iteration of software that has taken many twists and turns throughout the years. It originally had no verification, no way to tell whether a test had been successfully benchmarked, no containers of any sort, no way to tell for certain that a process that was started (which can, and often does, spawn its own child processes) had been stopped, and many more shortcomings. Slowly over time, many of these niceties were added onto the toolset, but in a learning fashion - I was not sure at the time I started working on it how verification would work; I was not sure that Docker would be suitable and work; etc. With the privilege of hindsight, I can see what did and did not work so well. The new toolset aims to be built from the ground up with the luxury of making the right decisions based on those lessons.

The current toolset is a Python application, though many contributors may not know it since it is also packaged as a Docker image and there is a handy script (tfb) in the root of the repository. The idea here was simple - we would not force contributors to have Python installed to run the toolset, since Docker is already a prerequisite to running it. This was nice at the time and served its purpose, but it has some very real drawbacks: namely that Docker-in-Docker is required for anything to work, which causes very complicated Docker configuration issues everywhere and not-yet-solved issues on Windows hosts. Another reason we opted for the Docker image of the toolset is that, at the time, the widely used Docker libraries for Python would not run natively on Windows.
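For reference, a typical legacy invocation looks roughly like the following (an illustrative example; the wrapper script builds and runs the Python toolset inside its own Docker image, which is where the Docker-in-Docker requirement comes from):

```sh
# Legacy workflow: the tfb wrapper launches the Python toolset in a container
./tfb --mode verify --test gemini
```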

Another issue with the current toolset is that adding new test types is extremely cumbersome, confusing, and very technical. In fact, to date we have never had a pull request submitted that attempts to add a new test type. Granted, adding a test type would require much discussion anyway (we usually open an RFC for that), but the process itself is also extremely difficult to wrap one's head around.

What's great about the new toolset?

First and foremost, as I teased on Twitter, the new toolset will build and run as a native executable on both Windows and Linux. At this time, Docker4Windows still uses Hyper-V under the hood to support Docker, I believe, so while having a native Windows toolset is great, it will still not be suitable for apples-to-apples benchmarking versus Linux; it does, however, position us well should that ever come to pass.

Next, the toolset will live in isolation from the framework implementations, which has long been a goal on the TechEmpower side. Having its own repository will allow toolset issues to be more focused instead of muddled together with framework implementation issues. This isolation also means that the toolset can be agnostic of framework implementations and, in fact, agnostic of verification and benchmarking - it can focus on just orchestrating the moving pieces.

Additionally, the new toolset will require new configuration files for all framework implementations. Do not worry about this; when we are closer to it going live, we will write a script to do the conversion. The new configuration files are TOML instead of JSON because we have long wanted comments available in these config files, along with some other niceties that TOML provides.

What other moving pieces?

The new toolset is actually two distinct pieces of software that operate on framework implementation inputs. For example, let's say that I start a new verification: `tfb_toolset -m verify --test gemini`. This will cause the toolset to do the following:

  1. Read gemini's config
  2. If it needs a database, start it via its dockerfile
  3. Start gemini via the appropriate dockerfile
  4. Wait until gemini is accepting requests
  5. Make a request to /json and verify it against the json test type
  6. Output results

The current toolset does all of this in Python code and is extremely complicated; in fact, only the benchmarking phase actually uses an additional Docker container (to run wrk). The new toolset performs all of these steps, but through a second piece of software in a published Docker container, which is the source of truth for test verification, databases, and benchmarking.
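As a rough sketch of those six steps using plain docker and curl commands (the real toolset drives Docker through dockurl and delegates verification to the TFBVerifier image; the paths, image name, and port here are illustrative):

```sh
# 1. read gemini's config (the toolset parses config.toml; shown as a placeholder here)
cat frameworks/Java/gemini/config.toml

# 2-3. build and start gemini from its dockerfile (a database would be started the same way)
docker build -t tfb/gemini -f frameworks/Java/gemini/gemini.dockerfile frameworks/Java/gemini
docker run -d --rm --name gemini -p 8080:8080 tfb/gemini

# 4. wait until gemini is accepting requests
until curl -sf http://localhost:8080/json > /dev/null; do sleep 1; done

# 5-6. make a request to /json, verify the response, and output results
curl -s http://localhost:8080/json

# cleanup
docker stop gemini
```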

What will I have to do?

As of right now, nothing. The coming days will be business as usual, except that our continuous benchmark runs on Citrine will be stopped. You can still open issues and submit PRs, and we will merge them in when ready.

When the new toolset is ready for wider testing, I will rename this pinned issue and add documentation here for how people can test for themselves. This opt-in period will exist for a time while we squash bugs and ensure that the new toolset is at least as user-friendly as the current one. Citrine will be brought back online and run continuous benchmarks again while this test phase is ongoing.

The last step is when we are ready to decommission the current (legacy) toolset and use the new one. We will announce a date when we have a better idea of when that may be, and on that date a pull request will be merged which will remove the toolset directory from this repository entirely. It will remain in the history for anyone interested in taking a look back. The deployment directory will likely also be removed leaving only the frameworks directory. Every framework implementation will have benchmark_config.json replaced with config.toml. Lastly, the README will be updated to include the new requirement of having (either by downloading or building yourself) the new tfb_toolset binary.

Once this last step is merged, Citrine will be brought down for maintenance and then brought back up using the new toolset.

Conclusion

Feel free to use this issue as an informal RFC, and thanks for your understanding about the downtime!

billywhizz commented 3 years ago

hi mike, this sounds great. looking forward to seeing the new toolset. one thing i was going to suggest (apologies if it has been suggested previously) was to have the frameworks in external git repos which can be maintained and changed as desired by the owners. registration of a new framework could be a simple PR with a config change to add the url for the repo where the tests live. this would also mean you guys don't have to spend so much time reviewing and merging PRs for the frameworks.

msmith-techempower commented 3 years ago

We have discussed that possibility before, but I think we landed on thinking that it would be too hard to maintain. Additionally, we would lose a lot of git history surrounding this repository (which is actually something I am interested in) for each implementation. Lastly, it would also mean more work for us maintainers for things we take for granted like name collisions, etc.

msmith-techempower commented 3 years ago

Testing on Citrine has concluded and a new continuous benchmark run has been started.

msmith-techempower commented 3 years ago

For the new toolset, we have just made public the following repositories: TFBToolset, TFBVerifier, TFBDatabases, and dockurl.

TFBVerifier and the images under TFBDatabases are published to Dockerhub, and in practice a contributor will not need to concern themselves with these projects, as TFBToolset will pull the images as needed. Similarly, dockurl is the Docker library used by TFBToolset.

A contributor looking to opt in and test the new toolset will have to do the following (a condensed example follows the note below):

  1. Download the binary
  2. Move it to your local FrameworkBenchmarks directory (or put it on the PATH)
  3. Update your framework's Dockerfile to expose its application server's port (e.g. EXPOSE 8080)
  4. Add a config.toml file to your framework's directory
  5. Run your test - the toolset args are largely unchanged, but it has a -h flag to help

Note: There is an extra step for Windows users (which may go away eventually) to open Docker4Windows settings and enable "Expose daemon on tcp://localhost:2375 without TLS".
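Putting those steps together, an opt-in run might look roughly like this (the framework name and paths are only examples):

```sh
cd FrameworkBenchmarks
mv ~/Downloads/tfb_toolset .                                     # steps 1-2: binary in the repo root, or on the PATH
echo 'EXPOSE 8080' >> frameworks/Java/gemini/gemini.dockerfile   # step 3: expose the app server's port, if not already present
$EDITOR frameworks/Java/gemini/config.toml                       # step 4: add the new config file (example below)
./tfb_toolset -m verify --test gemini                            # step 5: args largely unchanged; -h for help
```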

Example config.toml

[framework]
name = "Gemini"
authors = ["TechEmpower <dev@techempower.com>"]
github = "https://github.com/TechEmpower/gemini"

[main]
urls.json = "/json"
urls.plaintext = "/plaintext"
approach = "Realistic"
classification = "Fullstack"
platform = "Servlet"
webserver = "Resin"
os = "Linux"
versus = "servlet"

[mysql]
urls.db = "/db"
urls.query = "/query?queries="
urls.cached_query = "/cached_query?queries="
urls.fortune = "/fortunes"
urls.update = "/update?queries="
approach = "Realistic"
classification = "Fullstack"
database = "MySQL"
orm = "Micro"
platform = "Servlet"
webserver = "Resin"
os = "Linux"
database_os = "Linux"
versus = "servlet"
tags = ["broken"]

[postgres]
urls.db = "/db"
urls.query = "/query?queries="
urls.cached_query = "/cached_query?queries="
urls.fortune = "/fortunes"
urls.update = "/update?queries="
approach = "Realistic"
classification = "Fullstack"
database = "Postgres"
orm = "Micro"
platform = "Servlet"
webserver = "Resin"
os = "Linux"
database_os = "Linux"
versus = "servlet"

Presently, since this is basically a beta, the config.toml requirements may be subject to change as we move forward, but this should serve as a general approach to converting from an existing benchmark_config.json.

For the sake of keeping things clean - please report any issues/errors/weirdness regarding running the new toolset to that repo's issue tracker.

sagenschneider commented 3 years ago

Will the website that publishes the results also be open sourced?

I would really like #4319 to be done so the website links to the particular version of the code for each entry. I'm hoping this would make it a lot easier to see what each entry is doing. While you can hunt for the code yourself, doing so requires knowing TFB fairly intimately, which most passing users will not. Hence, I tend to find that when showing others, they just judge on name and numbers.

msmith-techempower commented 3 years ago

> Will the website that publishes the results also be open sourced?

This is on our longer-term roadmap.

> I would really like #4319 to be done so the website links to the particular version of the code for each entry. I'm hoping this would make it a lot easier to see what each entry is doing. While you can hunt for the code yourself, doing so requires knowing TFB fairly intimately, which most passing users will not. Hence, I tend to find that when showing others, they just judge on name and numbers.

One thing the new toolset supports is the new configuration file format (TOML) as opposed to benchmark_config.json. This allows new keys to be added more flexibly and makes it easier to organize metadata about test implementations. Once this is in place, it will be much simpler to add support on the results website for things like linking to the test implementation, providing some useful human-consumable data about the implementation, etc.