Closed: itisacloud closed this issue 4 months ago
@ElJocho @mmerdes what should we do with this issue? There is a script, but it's probably outdated. I think it might be best to remove it and then close this issue.
Hey @Hagellach37, I recently updated the Gatling scripts to also include queries to the topics endpoints, so we can close this issue because it is done :)
The small scripts should probably be deleted, because they are not used.
Description
We need a Python script to automate benchmarking of ClickHouse DB performance by running a predefined list of queries against it. Currently, we perform these tests manually with Jochen Stier and Johannes Visintini. Jochen has a method to run queries without using cached files, and we expect these uncached queries to be slow.
Once the files are loaded into RAM, we expect the queries against ClickHouse to be much faster. Therefore, the script should provide timing information for these queries, considering different conditions such as whether the files are in RAM or not. For example, we have observed that the number of CPUs impacts the aggregation speed, so the script should capture this information as well.
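To make the timing requirement more concrete, here is a minimal sketch of how the cold vs. warm runs could be measured. Everything named here is an assumption for illustration: `run_query` stands in for whatever client call executes SQL against ClickHouse, and `drop_caches` is a hypothetical hook for Jochen's method of running without cached files (that method is not described in this issue, so it is left as a callable).

```python
import os
import time
from typing import Callable, Dict, List, Optional


def time_query(
    run_query: Callable[[str], object],
    sql: str,
    repeats: int = 5,
    before_each: Optional[Callable[[], None]] = None,
) -> List[float]:
    """Return wall-clock durations (seconds) for `repeats` runs of `sql`."""
    durations = []
    for _ in range(repeats):
        if before_each is not None:
            before_each()  # hypothetical cache-dropping step, outside the timed region
        start = time.perf_counter()
        run_query(sql)
        durations.append(time.perf_counter() - start)
    return durations


def benchmark(
    run_query: Callable[[str], object],
    queries: Dict[str, str],
    drop_caches: Callable[[], None],
    repeats: int = 5,
) -> List[dict]:
    """Time every query under a 'cold' and a 'warm' condition."""
    results = []
    for name, sql in queries.items():
        for condition, hook in (("cold", drop_caches), ("warm", None)):
            if condition == "warm":
                run_query(sql)  # untimed warm-up so the files are in RAM
            results.append({
                "query": name,
                "condition": condition,
                # CPU count of the machine running this script; the
                # server's CPU count would have to be recorded separately.
                "cpu_count": os.cpu_count(),
                "durations": time_query(run_query, sql, repeats, hook),
            })
    return results
```

Passing the query runner and the cache-dropping step in as callables keeps the harness independent of any particular ClickHouse client, so it could be tried against a stub first.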
It would be ideal if the script could output the results in a table or HTML page format, displaying metrics such as minimum, maximum, mean, and median for each query/endpoint under different conditions (files in RAM vs. files not in RAM).
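As a rough idea of the table output, the sketch below computes the four requested metrics with the standard library and prints one row per query and condition. The result layout (a list of dicts with `query`, `condition`, and `durations` keys) is an assumption made for this example, not an existing format.

```python
import statistics
from typing import Dict, List


def summarise(durations: List[float]) -> Dict[str, float]:
    """Compute min, max, mean, and median for a list of durations in seconds."""
    return {
        "min": min(durations),
        "max": max(durations),
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
    }


def format_table(results: List[dict]) -> str:
    """Render one plain-text row per (query, condition) pair."""
    header = (
        f"{'query':<20} {'condition':<10} "
        f"{'min':>8} {'max':>8} {'mean':>8} {'median':>8}"
    )
    lines = [header, "-" * len(header)]
    for r in results:
        s = summarise(r["durations"])
        lines.append(
            f"{r['query']:<20} {r['condition']:<10} "
            f"{s['min']:>8.3f} {s['max']:>8.3f} "
            f"{s['mean']:>8.3f} {s['median']:>8.3f}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up durations, only to show the layout.
    example = [
        {"query": "example_query", "condition": "cold", "durations": [1.0, 2.0, 6.0]},
        {"query": "example_query", "condition": "warm", "durations": [0.1, 0.2, 0.3]},
    ]
    print(format_table(example))
```

An HTML page could be produced from the same `summarise` output by swapping the formatter, which is why the statistics and the rendering are kept in separate functions.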
Additionally, this benchmarking tool will be valuable in the future to assess the impact of constantly inserting new data on query performance. Although we don't anticipate any issues, it would be beneficial to have an easy way to test this scenario.
Please investigate if there are any existing frameworks that already support this type of analysis.
Desired Features:
Additional Information
Manual benchmarking has been performed with Jochen Stier and Johannes Visintini. Jochen Stier has a method to run queries without cached files; these queries are expected to be slow. Queries against ClickHouse are expected to be faster once the files are loaded into RAM. Aggregation speed has been shown to depend on the number of CPUs. The script's output should make it easy to test the impact of new data insertion on query performance. Research existing frameworks that support comparable analysis.