adoptium / aqa-test-tools

Home of Test Results Summary Service (TRSS) and PerfNext. These tools are designed to improve our ability to monitor and triage tests at the Adoptium project. The code is generic enough that it is extensible for use by any project that needs to monitor multiple CI servers and aggregate their results.
Apache License 2.0

Performance Analysis Tools (Proposal from Developer JumpStart Tech Challenge) #28

Open piyush286 opened 5 years ago

piyush286 commented 5 years ago

Proposal Name: Performance Analysis Tools
Proposal Owner: Piyush Gupta / Shelley Lambert / Lan Xia
Technical Mentor: Piyush Gupta
Team: @AdamNatale @Variony @acarnegie @armaanfarhadi @kguirguis

Temporary Branch for the Development of these Features:

https://github.com/AdoptOpenJDK/openjdk-test-tools/tree/pat

Blurb (Short overview of the Proposal):

Since performance is crucial for any product that we use in our lives, developers are always striving to evaluate and boost our performance on various workloads by running the latest releases and development builds against different benchmarks and by identifying opportunities for compiler optimizations. As part of this JumpStart Challenge, we’re looking for people to brainstorm and develop a new solution that would enhance our capabilities to spot performance issues with ease.

Currently, the Performance Measurement & Analysis (PMA) team and the Runtimes test team collaborate to build new tools and infrastructure that adapt to the changing requirements of users and to open-source development practices. These tools, such as PerfNext and Test Result Summary Service (TRSS), have been pushed to the open AdoptOpenJDK repo: https://github.com/AdoptOpenJDK/openjdk-test-tools. Under TRSS, we have a dashboard for displaying performance results from daily runs. While this dashboard has the basic functionality of displaying numbers, we could add new features for identifying and monitoring regressions and for automating the investigation of these issues. This in turn would improve the efficiency of our performance monitoring and drive faster turnaround as issues are detected.

Please describe the business problem your customers (e.g. external clients, internal team, etc.) are experiencing OR the improvement/opportunity that could be brought to them.

Our PMA team manages the performance monitoring and problem investigation for Eclipse OpenJ9 (https://www.eclipse.org/openj9/) and Java releases from AdoptOpenJDK (https://adoptopenjdk.net/) on all supported hardware platforms. We are also responsible for publishing official performance scores for each Java release. Since performance is of paramount importance to Java customers, we strive to evaluate and boost our performance on various workloads by running our latest releases and development builds against different benchmarks and by identifying opportunities for compiler optimizations.

Due to the large number of benchmark variants and platforms, it’s challenging to identify performance regressions and gains. Until now, we’ve relied on a tool called Traffic Lights to display the results from the performance runs. This tool is old, inflexible, and lacks performance monitoring abilities. As a result, we need to develop a new solution that would enhance our capabilities to spot performance issues with ease.

Developers need to know quickly whether their changes cause performance regressions. The sooner a regression is discovered, the ‘cheaper’ it is to fix the code that introduced it. Developers depend on the PMA team to run benchmarks and to measure and analyze performance results. The PMA team is understaffed and cannot keep up with the growing number of requests from the dev team. An effort has begun to make it MUCH easier for developers to run benchmarks and analyze results themselves. Easy-to-use tools empower developers, making them more autonomous and our projects more agile.

What is the key issue (customer pain) or benefit that motivates the need for this project?

Key issue: Performance testing is hard and not standardized, making it difficult for developers.
Key benefit: With easier tooling and approaches, we ‘crowd-source’ the task of performance measurement, empower the development team and make projects more agile.

There are some features we already want to see incorporated, which we understand are common tasks that developers currently do manually. These include using profiling tools and looking at additional inputs (such as JIT or GC logs) to gather and correlate more data for problem determination.

Better data visualization of results is also an area of great interest: given the data we have gathered, what is the most compelling way to represent it so that it’s quickly communicated and shared with interested parties?

We need to brainstorm the features that need to be added to TRSS and then choose and implement the ones that would provide the most benefit to all developers. Currently, PMA team members have to look at the graphs and carry out further investigations by launching more runs and identifying the commit that might be responsible for a regression.
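
To make the manual workflow above concrete, here is a minimal sketch of how the two steps (spotting a regression, then narrowing it down to a build) might be automated. It is not TRSS code: the function names, the 5% threshold, and the assumption that a higher score is better are all hypothetical and chosen only for illustration.

```python
from statistics import mean
from typing import Callable, Sequence


def is_regression(baseline: Sequence[float], candidate: Sequence[float],
                  threshold_pct: float = 5.0) -> bool:
    """Flag a regression when the candidate mean falls more than
    threshold_pct below the baseline mean.

    Assumption: higher scores are better (e.g. throughput).
    """
    base = mean(baseline)
    drop_pct = (base - mean(candidate)) / base * 100.0
    return drop_pct > threshold_pct


def first_bad_build(builds: Sequence[str],
                    regressed: Callable[[str], bool]) -> str:
    """Binary-search an ordered list of builds for the first regressed one.

    Assumptions: builds[0] is known good, builds[-1] is known regressed,
    and every build after the first regressed one is also regressed.
    """
    lo, hi = 0, len(builds) - 1  # lo is always good, hi is always regressed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if regressed(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]
```

In practice, `regressed` would launch a benchmark run on the given build and compare its scores with the baseline using something like `is_regression`; a real tool would also need to account for run-to-run noise, which this sketch ignores.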

Developing these new tools would benefit everyone since we’ll be able to triage new regressions more easily. Automated monitoring would significantly reduce the PMA team’s workload, allowing it to go deeper into code issues and help developers resolve them faster.

How might the results of the project be used after the Challenge?

Results of this challenge would be reviewed and potentially incorporated into our live tools.

What are the key technical and business goals?

Technical: Design and develop new features that would help in identifying and investigating Java performance regressions with ease.

Business: Display performance results and identify regressions such that the PMA team can improve efficiency while scaling up on Eclipse OpenJ9 performance coverage. Easily articulate the benefits of our products to potential customers.

What specialized skills might be beneficial for the project?

piyush286 commented 5 years ago

Some Proposed Features