feat(script): Verify the existence of checker config `doc_url` pages and find appropriate older releases for gone (removed, dealpha, etc.) checkers #4207
The checker label configuration files usually contain a documentation page link (`doc_url`) that we suggest to the user when viewing the details of a report. These JSON files are always baked into a released package, and the server serves information based on what is available in the deployed image. As all of these links point to external resources, they are highly susceptible to link rot.
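A tool like this first has to collect the links from each label file. The sketch below assumes a layout where a top-level `labels` object maps each checker name to a list of `key:value` label strings, one of which is `doc_url:<URL>`; `doc_urls` and `load_doc_urls` are hypothetical helper names, not the script's actual API.

```python
import json
from typing import Dict


def doc_urls(label_config: dict) -> Dict[str, str]:
    """Map each checker name to its `doc_url` label, skipping checkers without one."""
    urls: Dict[str, str] = {}
    for checker, labels in label_config.get("labels", {}).items():
        for label in labels:
            # Split on the first ':' only, so the '://' inside the URL survives.
            key, _, value = label.partition(":")
            if key == "doc_url" and value:
                urls[checker] = value
                break
    return urls


def load_doc_urls(path: str) -> Dict[str, str]:
    """Read one (assumed-format) label JSON file and extract its documentation links."""
    with open(path, encoding="utf-8") as handle:
        return doc_urls(json.load(handle))
```

Partitioning on the first `:` rather than splitting on every `:` is what keeps the scheme separator of the URL intact.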
For example, suppose that an analysis was stored with the `alpha.Foo` checker, with the URL pointing to `.../alpha/Foo.html`. Once the underlying analyser's documentation changes (usually for one of two reasons: the checker was improved and moved out of alpha, or it was removed from upstream completely!), this link is dead. Newer reports stored with `core.Foo` (`.../core/Foo.html`) will point to proper documentation, but re-routing `alpha.Foo`'s documentation page to `core.Foo`'s would be invalid, as the behaviour of the checker might have changed in the meantime, making the contents of the new document inapplicable to the old report! In addition, nothing prevents the user from running an older analyser through a newer CodeChecker package and uploading new results from the `alpha.` version even after the analyser release that introduced `core.Foo`.
This patch introduces an opt-in tool that reads the configuration files and verifies whether each URL is reachable by a hypothetical user. If not, it runs a heuristic pipeline to find a URL that corresponds to the checker with the dead link: first by fixing typos in the URL, and if that is still unsuccessful, by trying the documentation sites of older releases. For now, this fixing logic is only implemented for the LLVM-based analysers, Clang SA and Clang-Tidy, as it requires an accurate understanding of the documentation structure of the specific analyser.
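The check, then typo fix, then older-release pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch's implementation: the typo heuristics (lower-casing and swapping `_` for `-` in the last path segment), the `current_prefix`/`release_template` parameters and all function names are hypothetical, and the real archived-documentation layout of each LLVM release would have to be supplied. The availability check is injectable so the logic can run offline; the default `url_exists` sends an HTTP `HEAD` request, which only confirms that the page responds, not that a `#fragment` anchor exists on it (and some servers reject `HEAD` outright).

```python
import urllib.request
from typing import Callable, Iterable, Iterator, Optional


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` ends in a 2xx/3xx response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects; 4xx/5xx raise HTTPError, a subclass of OSError.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):  # network errors, timeouts, malformed URLs
        return False


def typo_candidates(url: str) -> Iterator[str]:
    """Yield hypothetical spelling variants of the last path segment of `url`."""
    base, sep, page = url.rpartition("/")
    seen = {page}
    for variant in (page.lower(), page.replace("_", "-"),
                    page.lower().replace("_", "-")):
        if variant not in seen:
            seen.add(variant)
            yield f"{base}{sep}{variant}"


def older_release_candidates(url: str, current_prefix: str,
                             release_template: str,
                             versions: Iterable[str]) -> Iterator[str]:
    """Re-root `url` under each older release's (assumed) documentation prefix."""
    if not url.startswith(current_prefix):
        return
    suffix = url[len(current_prefix):]
    for version in versions:
        yield release_template.format(version=version) + suffix


def find_working_url(url: str, current_prefix: str, release_template: str,
                     versions: Iterable[str],
                     is_available: Callable[[str], bool] = url_exists
                     ) -> Optional[str]:
    """Return the first reachable candidate for `url`, or None if all fail."""
    if is_available(url):
        return url
    for candidate in typo_candidates(url):
        if is_available(candidate):
            return candidate
    for candidate in older_release_candidates(url, current_prefix,
                                              release_template, versions):
        if is_available(candidate):
            return candidate
    return None
```

Passing `versions` newest-first means the most recent release that still documented the checker wins, which is likely the closest match to the analyser version that produced the old report, rather than re-routing to a renamed checker's page.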
#4175 has been merged, so the multiprocessing library loader code has to be altered: this script has no tests that would reveal whether it still works once merged into the current (post-#4175) master...