Open monperrus opened 2 years ago
This would come under the abstraction https://github.com/SpoonLabs/sorald/blob/master/src/main/java/sorald/rule/StaticAnalyzer.java
So after some thoughts I believe there are 2 ways to integrate the laughing-train code or qodana in sorald. The fast way is to integrate the current refactorings with their own detection. The more advanced way is using some kind of sarif parser using the qodana sarif output https://www.jetbrains.com/help/qodana/qodana-sarif-output.html. Qodana could be run with https://github.com/JetBrains/qodana-cli only requiring docker installed or requiring the sarif file provided by the user.
In the sarif file we could match the rules by their ID and fix the bad smell with their position. This would allow sorald to fix a lot more bad smells like value always null or other very advanced IntelliJ stuff. (A sarif parser could be https://github.com/Contrast-Security-OSS/java-sarif)
The first approach yields faster results the second needs more time and should fix more bad smells(assuming there is no legal stuff hindering us). Which approach do you prefer?
I'm currently experimenting with the second option, and kinda works. The current results and automatic fixable bad smells for some projects can be found here https://github.com/MartinWitt/laughing-train/tree/gh-mining. The real benefit from using qodana are bad smells like https://github.com/MartinWitt/laughing-train/blob/gh-mining/spoon.md#unnecessary-return, where information flow is necessary to detect.
So after some more experimenting:
More or less simple if we assume a docker host is running. Pull the image if it isn't available local and run it with the given parameters. After the Qodana run is finished, simply parse the SARIF file(there exist some good libs for it). Then convert the sarif rules to Sorald rules. A variant without docker does not exist to my knowledge. If creating an own container wrapping the Qodana container(docker in docker) or simple invocation is more or less a style question.
The Sorald interface for an analyzer is a list of files, some rules and classpath.
Collection<RuleViolation> findViolations(List<File> files, List<Rule> rule, List<String> classpath):
The Qodana interface is more or less
SarifFile analyze(String projectRoot, String pathToQodanaRulesFile, String pathToAnalyze)
As you see, Qodana is project based, Sorald is file based. Qodana needs the whole project and not only some files.
So Qodana requires a new interface, or we simply abuse the contract and say the first file is the project root.
Qodana is slow. This is because Qodana invokes more or less a full build before analyzing. For complex builds like https://github.com/IntellectualSites/FastAsyncWorldEdit this has horrible runtime. I haven't been able to finish the analyse step in 30 minutes for FAWE (on Windows with a gen 12 intel CPU). This could be either an WSL issue or Qodana is that slow. For spoon, it requires around 6 min in the CI. So we have to ask us does Qodana generate that much value, that this runtime is worth it? Has this more than scientific benefit and do we hope JetBrains can improve their runtime.
PS: Interesting question I never really thought about, but is there a comparison how much worse a no-classpath analyze with Spoon is against a more complex like Qodana is using. Most important bad smells should be detectable without a classpath. Is there any research about it?
Thanks a lot for the analysis.
the interface problem.
What about adding Collection<RuleViolation> findViolations(Path projectRoot, List<Rule> rule, List<String> classpath):
? Would you PR this?
how much worse a no-classpath analyze with Spoon is against a more complex like Qodana is using
I'm not aware of such a comparison. But I'm note sure the Spoon no-classpath ls 'less complex' than Qodana's.
What about adding Collection
findViolations(Path projectRoot, List rule, List classpath):? Would you PR this?
I would add this interface after I created a working example for Qodana in Sorald. Otherwise, we have no proof if this fits. I saw Qodana in Sorald is a GSoC task, shall we wait if someone applies to this task?
As the GSoC proposal was rejected, this task is now again a hot topic. We should first define a goal for a first rule we have as initial target. The bad smell should be detected by Qodana and fixed by Sorald. I would propose using the redundant toString()
check. From my current knowledge, this check only exists in sonarlint in C# but not for Java(see https://rules.sonarsource.com/csharp/RSPEC-1858). Furthermore, this bad smell is present in many projects and (should) never gets rejected if you create a PR fixing it. The code for fixing this bad smell is relatively short, and the PR focus keeps on the docker/Qodana integration.
Wdyt?
I agree with @MartinWitt that we should start with fixing a rule which has an easy repair so that we can focus on the docker/Qodana integration.
But before we do that, should we generalise the app using Java SPI as you have suggested here? It does sound like a good idea.
This is the million-dollar question. It depends on the research scope, goal and future vision of the project. Are there any actual plans to integrate more analyzers into Sorald? Is this a long living project or paper and maintenance mode? Adding this new abstractions costs time with no direct benefit but would boost future development. Adding it later costs more time because it is more complex, but you have intermediate results. As it is hard for me to answer questions about your research, it's your choice. Depending on your other answers, we can choose the best option.
Edit: And we should clarify first if we can hard break with the current Sorald design and classes. Adding SPI is 100% a breaking API change.
Is this a long living project or paper and maintenance mode? Adding this new abstractions costs time with no direct benefit but would boost future development. Adding it later costs more time because it is more complex, but you have intermediate results
I think @monperrus can weigh in better on this. My opinion was to generalise the app first so that it is easier to integrate other static analyzers (as we have claimed in the paper). Moreover, we do not have any set deadlines as of now that we have to integrate Qodana. So if we plan to integrate another static analyzer, we do it in the most generic way possible.
And we should clarify first if we can hard break with the current Sorald design and classes
I was trying that, but I was having trouble designing the interfaces. I have summarised the gist over here. As a short term goal, I have decided to remove Checks.java
because the map inside it is annoying me. See https://github.com/SpoonLabs/sorald/issues/722.
Is this a long living project or paper and maintenance mode?
We don't know yet. For now, the highest priority is to get the the paper formally published, and we're very close to this (minor revision).
@monperrus @MartinWitt We realised in the meeting with SonarSource that we were a bit late in presenting our work because they had already started implementing code fixes. To my knowledge, I don't think so there is a repair tool by JetBrains to fix Qodana warnings. We could probably keep that on priority.
To my knowledge, I don't think so there is a repair tool by JetBrains to fix Qodana warnings.
The repair tool is more or less IntelliJ. But you are correct, there is no headless/non-ide version.
The repair tool is more or less IntelliJ.
Yes, I agree. They have techniques to perform quick fixes, but not specifically for Qodana violations. This paper here cites two repair tools built as IntelliJ plugins. However, they focus on some code smells. I am not sure if they are dictated by Qodana or not.
I'm currently testing the SPI refactoring inside my fork of the repo. It seems like the first goal to refactor Sorald as a platform for analyzers and repair is to use SPI for sonar-java and break the monolithic design.
This paper here cites two repair tools built as IntelliJ plugins. However, they focus on some code smells. I am not sure if they are dictated by Qodana or not.
From my short look at it, we would differentiate from them in the assumptions. They assume the usage of IntelliJ, and we assume the existence of a static analyzer. But I will read the paper closer and see how far they are.
differentiate from them in the assumptions.
That makes sense. I looked up if there are any repair tools for Qodana violations and it led me to this. I didn't found any more tools so I conveniently assumed that these are the only tools as of now.
So as I'm making progress, let's discuss the SPI idea here. Refactoring to SPI seems essential before we can add Qodana in add non-intrusive way.
I would propose 2 new SPI interfaces ProcessorSupplier
[1] and RuleProvider
[2] and an interface for RuleType
named IRuleType
. The SPI interfaces are used for loading processors[6] and rules[7]. The IRuleType
abstracts the enum for bad smell types for extension reasons and allows adding new types.
Currently, SonarLint has a constructor with rootProject as parameter instead of a method parameter. SPI only allows non-arg constructors, and the projectRoot is needed in Qodana too. So, we simply add it to the analysis method [3].
In [4] it is shown. We simply ask java to load all classes in the classpath implementing StaticAnalyzer
and instantiate them. To get all available rules, we do more or less the same [5].
With the SPI usage, we can move all sonar related code to a subproject. Also, we move the API to another subproject. This decouples sorald from sonarlint, enables more analyzers(adding them is adding a new dependency) and should improve the maintainability of the code.
— A lot of source file movement because we create new subprojects. — Breaking the StaticAnalyzer API(is there any production code using the StaticAnalyzer API?). — We have more or less created the possibility of a (small) software product line with the overhead but are uncertain if we really need it. Possible SPL products could be for example sorald-sonar, sorald-qodana and sorald-sonar-qodana. This result was never the scope of adding Qodana.
[1] https://github.com/MartinWitt/sorald/blob/b24c84738fbad5d344fd2d4a083b91d6998f8eb2/sorald/src/main/java/sorald/ProcessorSupplier.java#L14 [2] https://github.com/MartinWitt/sorald/blob/b24c84738fbad5d344fd2d4a083b91d6998f8eb2/sorald/src/main/java/sorald/rule/RuleProvider.java#L5 [3] https://github.com/MartinWitt/sorald/blob/b24c84738fbad5d344fd2d4a083b91d6998f8eb2/sorald/src/main/java/sorald/rule/StaticAnalyzer.java#L18 [4] https://github.com/MartinWitt/sorald/blob/b24c84738fbad5d344fd2d4a083b91d6998f8eb2/sorald/src/main/java/sorald/sonar/ProjectScanner.java#L66 [5] https://github.com/MartinWitt/sorald/blob/b24c84738fbad5d344fd2d4a083b91d6998f8eb2/sorald/src/main/java/sorald/rule/Rules.java#L20 [6] https://github.com/MartinWitt/sorald/blob/b24c84738fbad5d344fd2d4a083b91d6998f8eb2/sorald/src/main/java/sorald/Processors.java#L72 [7] https://github.com/MartinWitt/sorald/blob/b24c84738fbad5d344fd2d4a083b91d6998f8eb2/sorald/src/main/java/sorald/rule/Rules.java#L20
Thanks for your work! I had some comments.
In [4] it is shown. We simply ask java to load all classes in the classpath implementing StaticAnalyzer
Do we load all classes or the one specified? In my head, I was imagining that in future a user will provide the name of the static analyzer through CLI and we only mine its violation. In that case, we do not need to load all classes. What do you think about this?
Breaking the StaticAnalyzer API(is there any production code using the StaticAnalyzer API?).
Yes. The StaticAnalyzer
is implemented by SonarStaticAnalyzer
so we will have to reconfigure it a bit to accept File projectRoot
now.
We have more or less created the possibility of a (small) software product line with the overhead but are uncertain if we really need it. Possible SPL products could be for example sorald-sonar, sorald-qodana and sorald-sonar-qodana. This result was never the scope of adding Qodana.
That depends a lot upon the dissemination of Sorald.
We had a meeting with SonarSource recently and they said that they started working on the quick fixes in 2021. As of now, it seemed that they focussed more on code smells. I also looked at their code for transformation and it seemed like they use text edits for modifying the content. You can have a look at how they build a quick fix. Based on intuition, I think we have a more powerful way of transformation and can implement rules with more complicated fixes. For example, we might perform better at fixes where chunks of code have to be inserted and deleted.
You can also have a look at the list of rules they currently have a quick fix for. covered
, infeasible
, partial
, and unknown
are the enums defined by them. I put undefined
where the quickfix
attribute was not present. You can see these statuses defined in specifications like these.
For Qodana warnings, there does not seem to exist a repair tool yet. Thus, it makes sense if we push this further. However, I do agree that applying repairs using a CLI is not the best way. An IDE extension is much better for this problem.
Do we load all classes or the one specified? In my head, I was imagining that in future a user will provide the name of the static analyzer through CLI and we only mine its violation. In that case, we do not need to load all classes. What do you think about this?
We can easily load analyzers by name, but in the first draft I attempted to keep it simple and don't add more methods than needed.
We had a meeting with SonarSource recently and they said that they started working on the quick fixes in 2021.[...]
Looking at their implementation of refactorings, I'm surprised how bad they solve refactorings. Using String replacement instead of an AST seems like a bad student thesis and not a commercial product. I would be surprised if they can ever implement complex refactorings as IntelliJ/JetBrains level with String manipulation.
However, I do agree that applying repairs using a CLI is not the best way. An IDE extension is much better for this problem.
But isn't a bot creating PRs for Git(Hub) repos a way less competitive field? An IDE plugin has way higher usability constraints like runtime etc. which are hard to meet with Qodana. And why would you use a plugin solving problems which another product is way better in it? Integrating Sorald in a later step in an IDE should be possible but I'm unsure if this should be a main goal.
wdyt? Do you want to compete in a more competitive field or have more results/reach by created PRs in a less competitive field?
We can easily load analyzers by name, but in the first draft I attempted to keep it simple and don't add more methods than needed.
Sounds good.
like a bad student thesis and not a commercial product
LMAO. :skull: :laughing:
But isn't a bot creating PRs for Git(Hub) repos a way less competitive field?
You are right. Submitting PR instead of an extension is better for multiple reasons.
I wrote "An IDE extension is much better for this problem" just to emphasise that CLI is not the best way to deal with applying quick fixes.
Also, you should merge master in your fork because there has been a major refactoring change. We no longer depend on any internal dependencies of sonar-java. See #751 . Upgrading sonar-java is now as simple as updating the value in this file.
For the record, doing automated PRs with Sorald is already implemented in https://github.com/eclipse/repairnator
As we had some discussion about the architecture, I will link the thread here #776 for documentation purpose.
It'd be great to add support for repairing Qodana bug warnings
Qodana: https://www.jetbrains.com/qodana/
Some processors already exist in https://github.com/MartinWitt/laughing-train by @martinwitt