UnitTestBot / UTBotJava

Automated unit test generation and precise code analysis for Java
Apache License 2.0

Rerun Spring integration tests after minimizations with full context reset #2363

Open IlyaMuravjov opened 1 year ago

IlyaMuravjov commented 1 year ago

Description

Right now we don't fully reset the Spring context between concrete executions when generating integration tests with the fuzzer, because it is too time-consuming (a full reset can take several seconds). We do our best to reset the relevant parts of the context (e.g., resetting relevant beans and rolling back transactions); however, that may still not be enough because, for example, database id generators are not rolled back with the transaction.
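
For reference, a "full context reset" in Spring's test framework is expressed by marking the context dirty, which forces the whole ApplicationContext to be closed and rebuilt; that rebuild is where the multi-second cost comes from. A minimal sketch (illustrative only; UTBot drives concrete executions through its own engine rather than plain JUnit):

```kotlin
import org.springframework.boot.test.context.SpringBootTest
import org.springframework.test.annotation.DirtiesContext
import org.springframework.test.annotation.DirtiesContext.ClassMode

// Closes and recreates the entire ApplicationContext before every test
// method -- the "full reset" whose cost the partial reset tries to avoid.
@SpringBootTest
@DirtiesContext(classMode = ClassMode.BEFORE_EACH_TEST_METHOD)
class FullyIsolatedIntegrationTest {
    // each test method here runs against a freshly built Spring context
}
```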

A partial reset of the Spring context may lead to the generation of unreproducible tests that rely on code from earlier concrete executions having been executed before them.
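
To make the id generator example concrete, here is a minimal sketch (hypothetical `User` entity and repository; assumes a Spring Boot 3 JPA application with an embedded database — use `javax.persistence` instead of `jakarta.persistence` on Boot 2): the rollback removes the row, but it does not rewind the sequence, so later ids depend on how many executions ran before.

```kotlin
import jakarta.persistence.Entity
import jakarta.persistence.GeneratedValue
import jakarta.persistence.GenerationType
import jakarta.persistence.Id
import org.junit.jupiter.api.Test
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.boot.test.context.SpringBootTest
import org.springframework.data.jpa.repository.JpaRepository
import org.springframework.transaction.support.TransactionTemplate

@Entity
class User(
    @Id @GeneratedValue(strategy = GenerationType.SEQUENCE)
    var id: Long? = null,
    var name: String = "",
)

interface UserRepository : JpaRepository<User, Long>

@SpringBootTest
class IdGeneratorLeakTest @Autowired constructor(
    private val repository: UserRepository,
    private val tx: TransactionTemplate,
) {
    @Test
    fun `rolled-back insert still advances the id sequence`() {
        // First "concrete execution": insert a row, then roll it back.
        tx.execute { status ->
            repository.save(User(name = "a"))
            status.setRollbackOnly()
        }
        // Second execution on the same (partially reset) context: the row
        // from the first insert is gone, but the sequence has already
        // advanced, so the generated id depends on execution history.
        val saved = tx.execute { repository.save(User(name = "b")) }!!
        println("generated id = ${saved.id}") // not the id a fresh context would produce
    }
}
```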

Action plan

To cope with that, it is suggested to rerun the tests that are left after test case minimization with a full context reset and to use the results obtained from these reruns.
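
In pseudocode, the proposed pipeline would look roughly like this (a sketch only: `minimize` and `rerunWithFullContextReset` are hypothetical placeholders for the corresponding UTBot stages, and the `UtExecution` import assumes its usual `org.utbot.framework.plugin.api` package):

```kotlin
import org.utbot.framework.plugin.api.UtExecution

fun generateIntegrationTestExecutions(executions: List<UtExecution>): List<UtExecution> {
    // 1. Minimize the test suite using the fast, dirty-context results.
    val minimized = minimize(executions)
    // 2. Rerun only the survivors with a full Spring context reset before
    //    each rerun; the suite is small by now, so the cost is acceptable.
    // 3. Generate code from the rerun results; the dirty results are discarded.
    return minimized.map { rerunWithFullContextReset(it) }
}

// Hypothetical stages, stubbed for illustration:
fun minimize(executions: List<UtExecution>): List<UtExecution> = TODO()
fun rerunWithFullContextReset(execution: UtExecution): UtExecution = TODO()
```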

alisevych commented 1 year ago

> use results obtained from these reruns

@IlyaMuravjov Do you mean to generate tests based on the rerun results only, ignoring the original results obtained on the dirty context?

Also, can you define more exactly what data from the results will be used? Is it used for assertions only?

IlyaMuravjov commented 1 year ago

@alisevych Yes, I mean exactly what you said (tests are generated based on the results from reruns only; the original results obtained using the dirty context are ignored).

The data from the results includes UtExecution.stateAfter, UtExecution.result, and UtExecution.coverage; it is used for generating assertions, printing stack traces, generating SARIF reports, and possibly more.
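
For illustration, the consumers could be wired roughly like this (the three properties are the ones named above; the consumer functions are hypothetical placeholders, not real UTBot APIs):

```kotlin
import org.utbot.framework.plugin.api.UtExecution

// After the rerun, only the clean-context execution feeds the generators;
// the dirty-context data is dropped entirely.
fun emitArtifacts(rerun: UtExecution) {
    generateAssertions(rerun.stateAfter, rerun.result) // assertions in the test body
    reportFailure(rerun.result)                        // stack traces for failing runs
    writeSarifReport(rerun.result, rerun.coverage)     // SARIF code-analysis report
}

// Hypothetical consumers, stubbed for illustration:
fun generateAssertions(stateAfter: Any?, result: Any?) { /* ... */ }
fun reportFailure(result: Any?) { /* ... */ }
fun writeSarifReport(result: Any?, coverage: Any?) { /* ... */ }
```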

alisevych commented 1 year ago

I wanted to point out that executions are made twice, which is time-consuming. If the run results were compared and there were differences, we could draw some conclusions from them. You've suggested simply discarding this information. cc @EgorkaKulikov

IlyaMuravjov commented 1 year ago

@alisevych

First, I'd like to draw attention to the fact that a rerun on a clean context differs a lot from the original run.

If we see a difference between the results of runs on a clean and a dirty context, the only conclusion we seem to be able to draw is that either we failed to properly reset all the relevant parts of the context during the partial reset, or the result is indeed non-deterministic.

Second, I think we should only notify the user about non-deterministic results after making at least two runs on a clean context and observing different results.
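
A sketch of that check (hypothetical helpers again; `resultsEqual` stands in for whatever result comparison would actually be used):

```kotlin
import org.utbot.framework.plugin.api.UtExecution

// Rerun twice on a fully reset context; only flag non-determinism when the
// two clean runs disagree, rather than when a clean run differs from a dirty one.
fun rerunAndClassify(execution: UtExecution): UtExecution {
    val first = rerunWithFullContextReset(execution)
    val second = rerunWithFullContextReset(execution)
    if (!resultsEqual(first.result, second.result)) {
        warnUser("Result is non-deterministic even on a clean context")
    }
    return first
}

// Hypothetical helpers, stubbed for illustration:
fun rerunWithFullContextReset(execution: UtExecution): UtExecution = TODO()
fun resultsEqual(a: Any?, b: Any?): Boolean = a == b
fun warnUser(message: String) = println("WARNING: $message")
```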

Finally, all of this handling of differences in run results depends on the code generation being ready to deal with multiple unequal results.

alisevych commented 1 year ago

@IlyaMuravjov Thank you! Enjoyed reading.