force:apex:test:run calculating codecoverage when not requested 7.115.1

jajimenezffdc commented 3 years ago

Summary

We have just updated from 7.75 to 7.115.1, after having fixed a previous issue in force:apex:test:run not running async (https://github.com/forcedotcom/salesforcedx-vscode/issues/3164)

A lot of things added and fixed since then, but we are still finding issues with the test:run command. Taking into account that we are keeping the same code in all of our tests, we've found the following errors in the "Apex Test Execution" section in the org:

Lots of "Code coverage from running this test class was not applied due to a conflicting recompilation task running at the same time. Please rerun this test class to recalculate code coverage." messages on successful tests BUT we are not using the --codecoverage option
Huge increase in our CI pipelines times at that specific stage. It looks like caused by that codecoverage calculation
Some errors like "Could not run test on class XXXX because: connection was cancelled here"

None of these problems was in place in 7.75 (I guess because the codecoverage was introduced in 7.96.1) but the point is that we are not interested in that coverage data and it looks like we can't avoid its calculation.

Steps To Reproduce:

Can't provide the exact steps to recreate as it depends on the tests code, so probably with some basic test you won't get those messages for timeouts and conflicts, but this is how we have tested it.

Create a scratch-org
Ensure you have enableCompileOnDeploy = true
Push your code (tests included)
Run: sfdx force:apex:test:run -u XXXXXXX -r human -w 60 -l RunLocalTests --verbose
Open your org (web ui) and go to Setup > Apex Test Execution
Measure the time it takes to execute the test
Check status of the tests

Expected result

This is the expected behaviour and what is actually happening if you run on 7.75.0

All the tests are executed successfully

Actual result

This is the actual result running 7.115.1

Execution time way longer than usual
Warnings: "Code coverage from running this test class was not applied due..."
Errors for classes timing out.

Additional information

We have isolated the issue in the test:run command after discarding a push issue. We prepared an org, pushed the code and then, over the same org, we ran the test:run comnand on 7.75, upgrade to 7.115, test again and downgrade to 7.75 again. It always worked on 7.75 and always the same issues on 7.11.5

We have tried sync and async runs, several orgs, several products code, but the pattern is the same for all of them.

SFDX CLI Version: @oclif/plugin-autocomplete 0.3.0 (core) @oclif/plugin-commands 1.3.0 (core) @oclif/plugin-help 3.2.3 (core) @oclif/plugin-not-found 1.2.4 (core) @oclif/plugin-plugins 1.10.1 (core) @oclif/plugin-update 1.4.0-3 (core) @oclif/plugin-warn-if-update-available 1.7.0 (core) @oclif/plugin-which 1.0.3 (core) @salesforce/sfdx-plugin-lwc-test 0.1.7 (core) @salesforce/sfdx-trust 3.6.0 (core) alias 1.1.10 (core) apex 0.2.7 (core) auth 1.7.1 (core) config 1.2.24 (core) custom-metadata 1.0.12 (core) data 0.6.0 (core) generator 1.2.0 (core) limits 1.2.1 (core) org 1.7.0 (core) salesforce-alm 52.2.6 (core) schema 1.0.8 (core) sfdx-cli 7.115.1 (core) source 1.0.10 (core) telemetry 1.2.3 (core) templates 52.1.0 (core) user 1.4.0 (core)

OS and version: Both MAC and Linux

AnanyaJha commented 3 years ago

Hi @jajimenezffdc thanks for reporting this! Do you mind sharing some more info with us - where exactly are you seeing the message Code coverage from running this test class was not applied due.. is this in the terminal? Or in the Apex Test Execution UI? When you were running the command with CLI version 7.75, what was the version of the Apex plugin? You can find this by running sfdx plugins --core

jajimenezffdc commented 3 years ago

Hi, @AnanyaJha . That message can be seen only in the UI, not shown in the terminal.

These are the versions on 7.75:

@ffdev/ffdx 3.6.0 (3.6.0) @oclif/plugin-autocomplete 0.1.5 (core) @oclif/plugin-commands 1.3.0 (core) @oclif/plugin-help 3.2.0 (core) @oclif/plugin-not-found 1.2.4 (core) @oclif/plugin-plugins 1.9.0 (core) @oclif/plugin-update 1.3.10 (core) @oclif/plugin-warn-if-update-available 1.7.0 (core) @oclif/plugin-which 1.0.3 (core) @salesforce/sfdx-trust 3.4.3 (core) alias 1.1.2 (core) analytics 1.12.1 (core) config 1.1.8 (core) generator 1.1.3 (core) salesforcedx 49.13.0 (core) ├─ salesforce-alm 49.12.1 (core) ├─ templates 49.10.0 (core) ├─ @salesforce/sfdx-plugin-lwc-test 0.1.7 (core) ├─ custom-metadata 1.0.10 (core) └─ apex 0.1.0 (core) sfdx-cli 7.75.0 (core)

AnanyaJha commented 2 years ago

@jajimenezffdc can you provide us with a sample repository to repro this with? unfortunately, we haven't been able to reproduce this error, so we'll need to be able to do that before we can debug further. could you also try running a smaller number of tests? what happens in the test execution page when there's only a few tests being run?

jajimenezffdc commented 2 years ago

Hi @AnanyaJha

We know it's not easy to recreate outside our company. This is happening with large code products, so makes no sense to provide a simple example, it won't fail. We have smaller products where we have not that issue. Only happen on products with a large number of tests.

I'm happy to help as much as I can if you tell me what's needed, have a call and show you, or any other thing you consider useful, but we'll need your guidance here.

Regards, Jose

jajimenezffdc commented 2 years ago

Hi @AnanyaJha

We have tried to restrict a bit more our scenario, running only a few tests. We are not getting timeouts, but definitely, it makes a difference on which version you run it.

sfdx force:apex:test:run -t PricebooksSelectorTest.selectAllActiveOrderByIsStandardAndName_withActivePricebooks_returnsActivePricebookList -y

sfdx-cli 7.75 723ms 398ms 387ms 382ms 352ms

sfdx-cli 7.115 (or above) 1991ms 1285ms 796ms 1207ms 692ms

1st run is always slowest, but the new CLI is taking much longer.

Can you try doing the same with your test repository, and compare times between versions?

Thanks. Jose

jajimenezffdc commented 2 years ago

Hi @AnanyaJha

I've granted you access to a gDrive folder with some data about the issue, examples, logs, and times. Hope that can help. You should receive an email with the invitation.

Thanks Jose

AnanyaJha commented 2 years ago

@jajimenezffdc thank you so much for compiling the data and sending it over! we weren't able to repro this locally with large projects but will start digging through your data to see if we can narrow down the issue you're facing.

jajimenezffdc commented 2 years ago

Thanks to you for your time @AnanyaJha . If still not a clue after reviewing the logs, we could find a way of sharing our code so you can recreate exactly what that spreadsheet shows. Obviously something on a private channel.

jhadfieldcertinia commented 2 years ago

Hi @AnanyaJha,

I've been doing some more investigation into this on our side, and I believe that I've found some behaviour changes between the two CLI versions that can be replicated even with small or quick-running tests.

The first is the increase in test execution time - running a test as sfdx force:apex:test:run -y -t <testClassName> --json, the results contain a 'testExecutionTime' property. I've found this to be significantly higher when running on version 7.115 of the CLI compared to 7.75. For 500 executions of the same test class, the testExecutionTime on v7.75 averaged 362 milliseconds but on v7.115 the average testExecutionTime was 10183 milliseconds.

The other is evidence of code coverage being calculated - before running the test, I cleared code coverage information by going to Setup > Apex Test History and following the "View Test History" link, then clicking "Clear Test Data". I confirmed that code coverage had been deleted by running the following SOQL query through the Tooling API, which returned 0 results: SELECT ApexTestClassId, TestMethodName, ApexClassorTriggerId FROM ApexCodeCoverage I then ran a test using sfdx force:apex:test:run -y -t <testClassName> --json and ran the same SOQL query to see whether any ApexCodeCoverage records had been created. When running the test using v7.75 of the CLI, none were created; when running the test using v7.115, it created several hundred records. After that, I experimented with turning on the "Store Only Aggregated Code Coverage" option in the org, this resulted in no ApexCodeCoverage records being created on either version of the CLI. With this option turned on, running a large number of tests on v7.115 progresses quite a bit faster and doesn't produce as many timeout errors as it does when running with this option turned off, although the performance is still worse than on v7.75.

I hope this may be helpful in investigating this issue.

randi274 commented 2 years ago

Thanks @jrahadfield this narrowing of the issue has been extremely helpful! I've spent the better part of the morning working through this, and was able to reproduce the issue and track it down to the specific version that has the issue, which corresponds to a major rewrite of how we run our apex tests. You should be able to get up to v7.94.3 of the CLI with the old performance; v7.95.0 is where it takes a major hit. I'm still digging a bit further to see why the code coverage is being triggered, but unfortunately when I also tested adding the -c code coverage flag using the old version of the CLI, it was still much faster than the new version without that flag. Since it was such a large rewrite with that cutover to the new code (which was also refactored for readability, clearer business logic, and testing reasons), I'm not sure we'll be able to recuperate all of the performance degradation, but we've made much better testing inroads so far. Will keep you updated if we find anything else helpful!

uip-robot-zz commented 2 years ago

This issue has been linked to a new work item: W-10221888

sfrycertinia commented 2 years ago

@randi274 We have now tested this with a patch. We ensure that the skipCodeCoverage flag is set in the payload sent to the tooling API test method.

So for example in this file https://github.com/forcedotcom/salesforcedx-apex/blob/232657305fbc10c220399a77d6a865a23cdc53f3/packages/plugin-apex/src/commands/force/apex/test/run.ts

Line 178 would become

payload.skipCodeCoverage = true

This seems to improve the test times for us. test-times

ncrebbincertinia commented 2 years ago

@randi274 Did you get a chance to look at the testing we have done and the patch applied by @sfryffdc ? It would be really helpful if you could review this with a view to including in a release. Any timescales for this would be really useful too. We are blocked up upgrading our cli due to this so a quick fix and release would get us moving. Thanks

randi274 commented 2 years ago

@ncrebbinffdc thanks to your team for confirming! I had suspected the same root cause, but it's great to see the data that corresponds to the improvement. This item is in our backlog, and it's up to Product priority in terms of timeline; sorry I can't give any better specifics! If your team has bandwidth, they are also welcome to submit a PR for our review, and we can take a look at that as well.

AnanyaJha commented 2 years ago

Hi @ncrebbinffdc thanks for reaching out on this issue! We'll be prioritizing this item in our next iteration, so it should be addressed within the next month or two - with some leeway for the holiday season in the middle :) We'll be sure to update this thread once we have any more info or the fix has been released!

ncrebbincertinia commented 2 years ago

Thanks for the updates @AnanyaJha and @randi274

ncrebbincertinia commented 2 years ago

@AnanyaJha @randi274 Please can you let me know when this fix will be available. As the fix in the patch is just a one line change I would guess that this would be quick and could be released in the weekly cycle. Is the reason for the delay in fixing due you working on test performance improvements? So, when can we expect the fixand what will it contain. Thanks.

randi274 commented 2 years ago

@ncrebbinffdc thanks for your patience. The fix also involved making sure tests were updated, and that we were putting the fix in the right spot. I have a change ready which I'm testing currently, which if all goes well should be ready for review and merge soon.

randi274 commented 2 years ago

The PR didn't link since the fix is in a different repository, but the fix is now in review here if y'all want to take a look.

ncrebbincertinia commented 2 years ago

Thanks, I appreciate the update @randi274. Would I be pushing it a bit if i asked for a rough idea of a date if all goes well with your testing? i.e. are we looking at early January?

randi274 commented 2 years ago

@ncrebbinffdc so if it doesn't make it in for today's CLI release, it should be ready by the time their automated releases run next week before our holiday code freeze 🙂 Our fix here will just be the codecoverage fix, which may not be able to solve all of your team's latency issues, but we can work to further hone in on them once this fix is in!

ncrebbincertinia commented 2 years ago

That is fantastic @randi274, many thanks

ncrebbincertinia commented 2 years ago

Hi @randi274 Do you know when this will be released? It has not made it into the cli yet

randi274 commented 2 years ago

Hi @ncrebbinffdc this change should have been released a while ago, and I've been meaning to follow up! Are you still seeing tests are run even without the code coverage flag?

no-response[bot] commented 2 years ago

This issue has been automatically closed because there has been no response to our request for more information from the original author. Currently, there is not enough information provided for us to take action. Please reply and reopen this issue if you need additional assistance.

sfrycertinia commented 2 years ago

Thanks, this is looking good now. 7.136.2 of sfdx-cli which now includes this fix is providing results that are comparable to the previous good ones.

forcedotcom / salesforcedx-vscode