forcedotcom / cli

Salesforce CLI
https://developer.salesforce.com/docs/atlas.en-us.sfdx_cli_reference.meta/sfdx_cli_reference/
BSD 3-Clause "New" or "Revised" License
494 stars 78 forks source link

Salesforce CLI test run ends really slow / takes really long #2475

Closed rohledermarc closed 10 months ago

rohledermarc commented 1 year ago

Summary

We are using the latest Salesforce CLI to run our test suites in our CI on our sandboxes and production environment. We are running each test suite one by one. For each run we use the CLI command sf apex run test --json --target-org **** --suite-names Test_Suite1 --wait 10 --output-dir test-results/apex --result-format json. Apparently the whole run is taking really long, but if we run all tests by just running sf apex run test --wait 10 it is really quick.

It seems like the tests are ready really fast and the json result is shown, but it has kind of a delay till the command ends and the script gets the exit status of the command.

All ~2000 tests through 24 test suites one by one take around 50 minutes (summarised test execution time around 8 minutes), all ~2000 tests at once take around 7 minutes (test execution time around 8 minutes).

Steps To Reproduce

Our repositories of our packages are private, so we can not share it. To reproduce the behaviour it should be enough to:

  1. create a lot of tests within 24 test suites
  2. run the test suites one by one and write down the time it took (use the script which I add at the bottom of this issue for )
  3. run the tests at once and write down the time it took (using sf apex run test --wait 10) --> compare the times

Alternative (easier):

  1. run sf apex test run and observe how long the command takes to finish and the terminal to be ready again

Expected result

The two compared times should be quite similar and not differ that much.

Actual result

The two compared times differ really much.

System Information

I am using a mac with zsh, but the issue is the same on windows and in our CI, which is using linux.

Salesforce CLI version:

{
  "cliVersion": "@salesforce/cli/2.8.11",
  "architecture": "darwin-arm64",
  "nodeVersion": "node-v20.5.1",
  "osVersion": "Darwin 22.6.0",
  "shell": "zsh",
  "rootPath": "/opt/homebrew/lib/node_modules/@salesforce/cli",
  "pluginVersions": [
    "@oclif/plugin-autocomplete 2.3.8 (core)",
    "@oclif/plugin-commands 2.2.25 (core)",
    "@oclif/plugin-help 5.2.19 (core)",
    "@oclif/plugin-not-found 2.4.1 (core)",
    "@oclif/plugin-plugins 3.4.2 (core)",
    "@oclif/plugin-search 0.0.22 (core)",
    "@oclif/plugin-update 3.2.3 (core)",
    "@oclif/plugin-version 1.3.10 (core)",
    "@oclif/plugin-warn-if-update-available 2.1.0 (core)",
    "@oclif/plugin-which 2.2.32 (core)",
    "@salesforce/cli 2.8.11 (core)",
    "apex 2.3.14 (core)",
    "auth 2.8.16 (core)",
    "data 2.5.8 (core)",
    "deploy-retrieve 1.17.8 (core)",
    "dev 0.7.1 (user)",
    "info 2.6.40 (core)",
    "limits 2.3.33 (core)",
    "login 1.2.29 (core)",
    "marketplace 0.1.3 (core)",
    "org 2.10.6 (core)",
    "packaging 1.17.0 (user)",
    "schema 2.3.25 (core)",
    "settings 1.4.28 (core)",
    "sobject 0.2.6 (core)",
    "source 2.10.33 (core)",
    "telemetry 2.3.1 (core)",
    "templates 55.5.11 (core)",
    "trust 2.6.9 (core)",
    "user 2.3.32 (core)",
    "@tmh-bis-salesforce/tmh-bis-salesforce-cli-plugin 0.5.3 (user)"
  ]
}

Additional information

script for running all tests at once:

#! /bin/bash

username=org_username # set the org username or alias 

echo "Start $(date)"
echo ""

mkdir test-results-once-json

echo "sf apex run test --json --target-org $username --wait 15 --output-dir test-results/apex --result-format json"
sf apex run test --json --target-org $username --wait 15 --output-dir test-results/apex --result-format json > test-results-once-json/test-result.json

echo ""
echo "End $(date)"

result of running the script:

Start Tue Sep 19 12:55:57 CEST 2023

mkdir: test-results-once-json: File exists
sf apex run test --json --target-org *** --wait 15 --output-dir test-results/apex --result-format json

End Tue Sep 19 13:02:03 CEST 2023

script for running the test suites one by one:

#! /bin/bash

username=org_username # set the org username or alias 
testsuites=( TestSuite1 TestSuite2 TestSuite3 TestSuite4 TestSuite5 ... ) # fill in the test suite names

echo "Start $(date)"
echo ""

echo "The following test suites will be executed:"
for value in "${testsuites[@]}"
do
  echo "$value"
done

mkdir test-results-json
for value in "${testsuites[@]}"
do
  echo ""
  echo "sf apex run test --json --target-org $username --suite-names $value --wait 15 --output-dir test-results/apex --result-format json"
  sf apex run test --json --target-org $username --suite-names $value --wait 15 --output-dir test-results/apex --result-format json > test-results-json/$value.json
done

echo ""
echo "End $(date)"

result of running the script

Start Tue Sep 19 11:11:54 CEST 2023

The following test suites will be executed:
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

sf apex run test --json --target-org *** --suite-names *** --wait 15 --output-dir test-results/apex --result-format json

End Tue Sep 19 12:02:40 CEST 2023
github-actions[bot] commented 1 year ago

Thank you for filing this issue. We appreciate your feedback and will review the issue as soon as possible. Remember, however, that GitHub isn't a mechanism for receiving support under any agreement or SLA. If you require immediate assistance, contact Salesforce Customer Support.

mshanemc commented 1 year ago

Tests, using testSuites, are run asynchronous. So if you have 24 suites run in 50 minutes, that's about 2 min/each.

My theory is that each test suite you enqueue is subject to whatever else is going on the in the queue (deploys, retrieves, dashboards, scheduled apex/flow, etc). So it doesn't take that long to actually run, but it might spend a minute waiting in line. And that can vary depending on what type of load the instance is on.

The test runner is using the streaming API, so it's not like it's polling every 30 seconds or something that could be shortened.

I'm going to close this because I don't think there's a bug, and I'm 100% sure it's not related to the CLI.

rohledermarc commented 1 year ago

@mshanemc thank you for the quick answer. It does not seem like it takes so long because it needs to be enqueued. As I am running the tests, I see really quickly the test results, but the command does not finish / exit. This is the time which leads then to the big delay when running so many test suites. If you are interested and not able to reproduce it, I can try to record a video showing the time period which I mean.

j-schreiber commented 1 year ago

I'm going to close this because I don't think there's a bug, and I'm 100% sure it's not related to the CLI.

I am 100% sure it is related to the CLI. The tests are enqueued in seconds, and usually complete within a couple of seconds. Yet, the CLI needs minutes until it gives back control to the terminal. This used to be different with the old sfdx bundle.

Please consider reopening this issue. I would be available to help you reproduce it and confirm, that it is indeed a CLI problem.

mshanemc commented 1 year ago

OK, hit us with a video of this, and preferably the debug logs that show what's happening.

j-schreiber commented 1 year ago

Just made a short video to demonstrate. I was not able to generate meaningful debug logs (because the tests itself are not the problem, Apex Debug logs of the test execution are useless).

Here's a YouTube link of me demonstrating and trying to explain a little bit. Hope it is helpful: https://youtu.be/GyWwiyZ5TJs

Also, some corrections:

If you need more information, please show me how I can provide them.

iowillhoit commented 1 year ago

Thanks for the video @j-schreiber, I am going to try to replicate this today and tomorrow and see if we can find out why this is hanging

iowillhoit commented 1 year ago

Hey @j-schreiber and @rohledermarc, I am not able to reproduce this. I created a simple script that will generate a lot of tests. I then ran sf apex run test and watched the progress with sf apex tail log and in the org UI. As soon as the results were displayed I was given control of my terminal back.

Here is a trimmed video of running 6011 tests on ~2000 files https://github.com/forcedotcom/cli/assets/1715111/4a275153-e4b1-4c9c-9561-c6e4786a05cc

Are you able to reproduce this issue using my repo? See readme for instructions. https://github.com/iowillhoit/dreamhouse-lots-o-tests

If you can, then something is off with your setup. If you cannot, we are going to need you to create a repo that consistently reproduces this issue for us to debug further. Thank you

iowillhoit commented 1 year ago

FWIW: @mshanemc reminded me of ~this issue~ that had a similar issue "hanging" on certain Node versions. I did test using version 20.5.1 and saw the same (quick) results I shared above.

EDIT: Whoops, the link I included was to this same issue. I meant to add https://github.com/forcedotcom/cli/issues/2092

rohledermarc commented 1 year ago

Thanks for investigating @iowillhoit ! I will take a look to it today or in the next days.

rohledermarc commented 1 year ago

I was able now to take a deeper look into your comments, I really appreciate the work you put in and the way you try to reproduce the behaviour!

I first used your repository to create a scratch org (with a fresh trailhead playground dev hub) to see if my CLI behaves the same way as yours did. As it is in your video the CLI is immediately ready after showing the test results. I was confused and used the same terminal and changed nothing but ran tests on our instance (sandbox) -> again the CLI took around two minutes to get ready after showing the results as we described in our comments before.

After a little bit of trial and error I was able to kind of reproduce it: I created a scratch org with your repository but this time with our production dev hub. I ran the same test command as I did for the scratch org created from the fresh trailhead playground dev hub and the CLI took again around two minutes to be ready.

So it seems to be affected by the underlaying instance, but why is the CLI behaving differently? Would it be helpful to grant you access to the scratch org, so you are able to investigate it?

      Script and results:

The script I ran (I removed the echo "" lines for better readability):

#! /bin/bash

echo "Start $(date)"

echo "sf apex run test --target-org dreamhouse --wait 15 -t GeocodingService1Test"
sf apex run test --target-org dreamhouse --wait 15 -t GeocodingService1Test 

echo "End $(date)"

echo "Start $(date)"

echo "sf apex run test --target-org dreamhouseTMH --wait 15 -t GeocodingService1Test"
sf apex run test --target-org dreamhouseTMH --wait 15 -t GeocodingService1Test 

echo "End $(date)"

The result of the script above (I ran it multiple times and the times were always similar). The dreamhouse alias is the scratch org created from the fresh trailhead playground dev hub and the dreamhouseTMH is the scratch org created from our production org dev hub. Both scratch orgs are created with your scratch org definition file and only got your repository deployed:

Start Tue Oct 24 13:53:01 CEST 2023

sf apex run test --target-org dreamhouse --wait 15 -t GeocodingService1Test

=== Test Summary
NAME                 VALUE                        
───────────────────  ─────────────────────────────
Outcome              Passed                       
Tests Ran            3                            
Pass Rate            100%                         
Fail Rate            0%                           
Skip Rate            0%                           
Test Run Id          7077Z00005hNptC              
Test Execution Time  31 ms                        
Org Id               00D7Z0000005IC8UAM           
Username             test-bb66d5ouxu04@example.com

=== Test Results
TEST NAME                              OUTCOME  MESSAGE  RUNTIME (MS)
─────────────────────────────────────  ───────  ───────  ────────────
GeocodingService1Test.blankAddress     Pass              18          
GeocodingService1Test.errorResponse    Pass              6           
GeocodingService1Test.successResponse  Pass              7           

End Tue Oct 24 13:54:30 CEST 2023

Start Tue Oct 24 13:54:30 CEST 2023

sf apex run test --target-org dreamhouseTMH --wait 15 -t GeocodingService1Test

=== Test Summary
NAME                 VALUE                        
───────────────────  ─────────────────────────────
Outcome              Passed                       
Tests Ran            3                            
Pass Rate            100%                         
Fail Rate            0%                           
Skip Rate            0%                           
Test Run Id          7079Z000008ZJzH              
Test Execution Time  25 ms                        
Org Id               00D9Z000000OJPhUAO           
Username             test-h0f72knodzgg@example.com

=== Test Results
TEST NAME                              OUTCOME  MESSAGE  RUNTIME (MS)
─────────────────────────────────────  ───────  ───────  ────────────
GeocodingService1Test.blankAddress     Pass              12          
GeocodingService1Test.errorResponse    Pass              6           
GeocodingService1Test.successResponse  Pass              7           

End Tue Oct 24 13:56:32 CEST 2023
j-schreiber commented 1 year ago

Thanks to everyone for this awesome investigation and attempts at reproducing this! That bringt a lot of clarification. Really appreciate it.

Could this be related to Hyperforce? Our production/DevHub is on DEU5 - I assume playgrounds and personal developer editions are on other pods. Who should we talk to if it turns out to be our location?

I am still not 100% convinced that this is not the CLI (at least partly). If I run tests from the VS Code UI (using the "Run All Tests" overlay in a test class), results are immediately returned.

rohledermarc commented 1 year ago

@iowillhoit I forgot to mention you, so just in case you did not see it yet...

eburtness commented 1 year ago

I'm seeing sf return slowly when running just one test. Possibly the same issue and easier to reproduce. --synchronous returns quickly, but for certain orgs --wait and the default async mode always take ~2 minutes before sf exits.

With --wait I see the test results within 10-15 seconds but sf doesn't exit until the 2 minutes is up. Using time to see how long sf is running, it's a very consistent 2:02.68 – 2:02.70 even as the time for the results to come back varies by a few seconds. That makes me think it's some timer started when the command is run, not when the tests finish.

So far I can only reproduce it on Hyperforce instances, but not all Hyperforce instances, so maybe it's instance or org dependent?

This is with @salesforce/cli/2.15.9 darwin-arm64 node-v18.15.0

To simplify the tables below, I left out the --tests argument, but I'm running just one very simple test.

On a Developer sandbox on Hyperforce USA20S:

Command Test Execution Time Time for sf to finish
sf apex run test --synchronous 11 ms 12.5 s
sf apex run test 17 ms 122.7 s
sf apex run test --wait 10 5 ms 122.7 s

On a Full sandbox on Hyperforce USA32S:

Command Test Execution Time Time for sf to finish
sf apex run test --synchronous 2949 ms 7.5 s
sf apex run test 4 ms 4.5 s
sf apex run test --wait 10 6 ms 10.5 s
eburtness commented 1 year ago

Ah, this is interesting — when I tail the sf log, this is the last two lines before it exits:

{"level":20,"time":1699449216458,"name":"sf:TelemetryReporter","msg":"Testing connection to https://dc.services.visualstudio.com with timeout of 1000 ms"}
{"level":40,"time":1699449216581,"name":"sf:TelemetryReporter","msg":"Connection to https://dc.services.visualstudio.com timed out after 1000 ms"}

Update: Oh sorry, I was thinking maybe that accounted for the delay, but it's only 1 second. 😆 Nevermind.

Nothing in the log suggests an explanation to me, but I'm happy to provide the log if there's a way I can submit it privately. Also happy to provide access to a sandbox or scratch org where this happens.

github-actions[bot] commented 1 year ago

This issue has not received a response in 7 days. It will auto-close in 7 days unless a response is posted.

rohledermarc commented 1 year ago

@iowillhoit @mshanemc Did you have the chance to take another look? It seems like we are not the only ones who have these issues. The workaround with the --synchronous flag does not work for us as we use the --suite-names flag and they do not work together. At least the issue should not be closed, as it not resolved yet.

eburtness commented 1 year ago

I tested again with @salesforce/cli/2.17.14 darwin-arm64 node-v20.9.0; seeing the same results.

rohledermarc commented 1 year ago

Just to make sure, that the issue won't be closed through the weekend... issue is still there also for me with the same version as @eburtness @salesforce/cli/2.17.14 darwin-arm64 node-v20.9.0

github-actions[bot] commented 1 year ago

This issue has not received a response in 7 days. It will auto-close in 7 days unless a response is posted.

eburtness commented 12 months ago

Bump to keep open — same results on @salesforce/cli/2.19.8 darwin-arm64 node-v20.9.0

rohledermarc commented 11 months ago

@eburtness do you still have the issue? today our pipeline ran really fast and the issue seems to be gone...

rohledermarc commented 11 months ago

ok, on our sandboxes the issue seems to be resolved, but on our production environment the issue still exists...

rohledermarc commented 11 months ago

As additional information, our sandboxes are on Winter '24 Patch 16.8 and our production org is on Winter '24 Patch 16.7. Let's see when the patch will be installed on production and if it will resolve the issue.

eburtness commented 11 months ago

@rohledermarc Sorry I haven't been able to test. We're currently having another problem with our sandboxes where many or all async operations (including test runs and metadata API deploys and retrieves) are taking hours or timing out entirely. It's good times haha

github-actions[bot] commented 11 months ago

This issue has not received a response in 7 days. It will auto-close in 7 days unless a response is posted.

j-schreiber commented 10 months ago

Update:

shetzel commented 10 months ago

I've read every post in this issue thread and watched the video. Especially with the more recent posts about tests finishing quickly with server side changes (new patch releases, etc.) this doesn't appear to be a CLI issue. There is only 1 thing written in this thread that makes me suspect client side code, which is from j-schreiber: "If I run tests from the VS Code UI (using the "Run All Tests" overlay in a test class), results are immediately returned." If someone can confirm this and get exact versions of both VS Code and CLI that demonstrate this against the same org, then there could be a case for a bug in plugin-apex. Without that, we're missing a way to reproduce this and debug the client side code, so we are closing this issue. If any new information arises that indicates a problem with the client side code please re-open this.

If you'd like to try to debug this in your environment:

Here are some steps to debug CLI commands. The instructions are specific to the auth:web:login command but you can generalize to any command. In your shell: export NODE_OPTIONS=--inspect-brk Open a chrome tab to: chrome://inspect (alternatively, use your preferred JS debugger and that config) run the auth:web:login command In the chrome debugger tab, click the “inspect” link. It should stop in the main entry point and give you some clues as to where the files are to put some breakpoints. Stepping through at this level will not be very productive so I would look at the file location and use it to search for a more accurate location to put breakpoints. E.g., in my case, I have an npm install of the CLI so I added debugger; to line 147 of webOAuthServer.js (the executeOauthRequest function) at this location: ~/.nvm/versions/node/v16.15.1/lib/node_modules/sfdx-cli/node_modules/@salesforce/core/lib/webOAuthServer.js. Then when the debugger stops at the initial entry point, resume the script and it should open a browser tab to your login url and stop at your debugger; statement in webOAuthServer.js. Now you can step through.

Look for a location like: .../@salesforce/plugin-apex/lib/commands/apex/run and debug or add console.log() or debugger; statements.

cinicola commented 10 months ago

I have a similar problem here. When I run the command:

sf apex get test --test-run-id=707Ha00000JennO --target-org myOrg

The response arrives quickly (1 to 3 seconds), but the terminal hangs for another 2 minutes.

Is it a SF CLI problem?

calebkuester commented 8 months ago

Just running some tests on my end. It seems like it's both a VS Code problem and a Salesforce CLI problem.

Whatever command VS Code is using to run individual tests now has new behavior. In a non-hyperforce instance, it runs all of the tests in a single test class instead of just one. In a hyperforce instance, while attempting to run a single test with no content (blank test that does nothing), it will timeout every time (10 minutes and then stop with error).

If you run this command, however: sf apex run test --tests TestClassName.testName -o org-alias-or-username -w 5

-- it behaves exactly as one would expect. It runs quickly and retrieves data quickly.

This problem could be solved in VS Code if the extension: Salesforce CLI Integration were to start using "sf apex run test" instead of whatever it's currently using.

Update: This does not behave the same every time in hyperforce. Sometimes, if you check the Apex Test Execution, you will find your test "queued." If it is queued, it will never be halted and so it won't timeout after ten minutes. It will simply start when it starts.

uwe-voellger commented 6 months ago

The same problem here: Windows, @salesforce/cli/2.38.7 win32-x64 node-v20.12.1 Instance: CHE26S

The same problem exists for sf apex:test:run and sf apex:get:test. I'm using --json output.

We have really huge amount of tests, and when working in sync mode we often get connection timeouts. But when switching to async tests, it seems

What we would like to do in order to deal with connection timeouts is to

But with the bugs mentioned before, there seems to be no way to work with Apex tests in async mode.

ahgittin commented 2 months ago

I've encountered this today, with a tiny number of tests in a fairly clean dev org: the tests finish in 4s all passing, but the CLI waits 2 minutes before exiting. Only occurs in some configurations (eg without named tests), but when it does it always occurs.

Cause:: This is due to an ongoing WS connection attempt in the faye pub/sub library. Faye greedily tries both https and websockets, and for me it uses the former as that comes back first, and the tests run happily. But all the while the websocket attempt is still in flight, and when tests are done and the CLI otherwise finishes, that attempt stops node from returning. It's a nice feature, preferring websockets if they are available, but I don't see an easy way in the code to cancel that in-flight attempt when the pub/sub is closing down. And I don't want to wait the 120s default timeout on my system.

Simple workaround: Tell JS to exit explicitly once tests are done. Edit ~/.local/share/sf/client/2.58.7-ed27925/bin/run.js (or similar), adding this at the end:

process.exit(0);
wesnolte commented 1 month ago

I've just hit this issue with tests being run from CircleCI. The 'Test History' in my Org indicates the test run was completed in 12m09s, however, the CircleCI step that runs the command only exited after 41m28s. My CircleCI step that runs the tests includes an echo to the terminal every 5 minutes to prevent the 10m "no activity" timeout set by CircleCI. I can see that after my test run completed and output the test result to the terminal this command ran 3 more times before the sf apex run test command provided an exit code. Here's the script if it helps (I know I know it's hacky):

echo "Running unit tests..."

(while true; do echo "Still running tests..."; sleep 300; done) &  # Background process that outputs every 5 minutes

./node_modules/.bin/sf apex run test --synchronous --target-org $SFDC_USERNAME --result-format junit --output-dir test-results

EXIT_CODE=$?

kill %1  # Kill the background output process once the tests are complete

exit $EXIT_CODE  # Ensure proper exit with the test exit code

This is costing me quite a lot of time and build minutes!