Closed rohledermarc closed 9 months ago
Thank you for filing this issue. We appreciate your feedback and will review the issue as soon as possible. Remember, however, that GitHub isn't a mechanism for receiving support under any agreement or SLA. If you require immediate assistance, contact Salesforce Customer Support.
Tests, using testSuites, are run asynchronous. So if you have 24 suites run in 50 minutes, that's about 2 min/each.
My theory is that each test suite you enqueue is subject to whatever else is going on the in the queue (deploys, retrieves, dashboards, scheduled apex/flow, etc). So it doesn't take that long to actually run, but it might spend a minute waiting in line. And that can vary depending on what type of load the instance is on.
The test runner is using the streaming API, so it's not like it's polling every 30 seconds or something that could be shortened.
I'm going to close this because I don't think there's a bug, and I'm 100% sure it's not related to the CLI.
@mshanemc thank you for the quick answer. It does not seem like it takes so long because it needs to be enqueued. As I am running the tests, I see really quickly the test results, but the command does not finish / exit. This is the time which leads then to the big delay when running so many test suites. If you are interested and not able to reproduce it, I can try to record a video showing the time period which I mean.
I'm going to close this because I don't think there's a bug, and I'm 100% sure it's not related to the CLI.
I am 100% sure it is related to the CLI. The tests are enqueued in seconds, and usually complete within a couple of seconds. Yet, the CLI needs minutes until it gives back control to the terminal. This used to be different with the old sfdx
bundle.
Please consider reopening this issue. I would be available to help you reproduce it and confirm, that it is indeed a CLI problem.
OK, hit us with a video of this, and preferably the debug logs that show what's happening.
Just made a short video to demonstrate. I was not able to generate meaningful debug logs (because the tests itself are not the problem, Apex Debug logs of the test execution are useless).
Here's a YouTube link of me demonstrating and trying to explain a little bit. Hope it is helpful: https://youtu.be/GyWwiyZ5TJs
Also, some corrections:
@salesforce/cli/2.7.0 darwin-x64 node-v18.15.0
but we have been observing this behaviour for months now.If you need more information, please show me how I can provide them.
Thanks for the video @j-schreiber, I am going to try to replicate this today and tomorrow and see if we can find out why this is hanging
Hey @j-schreiber and @rohledermarc, I am not able to reproduce this. I created a simple script that will generate a lot of tests. I then ran sf apex run test
and watched the progress with sf apex tail log
and in the org UI. As soon as the results were displayed I was given control of my terminal back.
Here is a trimmed video of running 6011 tests on ~2000 files https://github.com/forcedotcom/cli/assets/1715111/4a275153-e4b1-4c9c-9561-c6e4786a05cc
Are you able to reproduce this issue using my repo? See readme for instructions. https://github.com/iowillhoit/dreamhouse-lots-o-tests
If you can, then something is off with your setup. If you cannot, we are going to need you to create a repo that consistently reproduces this issue for us to debug further. Thank you
FWIW: @mshanemc reminded me of ~this issue~ that had a similar issue "hanging" on certain Node versions. I did test using version 20.5.1
and saw the same (quick) results I shared above.
EDIT: Whoops, the link I included was to this same issue. I meant to add https://github.com/forcedotcom/cli/issues/2092
Thanks for investigating @iowillhoit ! I will take a look to it today or in the next days.
I was able now to take a deeper look into your comments, I really appreciate the work you put in and the way you try to reproduce the behaviour!
I first used your repository to create a scratch org (with a fresh trailhead playground dev hub) to see if my CLI behaves the same way as yours did. As it is in your video the CLI is immediately ready after showing the test results. I was confused and used the same terminal and changed nothing but ran tests on our instance (sandbox) -> again the CLI took around two minutes to get ready after showing the results as we described in our comments before.
After a little bit of trial and error I was able to kind of reproduce it: I created a scratch org with your repository but this time with our production dev hub. I ran the same test command as I did for the scratch org created from the fresh trailhead playground dev hub and the CLI took again around two minutes to be ready.
So it seems to be affected by the underlaying instance, but why is the CLI behaving differently? Would it be helpful to grant you access to the scratch org, so you are able to investigate it?
Script and results:
The script I ran (I removed the echo ""
lines for better readability):
#! /bin/bash
echo "Start $(date)"
echo "sf apex run test --target-org dreamhouse --wait 15 -t GeocodingService1Test"
sf apex run test --target-org dreamhouse --wait 15 -t GeocodingService1Test
echo "End $(date)"
echo "Start $(date)"
echo "sf apex run test --target-org dreamhouseTMH --wait 15 -t GeocodingService1Test"
sf apex run test --target-org dreamhouseTMH --wait 15 -t GeocodingService1Test
echo "End $(date)"
The result of the script above (I ran it multiple times and the times were always similar). The dreamhouse alias is the scratch org created from the fresh trailhead playground dev hub and the dreamhouseTMH is the scratch org created from our production org dev hub. Both scratch orgs are created with your scratch org definition file and only got your repository deployed:
Start Tue Oct 24 13:53:01 CEST 2023
sf apex run test --target-org dreamhouse --wait 15 -t GeocodingService1Test
=== Test Summary
NAME VALUE
─────────────────── ─────────────────────────────
Outcome Passed
Tests Ran 3
Pass Rate 100%
Fail Rate 0%
Skip Rate 0%
Test Run Id 7077Z00005hNptC
Test Execution Time 31 ms
Org Id 00D7Z0000005IC8UAM
Username test-bb66d5ouxu04@example.com
=== Test Results
TEST NAME OUTCOME MESSAGE RUNTIME (MS)
───────────────────────────────────── ─────── ─────── ────────────
GeocodingService1Test.blankAddress Pass 18
GeocodingService1Test.errorResponse Pass 6
GeocodingService1Test.successResponse Pass 7
End Tue Oct 24 13:54:30 CEST 2023
Start Tue Oct 24 13:54:30 CEST 2023
sf apex run test --target-org dreamhouseTMH --wait 15 -t GeocodingService1Test
=== Test Summary
NAME VALUE
─────────────────── ─────────────────────────────
Outcome Passed
Tests Ran 3
Pass Rate 100%
Fail Rate 0%
Skip Rate 0%
Test Run Id 7079Z000008ZJzH
Test Execution Time 25 ms
Org Id 00D9Z000000OJPhUAO
Username test-h0f72knodzgg@example.com
=== Test Results
TEST NAME OUTCOME MESSAGE RUNTIME (MS)
───────────────────────────────────── ─────── ─────── ────────────
GeocodingService1Test.blankAddress Pass 12
GeocodingService1Test.errorResponse Pass 6
GeocodingService1Test.successResponse Pass 7
End Tue Oct 24 13:56:32 CEST 2023
Thanks to everyone for this awesome investigation and attempts at reproducing this! That bringt a lot of clarification. Really appreciate it.
Could this be related to Hyperforce? Our production/DevHub is on DEU5 - I assume playgrounds and personal developer editions are on other pods. Who should we talk to if it turns out to be our location?
I am still not 100% convinced that this is not the CLI (at least partly). If I run tests from the VS Code UI (using the "Run All Tests" overlay in a test class), results are immediately returned.
@iowillhoit I forgot to mention you, so just in case you did not see it yet...
I'm seeing sf
return slowly when running just one test. Possibly the same issue and easier to reproduce. --synchronous
returns quickly, but for certain orgs --wait
and the default async mode always take ~2 minutes before sf exits.
With --wait
I see the test results within 10-15 seconds but sf doesn't exit until the 2 minutes is up. Using time
to see how long sf is running, it's a very consistent 2:02.68 – 2:02.70 even as the time for the results to come back varies by a few seconds. That makes me think it's some timer started when the command is run, not when the tests finish.
So far I can only reproduce it on Hyperforce instances, but not all Hyperforce instances, so maybe it's instance or org dependent?
This is with @salesforce/cli/2.15.9 darwin-arm64 node-v18.15.0
To simplify the tables below, I left out the --tests
argument, but I'm running just one very simple test.
Command | Test Execution Time | Time for sf to finish |
---|---|---|
sf apex run test --synchronous |
11 ms | 12.5 s |
sf apex run test |
17 ms | 122.7 s |
sf apex run test --wait 10 |
5 ms | 122.7 s |
sf org create scratch --edition developer --target-dev-hub Prod
where Prod is my production dev hub.@salesforce/cli/2.16.7 linux-x64 node-v20.8.1
instead of my local machine, it's just slightly longer at 122.95s – 122.97sCommand | Test Execution Time | Time for sf to finish |
---|---|---|
sf apex run test --synchronous |
2949 ms | 7.5 s |
sf apex run test |
4 ms | 4.5 s |
sf apex run test --wait 10 |
6 ms | 10.5 s |
Ah, this is interesting — when I tail the sf log, this is the last two lines before it exits:
{"level":20,"time":1699449216458,"name":"sf:TelemetryReporter","msg":"Testing connection to https://dc.services.visualstudio.com with timeout of 1000 ms"}
{"level":40,"time":1699449216581,"name":"sf:TelemetryReporter","msg":"Connection to https://dc.services.visualstudio.com timed out after 1000 ms"}
Update: Oh sorry, I was thinking maybe that accounted for the delay, but it's only 1 second. 😆 Nevermind.
Nothing in the log suggests an explanation to me, but I'm happy to provide the log if there's a way I can submit it privately. Also happy to provide access to a sandbox or scratch org where this happens.
This issue has not received a response in 7 days. It will auto-close in 7 days unless a response is posted.
@iowillhoit @mshanemc Did you have the chance to take another look? It seems like we are not the only ones who have these issues. The workaround with the --synchronous
flag does not work for us as we use the --suite-names
flag and they do not work together. At least the issue should not be closed, as it not resolved yet.
I tested again with @salesforce/cli/2.17.14 darwin-arm64 node-v20.9.0
; seeing the same results.
Just to make sure, that the issue won't be closed through the weekend... issue is still there also for me with the same version as @eburtness @salesforce/cli/2.17.14 darwin-arm64 node-v20.9.0
This issue has not received a response in 7 days. It will auto-close in 7 days unless a response is posted.
Bump to keep open — same results on @salesforce/cli/2.19.8 darwin-arm64 node-v20.9.0
@eburtness do you still have the issue? today our pipeline ran really fast and the issue seems to be gone...
ok, on our sandboxes the issue seems to be resolved, but on our production environment the issue still exists...
As additional information, our sandboxes are on Winter '24 Patch 16.8
and our production org is on Winter '24 Patch 16.7
. Let's see when the patch will be installed on production and if it will resolve the issue.
@rohledermarc Sorry I haven't been able to test. We're currently having another problem with our sandboxes where many or all async operations (including test runs and metadata API deploys and retrieves) are taking hours or timing out entirely. It's good times haha
This issue has not received a response in 7 days. It will auto-close in 7 days unless a response is posted.
Update:
I've read every post in this issue thread and watched the video. Especially with the more recent posts about tests finishing quickly with server side changes (new patch releases, etc.) this doesn't appear to be a CLI issue. There is only 1 thing written in this thread that makes me suspect client side code, which is from j-schreiber: "If I run tests from the VS Code UI (using the "Run All Tests" overlay in a test class), results are immediately returned." If someone can confirm this and get exact versions of both VS Code and CLI that demonstrate this against the same org, then there could be a case for a bug in plugin-apex. Without that, we're missing a way to reproduce this and debug the client side code, so we are closing this issue. If any new information arises that indicates a problem with the client side code please re-open this.
If you'd like to try to debug this in your environment:
Here are some steps to debug CLI commands. The instructions are specific to the auth:web:login command but you can generalize to any command. In your shell: export NODE_OPTIONS=--inspect-brk
Open a chrome tab to: chrome://inspect (alternatively, use your preferred JS debugger and that config) run the auth:web:login command In the chrome debugger tab, click the “inspect” link. It should stop in the main entry point and give you some clues as to where the files are to put some breakpoints. Stepping through at this level will not be very productive so I would look at the file location and use it to search for a more accurate location to put breakpoints. E.g., in my case, I have an npm install of the CLI so I added debugger;
to line 147 of webOAuthServer.js (the executeOauthRequest function) at this location: ~/.nvm/versions/node/v16.15.1/lib/node_modules/sfdx-cli/node_modules/@salesforce/core/lib/webOAuthServer.js. Then when the debugger stops at the initial entry point, resume the script and it should open a browser tab to your login url and stop at your debugger; statement in webOAuthServer.js. Now you can step through.
Look for a location like: .../@salesforce/plugin-apex/lib/commands/apex/run and debug or add console.log()
or debugger;
statements.
I have a similar problem here. When I run the command:
sf apex get test --test-run-id=707Ha00000JennO --target-org myOrg
The response arrives quickly (1 to 3 seconds), but the terminal hangs for another 2 minutes.
Is it a SF CLI problem?
Just running some tests on my end. It seems like it's both a VS Code problem and a Salesforce CLI problem.
Whatever command VS Code is using to run individual tests now has new behavior. In a non-hyperforce instance, it runs all of the tests in a single test class instead of just one. In a hyperforce instance, while attempting to run a single test with no content (blank test that does nothing), it will timeout every time (10 minutes and then stop with error).
If you run this command, however: sf apex run test --tests TestClassName.testName -o org-alias-or-username -w 5
-- it behaves exactly as one would expect. It runs quickly and retrieves data quickly.
This problem could be solved in VS Code if the extension: Salesforce CLI Integration were to start using "sf apex run test" instead of whatever it's currently using.
Update: This does not behave the same every time in hyperforce. Sometimes, if you check the Apex Test Execution, you will find your test "queued." If it is queued, it will never be halted and so it won't timeout after ten minutes. It will simply start when it starts.
The same problem here: Windows, @salesforce/cli/2.38.7 win32-x64 node-v20.12.1 Instance: CHE26S
The same problem exists for sf apex:test:run and sf apex:get:test. I'm using --json output.
We have really huge amount of tests, and when working in sync mode we often get connection timeouts. But when switching to async tests, it seems
What we would like to do in order to deal with connection timeouts is to
But with the bugs mentioned before, there seems to be no way to work with Apex tests in async mode.
I've encountered this today, with a tiny number of tests in a fairly clean dev org: the tests finish in 4s all passing, but the CLI waits 2 minutes before exiting. Only occurs in some configurations (eg without named tests), but when it does it always occurs.
Cause:: This is due to an ongoing WS connection attempt in the faye pub/sub library. Faye greedily tries both https and websockets, and for me it uses the former as that comes back first, and the tests run happily. But all the while the websocket attempt is still in flight, and when tests are done and the CLI otherwise finishes, that attempt stops node from returning. It's a nice feature, preferring websockets if they are available, but I don't see an easy way in the code to cancel that in-flight attempt when the pub/sub is closing down. And I don't want to wait the 120s default timeout on my system.
Simple workaround: Tell JS to exit explicitly once tests are done. Edit ~/.local/share/sf/client/2.58.7-ed27925/bin/run.js
(or similar), adding this at the end:
process.exit(0);
I've just hit this issue with tests being run from CircleCI. The 'Test History' in my Org indicates the test run was completed in 12m09s, however, the CircleCI step that runs the command only exited after 41m28s. My CircleCI step that runs the tests includes an echo to the terminal every 5 minutes to prevent the 10m "no activity" timeout set by CircleCI. I can see that after my test run completed and output the test result to the terminal this command ran 3 more times before the sf apex run test
command provided an exit code. Here's the script if it helps (I know I know it's hacky):
echo "Running unit tests..."
(while true; do echo "Still running tests..."; sleep 300; done) & # Background process that outputs every 5 minutes
./node_modules/.bin/sf apex run test --synchronous --target-org $SFDC_USERNAME --result-format junit --output-dir test-results
EXIT_CODE=$?
kill %1 # Kill the background output process once the tests are complete
exit $EXIT_CODE # Ensure proper exit with the test exit code
This is costing me quite a lot of time and build minutes!
Summary
We are using the latest Salesforce CLI to run our test suites in our CI on our sandboxes and production environment. We are running each test suite one by one. For each run we use the CLI command
sf apex run test --json --target-org **** --suite-names Test_Suite1 --wait 10 --output-dir test-results/apex --result-format json
. Apparently the whole run is taking really long, but if we run all tests by just runningsf apex run test --wait 10
it is really quick.It seems like the tests are ready really fast and the json result is shown, but it has kind of a delay till the command ends and the script gets the exit status of the command.
All ~2000 tests through 24 test suites one by one take around 50 minutes (summarised test execution time around 8 minutes), all ~2000 tests at once take around 7 minutes (test execution time around 8 minutes).
Steps To Reproduce
Our repositories of our packages are private, so we can not share it. To reproduce the behaviour it should be enough to:
sf apex run test --wait 10
) --> compare the timesAlternative (easier):
sf apex test run
and observe how long the command takes to finish and the terminal to be ready againExpected result
The two compared times should be quite similar and not differ that much.
Actual result
The two compared times differ really much.
System Information
I am using a mac with zsh, but the issue is the same on windows and in our CI, which is using linux.
Salesforce CLI version:
Additional information
script for running all tests at once:
result of running the script:
script for running the test suites one by one:
result of running the script