Closed RemiHoston closed 1 month ago
It sounds like you have diagnosed the problem as a dead or stuck runner.
Do you have a way to reproduce this? Any logs to help debug what might have happened? What environment(s)/OSes are you running from?
Thank you for your attention. In my case the only useful information is this log:
2024-10-17T08:56:51.9589391Z MSBuild version 17.9.8+610b4d3b5 for .NET
2024-10-17T08:57:22.5882646Z Error ----------------------------------
2024-10-17T08:57:22.5882913Z
2024-10-17T08:57:22.5883196Z [Gauge]
2024-10-17T08:57:22.5883618Z Failed to start gauge API: Timed out connecting to dotnet
2024-10-17T08:57:22.5886914Z
2024-10-17T08:57:24.1956320Z Get Support ----------------------------
2024-10-17T08:57:24.1962697Z
2024-10-17T08:57:24.2209038Z Docs: https://docs.gauge.org
2024-10-17T08:57:24.2214110Z Bugs: https://github.com/getgauge/gauge/issues
2024-10-17T08:57:24.2218619Z Chat: https://github.com/getgauge/gauge/discussions
2024-10-17T08:57:24.2219847Z
2024-10-17T08:57:24.2221192Z Your Environment Information -----------
2024-10-17T08:57:24.2222946Z windows, 1.6.9, aff43ef
2024-10-17T08:57:24.2224027Z csharp (0.10.6), dotnet (0.7.2), html-report (4.3.1), screenshot (0.3.0), xml-report (0.5.1)
It happens when a large batch of cases has just finished running and a new task immediately starts another gauge run. Retrying the pipeline several times works around it.
Agent info:
| Agent.OS | Windows_NT |
| Agent.OSArchitecture | X64 |
| Agent.OSVersion | 10.0.17763 |
Hope the Gauge tool keeps getting better and better!
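For what it's worth, the timestamps in the log above already hint at which timeout fired. Here is a minimal Python sketch that measures the gap between the MSBuild banner and the error line. It assumes the banner is printed roughly when the dotnet runner starts, which is my assumption and not something the log states.

```python
from datetime import datetime


def parse_ts(stamp: str) -> datetime:
    # Pipeline timestamps carry 7 fractional digits; datetime keeps 6, so trim.
    date_part, frac = stamp.rstrip("Z").split(".")
    return datetime.strptime(f"{date_part}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")


# Both timestamps are copied from the log in this post.
build_started = parse_ts("2024-10-17T08:56:51.9589391Z")
error_reported = parse_ts("2024-10-17T08:57:22.5882646Z")

gap_seconds = (error_reported - build_started).total_seconds()
print(f"{gap_seconds:.1f} s between the MSBuild banner and the error")
```

The gap comes out at about 30.6 s, which is in line with the 30 sec runner_connection_timeout value mentioned later in this thread.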
I hit the same issue as RemiHoston. Do you have any idea? @chadlwilson
No, unfortunately not, as I don't have any information to replicate it, proper logs from Gauge itself (logs/gauge.log or similar), or any details from people about what changed when this started happening.
It seems similar to https://github.com/getgauge/gauge-dotnet/issues/196 (which supposedly was fixed in https://github.com/getgauge/gauge-dotnet/issues/197) but may have resurfaced as noted in https://github.com/getgauge/gauge-dotnet/issues/204 and https://github.com/getgauge/gauge-dotnet/issues/199
You could try rolling back to gauge-dotnet 0.5.8. If the problem goes away or you're using async methods, it's likely related to those I guess?
Yes, this depends on the latest changes in Gauge.CSharp.Lib. We currently reference the latest version, 0.11.3, and the minimum dotnet plugin version for it is 0.7.1. It contains a breaking change.
@RemiHoston Sorry, I don't understand what you are saying. You can roll back the CSharp and/or Gauge version too, right?
Sorry, my bad. I mean that the component referenced in my project is Gauge.CSharp.Lib 0.11.3, which contains breaking changes in how SuiteDataStore, ScenarioDataStore and SpecDataStore are used (these three have become static classes, whereas before they were accessed through DataStoreFactory).
Gauge.CSharp.Lib 0.11.3 also requires dotnet plugin 0.7.1 or later (if I am correct), so dotnet 0.5.8 could not run successfully.
By the way, what is special about the latest dotnet plugin version, 0.7.2? Could we optimize it?
And if the plugin dotnet is 0.7.2, it can run smoothly:
Since upgrading Gauge.CSharp.Lib from 0.10.3 to 0.11.3 we have made a lot of changes in our gauge test project, and we hope we can use the latest Gauge version.
Let me try asking another way.
What was the last reliable combination of gauge/gauge-dotnet/Gauge.CSharp.Lib versions in your environment, that did not have this problem?
We need to narrow down when a problem started or we will be going around in circles forever guessing. Try to make it easy for maintainers, rather than require them to guess what you have done or changed in your environment or how you might use a piece of software.
Thank you. I have checked our history; the combination below had been in use for a very long time:
Gauge.CSharp.Lib.0.7.2
Gauge version: 1.0.8
Commit Hash: 28617ea
csharp (0.10.6)
dotnet (0.1.7)
html-report (4.0.12)
screenshot (0.0.1)
xml-report (0.2.3)
Unfortunately it can happen for different reasons depending on what type of specs impls you are writing. So what caused the problem earlier may not be what is causing the problem now. If there was a 'good' version somewhere in between that'd be useful to know.
Do you have logs from gauge itself (not from your spec run) you can share?
If I understand you correctly this happens occasionally? I can just mention that I have seen this problem from time to time for a long time (years). On my own development machine not very often, but when working on a client's laptop I get it a couple of times every day I believe (should maybe start to track this more).
I believed it to be a performance issue. When running larger projects on machines that are a little slow (for lack of a better description) Gauge tends to be more unstable. I've also seen the problem in VS Code where the run/debug options do not appear because the project does not load correctly. Changing to a more powerful machine solved that issue.
I have had the timeout problem in pipelines too (Azure Linux agents), I "solved" it there by increasing the timeouts to some insane amount, since it only happened every now and then and a bigger timeout "solved" it.
gauge config --list (to see timeout options).
Unfortunately I'm too busy with a customer project (I'm a consultant) to investigate, so I can only give vague statements, but in my opinion there is some flakiness going on that should be addressed.
I have also seen this issue for a few years, but exclusively when running through VSCode. I had always assumed that it may have been a defect in the VSCode plugin. The fix (for me) was to shut down VSCode and then kill all running processes of .NET Host. Reading through this thread, maybe the problem is that the dotnet plugin isn't getting shut down correctly, leaving .NET Host processes hanging around until the machine (memory/CPU) can no longer launch the next .NET Host in a timely manner. It's also possible the async changes have somehow made the existing issue worse, for example by consuming more resources per run or by making it more likely that the plugin fails to shut down correctly.
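The manual cleanup described above (killing leftover .NET Host processes) could be scripted roughly like this. It is only a sketch: it assumes the ".NET Host" entries in Task Manager correspond to dotnet.exe, it is Windows-only because it shells out to tasklist and taskkill, and the function names are made up for illustration. Note that it would also kill dotnet.exe processes unrelated to Gauge, so use it with care on a shared agent.

```python
import csv
import io
import subprocess
from typing import List


def parse_tasklist_csv(output: str, image_name: str = "dotnet.exe") -> List[int]:
    """Return PIDs from `tasklist /FO CSV /NH` output whose image name matches."""
    pids = []
    for row in csv.reader(io.StringIO(output)):
        # Rows look like: "dotnet.exe","1234","Console","1","51,200 K"
        if len(row) >= 2 and row[0].lower() == image_name.lower() and row[1].isdigit():
            pids.append(int(row[1]))
    return pids


def find_dotnet_pids() -> List[int]:
    """Windows only: ask tasklist for every running dotnet.exe process."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq dotnet.exe", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout)


def kill_pids(pids: List[int]) -> None:
    """Windows only: force-kill each PID together with its child processes."""
    for pid in pids:
        subprocess.run(["taskkill", "/PID", str(pid), "/T", "/F"], check=False)
```

Calling kill_pids(find_dotnet_pids()) before starting the next gauge run would approximate the manual Task Manager step.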
Like Chad said, this can happen for a variety of reasons, but ultimately the error occurs when gauge is not able to connect to the dotnet plugin. The most likely cause is that the dotnet runner crashed for some reason.
Some options to get more details:
1. Run gauge with --log-level=debug, e.g. gauge run specs --log-level debug. This shows more information than normal and can shed some light on what's happening.
2. Check the logs directory after the error.
3. Run the dotnet plugin standalone:
a. locate the plugin install directory (%APPDATA%/gauge/plugins/dotnet/<version> IIRC)
b. now cd into your project directory and invoke the dotnet plugin standalone, e.g.:
cd c:\path\to\project\root
%APPDATA%\gauge\plugins\dotnet\<version>\bin\dotnet.bat --start
This is the command that gauge invokes, so by manually invoking this you can see the stdout of gauge-dotnet when running your project. My suspicion is that something is going wrong here, and we will know more once we simulate the error.
As for the orphan processes, unfortunately in windows the child process does not die when gauge is killed forcibly. I suspect that is causing this behaviour but it's hard to comment without having a way to replicate the issue.
Thank you guys! I ran it like this: gauge run specs -l debug, but it produced nothing useful. On the other hand, I found that the failure is more frequent in the afternoon than in the morning, and the Gauge-Test-Task takes about 48 sec in total. It seems to be related to agent resources (CPU usage or disk). So I have raised runner_connection_timeout (gauge.properties) from 30 sec to 180 sec and will keep monitoring for a while. Does dotnet8 make it slower?
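In case it helps anyone scripting the same change on a build agent, here is a small Python sketch of that edit. It works on the text of a gauge.properties file, wherever yours lives. The helper name set_property is made up, and it assumes the value is expressed in milliseconds (so 180 sec would be 180000); please confirm the unit with gauge config --list before relying on it.

```python
import re


def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with `key` set to `value`, appending it if absent."""
    # Match "key = anything" on its own line, allowing leading spaces or tabs.
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    line = f"{key} = {value}"
    if pattern.search(text):
        # A function replacement avoids backslash escapes in `value` being interpreted.
        return pattern.sub(lambda _match: line, text, count=1)
    separator = "" if text.endswith("\n") or not text else "\n"
    return f"{text}{separator}{line}\n"


# Assumption: timeout values are in milliseconds, so 30 sec -> 180 sec is 30000 -> 180000.
original = "# sample content\nrunner_connection_timeout = 30000\n"
print(set_property(original, "runner_connection_timeout", "180000"))
```

Reading the file, passing its contents through set_property and writing the result back is left out on purpose, since the file location differs between machines.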
After monitoring for the past two days, the longest the gauge test runner took to connect to the dotnet API was 46 s, and so far no more timeouts have occurred. I think my issue has been fixed.
Is your feature request related to a problem? Please describe.
When running the command gauge run specs, this error occurs:
Failed to start gauge API: Timed out connecting to dotnet
Describe the solution you'd like
I have found that this issue occurs very frequently, both on my local machine and on the pipeline agent. When it happens locally I can end the process from Task Manager, but on the pipeline I have to retry after a few minutes. So I suspect it may be a defect in the main gauge logic.
My gauge version is:
Describe alternatives you've considered
It would be better to close all the dotnet processes when the gauge run command finishes.
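Until something like that exists, the retry-after-a-few-minutes workaround from this report could be wrapped in a small script. The Python sketch below is only an illustration: run_gauge and run_with_retry are hypothetical names, the 120 s wait is an arbitrary choice, and the marker string is copied from the error in the pipeline log. It retries only when the run fails with that connection timeout, so genuine spec failures are reported straight away.

```python
import subprocess
import time
from typing import Callable, List, Tuple

# Error text as it appears in the pipeline log.
TIMEOUT_MARKER = "Timed out connecting to dotnet"


def run_gauge(cmd: List[str]) -> Tuple[int, str]:
    """Run the command and return (exit code, combined stdout and stderr)."""
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return result.returncode, result.stdout


def run_with_retry(
    cmd: List[str],
    attempts: int = 3,
    wait_seconds: float = 120.0,
    runner: Callable[[List[str]], Tuple[int, str]] = run_gauge,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry only when the run fails with the connection-timeout message."""
    code, output = 1, ""
    for attempt in range(1, attempts + 1):
        code, output = runner(cmd)
        if code == 0 or TIMEOUT_MARKER not in output:
            return code, output  # success, or a failure that retrying will not fix
        if attempt < attempts:
            sleep(wait_seconds)  # give the agent time to free up resources
    return code, output
```

A pipeline step could then call run_with_retry(["gauge", "run", "specs"]) and exit with the returned code.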