WarningImHack3r opened this issue 4 months ago (status: Open)
Hi @WarningImHack3r - just wanted to provide my thoughts on some of the (super valid) points you raised; see below
I have no clue what and where the code for your Gradle tasks is
If you hunt through the Gradle plugin code, you might be able to find it:
...run each verification (download + test against this version) in parallel (if it's not already the case)...
I think doing download+test in an unbounded parallel way would certainly increase the likelihood of running into this issue, no? I'm thinking of it this way: if there are, say, 100 versions being tested, and the average size of each is, say, 2GB, then when all 100 are being downloaded, the CI machine is going to try to download ~200GB of data, which I suspect would fill up the CI machine (it certainly does for GHA CI machines). It would also mean ~100 network requests to the same artifact server in a very brief period of time. I think it might make more sense if the number of download+test runs executing in parallel were bounded, say to 10; that way there's only ~20GB of disk (in my example) being used at any given time.
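To make the arithmetic above concrete, here is a minimal Python sketch of bounding the number of concurrent download+test jobs with a fixed-size thread pool. It is not the Gradle plugin's code; `run_bounded`, `verify_one`, `fake_verify`, and `peak_disk_gb` are hypothetical names. It assumes each job deletes its IDE as soon as it finishes, so peak disk usage is roughly the bound times the average IDE size.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_bounded(versions, verify_one, max_parallel):
    """Run verify_one(version) for every version, at most max_parallel at a time.

    Returns the results in input order plus the highest number of jobs
    that were observed running simultaneously.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def tracked(version):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return verify_one(version)
        finally:
            with lock:
                active -= 1

    # The pool size *is* the bound: extra versions wait in the executor's queue.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(tracked, versions))
    return results, peak


def peak_disk_gb(avg_ide_size_gb, max_parallel):
    # Worst case: every in-flight job holds one full IDE on disk at the same time.
    return avg_ide_size_gb * max_parallel


def fake_verify(version):
    time.sleep(0.01)  # stand-in for the real download + verification work
    return f"{version}: OK"


results, peak = run_bounded([f"IDE-{i}" for i in range(100)], fake_verify, max_parallel=10)
print(peak <= 10)            # True
print(peak_disk_gb(2, 100))  # 200 (the unbounded case above)
print(peak_disk_gb(2, 10))   # 20
```

The bound trades wall-clock time for a predictable disk ceiling; the right value depends on runner disk size and IDE size, which is why a small number is usually a safer starting point.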
...try again 1 or 2 times in case of download failure...
This (retries) would be ideal to have, IMO. FWIW, I don't personally believe it's the code in the Gradle task being flaky; more likely the artifact server that's providing the IDEs (or the network connections to it) is having some network issues, or it's just some other network issue. I'm the author of a GitHub action (ChrisCarini/intellij-platform-plugin-verifier-action) that does plugin verification (created before the Gradle task existed, I believe) and also downloads whichever IDEs the user specifies, and I have run into this issue myself countless times while downloading IDEs. I have https://github.com/ChrisCarini/intellij-platform-plugin-verifier-action/issues/68 open for myself, with a branch I've been testing in my 11 OSS IntelliJ plugins for the past month, and I don't believe CI in any of those 11 projects has failed on IDE downloads in that time, which has been stellar 🎉.
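A minimal Python sketch of the retry idea, with hypothetical names (`with_retries`, `flaky_download`): up to 3 attempts in total (the original try plus 2 retries) with exponential backoff. It retries only on connection and timeout errors, so a full disk, which a retry won't fix, fails immediately. Which exception types the real download code raises is an assumption here.

```python
import time


def with_retries(action, attempts=3, base_delay_s=1.0,
                 retry_on=(ConnectionError, TimeoutError)):
    """Call action(); on a retryable error, wait and try again.

    `attempts` counts the first try, so 3 means "try again up to 2 times".
    The delay doubles after each failure (1 s, 2 s, ... by default).
    The last error is re-raised if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(base_delay_s * 2 ** (attempt - 1))


calls = {"n": 0}


def flaky_download():
    # Hypothetical download that fails twice with a network error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection reset by peer")
    return "ide.zip"


print(with_retries(flaky_download, base_delay_s=0.01))  # ide.zip
```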
If you hunt through the Gradle plugin code, you might be able to find it:
Thanks for that!!
I'm thinking of it this way, if there are say 100 versions that are being tested, and the average size of each is say 2GB, when all 100 are being downloaded, the CI machine is going to try and download ~200GB of data which I suspect would fill up the CI machine
Yes, of course, some sort of bound will have to be set as you describe; I'd say between 3 and 5 at any given time, depending on the right balance between CI performance, speed, and reliability.
FWIW, I don't personally believe it's the code in the gradle task being flaky
I agree: download code isn't hard to write or to make reliable, and JetBrains is more than capable of doing it well, even without looking at the source!
but more likely the artifact server (or, the network connections to) that's providing the IDEs having some network issues
I think so too, but maybe the errors are misleading and 99 to 100% of the failures are actually due to disk space rather than network issues, despite what the error states.
TBH, fixing the disk issue alone would already be a very good improvement; handling any additional network issues would just be a bonus at this point.
I think so too, but maybe the errors are misleading and 99 to 100% of the failures are actually due to disk space rather than network issues, despite what the error states.
🤷 maybeee they are misleading, maybe not. IME, with the GitHub Action I authored, debugged, actively maintain, and have personally used in 11 repos for ~4 years, when the runner runs out of disk space it's pretty clear: you'd see a "write: No space left on device" error in the logs (check https://github.com/ChrisCarini/intellij-platform-plugin-verifier-action/issues/2 for an old historical example). Now that you know where the source is, you could take a look and see what gets logged in each condition (whether it differs or is the same), and if you wanted, I'm sure it'd take <10min to mock up a test that intentionally runs out of disk space on the GitHub runners and see whether what happens matches your current expectations.
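One way to test the "misleading error" theory is to triage the underlying OS error instead of the message text. A sketch in Python, assuming the failure surfaces as an `OSError`: on Linux, "No space left on device" is the message for `errno.ENOSPC`, while dropped connections and timeouts surface as `ConnectionError`/`TimeoutError`. `classify_io_error` and `has_room_for` are hypothetical helpers, not part of the Gradle plugin or the GitHub Action.

```python
import errno
import shutil


def classify_io_error(exc: OSError) -> str:
    """Rough triage of a failed IDE download or extraction."""
    if exc.errno == errno.ENOSPC:  # "No space left on device"
        return "disk-full"
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return "network"
    return "other"


def has_room_for(path: str, needed_bytes: int) -> bool:
    """Preflight check: is there enough free space at `path` for one more IDE?"""
    return shutil.disk_usage(path).free >= needed_bytes


print(classify_io_error(OSError(errno.ENOSPC, "No space left on device")))  # disk-full
print(classify_io_error(ConnectionResetError("reset by peer")))            # network
print(has_room_for(".", 2 * 1024**3))  # machine-dependent: is ~2 GiB free here?
```

Logging the result of a preflight check like this before each download would make it obvious in CI output whether a later "download failed" coincided with a nearly full disk.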
Thanks for raising this issue.
https://github.com/WarningImHack3r/intellij-shadcn-plugin uses IntelliJ Ultimate
pluginSinceBuild = 213
pluginUntilBuild = (open ended)
so a total of currently 9-10 GA releases
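For reference, that count follows from the branch-number scheme (e.g. 213 = 2021.3, with three major releases per year). A small Python sketch; `ga_branches` is a hypothetical helper, and the upper bound you pass is whatever the latest GA branch is when you run it.

```python
def ga_branches(since: int, until: int) -> list[int]:
    """List IntelliJ platform branch numbers from `since` to `until`, inclusive.

    Assumes the YYN scheme (e.g. 213 = 2021.3) with releases N = 1..3 each year.
    Only GA branches are counted; EAP builds are not included.
    """
    branches = []
    year, minor = divmod(since, 10)
    while year * 10 + minor <= until:
        branches.append(year * 10 + minor)
        minor += 1
        if minor > 3:  # after YY.3 comes (YY+1).1
            year, minor = year + 1, 1
    return branches


print(ga_branches(213, 242))       # [213, 221, 222, 223, 231, 232, 233, 241, 242]
print(len(ga_branches(213, 243)))  # 10
```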
https://github.com/WarningImHack3r/intellij-shadcn-plugin uses IntelliJ Ultimate [...] so a total of currently 9-10 GA releases
(same thing for https://github.com/WarningImHack3r/npm-update-dependencies)
Update: @YannCebron / @novotnyr it seems even worse with Plugin Template 2.0: with the same plugins, even after dropping 1 whole major version and 2 minor versions respectively, I now always get the error. @ChrisCarini did you try it yourself?
@WarningImHack3r I haven't had issues, but I also don't test as wide a range as you have listed.
I have encountered a similar issue.
Prior to upgrading to plugin template 2.0, I encountered an out-of-space error once. At that time, the issue was resolved by adding the jlumbroso/free-disk-space action. This solution worked effectively, and I have not encountered any related problems for at least a year. At that time, my plugin configuration was: pluginSinceBuild >= 213.
After upgrading to plugin template 2.0, the build has been continuously failing. The failure occurs at the "Verify plugin" step. The error message is as follows:
System.IO.IOException: No space left on device : '/home/runner/runners/2.320.0/_diag/Worker_20241026-140347-utc.log'
at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
at System.Diagnostics.TextWriterTraceListener.Flush()
at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
System.IO.IOException: No space left on device : '/home/runner/runners/2.320.0/_diag/Worker_20241026-140347-utc.log'
at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
at System.Diagnostics.TextWriterTraceListener.Flush()
at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
at GitHub.Runner.Common.Tracing.Error(Exception exception)
at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
Unhandled exception. System.IO.IOException: No space left on device : '/home/runner/runners/2.320.0/_diag/Worker_20241026-140347-utc.log'
at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
at System.Diagnostics.TextWriterTraceListener.Flush()
at System.Diagnostics.TraceSource.Flush()
at GitHub.Runner.Common.TraceManager.Dispose(Boolean disposing)
at GitHub.Runner.Common.TraceManager.Dispose()
at GitHub.Runner.Common.HostContext.Dispose(Boolean disposing)
at GitHub.Runner.Common.HostContext.Dispose()
at GitHub.Runner.Worker.Program.Main(String[] args)
The current plugin support has been modified to: pluginSinceBuild >= 223. However, this change has had no effect.
@WarningImHack3r , has this problem been resolved on your end?
@WarningImHack3r , has this problem been resolved on your end?
No, it’s not. Unfortunately, I’ve simply disabled this step on both my plugins… We kinda get "punished" for supporting a wide range of versions; that’s a shame.
@WarningImHack3r Thanks. We are also planning to disable the verify step...
I recently ran into a similar issue in the CI pipelines of my plugin dbt-toolkit. However, in my case it already throws an IOException during the build phase. Is there anything I can do to resolve this? Custom GitHub runners with more provisioned disk space would presumably work, but ideally I'd find another solution.
Because I am not certain whether the cause is related (since it happens in another step), I opened a separate issue https://github.com/JetBrains/intellij-platform-plugin-template/issues/488.
What happened?
The "Run Plugin Verification tasks" step (./gradlew runPluginVerifier) in the verify job periodically fails to download (some?) IDE versions. From what I can tell, it's due to a lack of disk space, despite the step to maximize space.
I support a wide range of IDE versions in my plugins, which may be why I'm more likely to run into this kind of problem. What I can't explain is why it's so periodic (sometimes no problem for months, then consistent problems for a week or 2); probably because the various EAP IDEs take up a lot more space than usual.
My fear is that the more IDE versions you release, the more likely I am to encounter this issue, so in my opinion it won't get any better soon. There will likely come a point where I can never again get my CI to pass beyond a certain number of IDE versions.
I have no clue what or where the code for your Gradle tasks is, but my suggestion is the following: run each verification (download + test against this version) in parallel (if that's not already the case), then immediately delete that version after its test. Additionally, retry 1 or 2 times on download failure (if it's not caused by disk space) to avoid failing jobs as much as possible. It seems like the most scalable solution to me.
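A minimal Python sketch of this suggestion, with hypothetical `download`/`verify` callables standing in for the real Gradle tasks (the version labels below are also just placeholders). Each version gets its own temporary directory that is deleted in a `finally` block, so disk space is freed even when verification fails or throws, and a small thread pool caps how many IDEs exist on disk at once. A retry wrapper could be placed around `download`; it's left out to keep the sketch short.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def verify_version(version, download, verify):
    """Download one IDE into a private temp dir, verify against it, then delete it.

    `download(version, dest_dir) -> Path` and `verify(ide_path) -> bool` are
    hypothetical callables, not the Gradle plugin's API.
    """
    workdir = Path(tempfile.mkdtemp(prefix=f"ide-{version}-"))
    try:
        ide_path = download(version, workdir)
        return version, verify(ide_path)
    finally:
        # Free the disk space right away, whether verification passed, failed, or raised.
        shutil.rmtree(workdir, ignore_errors=True)


def verify_all(versions, download, verify, max_parallel=4):
    # At most `max_parallel` IDEs are on disk at any moment.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(verify_version, v, download, verify) for v in versions]
        return dict(f.result() for f in futures)  # re-raises the first job error, if any


seen_dirs = []


def fake_download(version, dest_dir):
    seen_dirs.append(dest_dir)
    ide = dest_dir / "ide.bin"
    ide.write_bytes(b"\0" * 1024)  # pretend this is a multi-GB IDE
    return ide


def fake_verify(ide_path):
    return ide_path.exists()


report = verify_all(["IC-241", "IC-242", "IU-241"], fake_download, fake_verify)
print(report)  # {'IC-241': True, 'IC-242': True, 'IU-241': True}
print(any(d.exists() for d in seen_dirs))  # False: every temp dir was removed
```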
Thank you!
Relevant log output or stack trace
Steps to reproduce
Run the CI in https://github.com/WarningImHack3r/intellij-shadcn-plugin (preferred) or https://github.com/WarningImHack3r/npm-update-dependencies
Gradle IntelliJ Plugin version
1.17.4
Gradle version
8.8
Operating System
None
Link to build, i.e. failing GitHub Action job
https://github.com/WarningImHack3r/intellij-shadcn-plugin/actions/runs/9605368413/job/26492877373