JetBrains / intellij-platform-plugin-template

Template repository for creating plugins for IntelliJ Platform
Apache License 2.0

CI: Plugin Verification task lacking space for plugins with wide version range #462

Open WarningImHack3r opened 4 months ago

WarningImHack3r commented 4 months ago

What happened?

The Run Plugin Verification tasks step (./gradlew runPluginVerifier) in the verify job periodically fails to download (some?) IDE versions.

From what I can tell, it's due to a lack of disk space, despite the step to maximize space.

I support a wide range of IDE versions in my plugins, which may be why I'm more likely to run into this kind of problem. What I can't explain is why it's so intermittent (sometimes no problems for months, then consistent failures for a week or two): possibly the various EAP IDE builds take up a lot more space than usual at those times.

My fear is that the more IDE versions you release, the more often I'll hit this issue, so in my opinion it won't get better any time soon. Past some number of IDE versions, I will likely never be able to get my CI to pass again.

I have no clue what and where the code for your Gradle tasks is, but my suggestion is the following: run each verification (download + test against this version) in parallel (if it's not already the case), then after each test immediately delete that version. Additionally, try again 1 or 2 times in case of download failure (if it's not caused by disk space) to avoid failing jobs as much as possible. It seems to be the most scalable solution to me.
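To make the "verify, then immediately delete" part of the suggestion concrete, here is a minimal sketch in Python. It is not how the Gradle task works; `download` and `verify` are hypothetical callables standing in for whatever fetches an IDE and runs the verifier against it. The point is only that the peak disk usage stays at one IDE at a time.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable


def verify_one_at_a_time(
    versions: Iterable[str],
    download: Callable[[str, Path], Path],  # hypothetical: fetches one IDE into a directory
    verify: Callable[[Path], bool],         # hypothetical: runs the verifier against it
) -> Dict[str, bool]:
    """Download, verify, and delete each IDE before moving on to the next one."""
    results: Dict[str, bool] = {}
    for version in versions:
        workdir = Path(tempfile.mkdtemp(prefix=f"ide-{version}-"))
        try:
            ide_path = download(version, workdir)
            results[version] = verify(ide_path)
        finally:
            # Free the disk space before the next IDE is fetched,
            # even if the download or the verification failed.
            shutil.rmtree(workdir, ignore_errors=True)
    return results
```

The `finally` block is the important design choice: a failed download should not leave a partial archive behind to eat into the space needed by the next version.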

Thank you!

Relevant log output or stack trace

Run ./gradlew runPluginVerifier -Dplugin.verifier.home.dir=~/.pluginVerifier

Starting a Gradle Daemon (subsequent builds will be faster)
Calculating task graph as no cached configuration is available for tasks: runPluginVerifier
> Task :checkKotlinGradlePluginConfigurationErrors
> Task :initializeIntelliJPlugin
> Task :downloadAndroidStudioProductReleasesXml
> Task :downloadIdeaProductReleasesXml
> Task :patchPluginXml
> Task :verifyPluginConfiguration
> Task :processResources
> Task :listProductsReleases
> Task :compileKotlin
> Task :compileJava NO-SOURCE
> Task :classes
> Task :instrumentCode
> Task :jar
> Task :instrumentedJar
> Task :prepareSandbox
> Task :verifyPlugin
2024-06-22 17:27:01,078 [    295]   WARN - llij.ide.plugins.PluginManager - id redefinition ([row,col,system-id]: [2,3,"product classpath"]) 
> Task :buildSearchableOptions
> Task :compileTestKotlin
> Task :classpathIndexCleanup SKIPPED
> Task :buildSearchableOptions
Jun 22, 2024 5:27:04 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
Jun 22, 2024 5:27:04 PM java.util.prefs.FileSystemPreferences$6 run
WARNING: Prefs file removed in background /home/runner/.java/.userPrefs/prefs.xml
Starting searchable options index builder
2024-06-22 17:27:05,096 [   4313]   WARN - l.NotificationGroupManagerImpl - Notification group CodeWithMe is already registered (group=com.intellij.notification.NotificationGroup@2dc67902). Plugin descriptor: PluginDescriptor(name=Code With Me, id=com.jetbrains.codeWithMe, descriptorPath=plugin.xml, path=~/.gradle/caches/modules-2/files-2.1/com.jetbrains.intellij.idea/ideaIU/2021.3/3597c84ef4f0d240f4f9d9eacba7509e01674c4d/ideaIU-2021.3/plugins/cwm-plugin, version=213.5744.223, package=null, isBundled=true) 
2024-06-22 17:27:22,987 [  22204]   WARN - ellchecker.SpellCheckerManager - Couldn't load dictionary 'svelte.dic' with loader 'class dev.blachut.svelte.lang.spellchecker.SvelteSpellcheckingDictionaryProvider' 
2024-06-22 17:27:24,296 [  23513]   WARN - #com.intellij.ui.jcef.JBCefApp - JCEF is manually disabled in headless env via 'ide.browser.jcef.headless.enabled=false' 
Found 355 configurables
Searchable options index builder completed
> Task :jarSearchableOptions
> Task :buildPlugin
> Task :runPluginVerifier
[gradle-intellij-plugin :runPluginVerifier] Cannot download 'IU-2022.1.4' from 'release' channel: https://cache-redirector.jetbrains.com/download.jetbrains.com/idea/ideaIU-2022.1.4.tar.gz. Run with --debug option to get more log output.

Steps to reproduce

Run the CI in https://github.com/WarningImHack3r/intellij-shadcn-plugin (preferred) or https://github.com/WarningImHack3r/npm-update-dependencies

Gradle IntelliJ Plugin version

1.17.4

Gradle version

8.8

Operating System

None

Link to build, i.e. failing GitHub Action job

https://github.com/WarningImHack3r/intellij-shadcn-plugin/actions/runs/9605368413/job/26492877373

ChrisCarini commented 4 months ago

Hi @WarningImHack3r - just wanted to provide my thoughts on some of the (super valid) points you raised; see below

I have no clue what and where the code for your Gradle tasks

If you hunt through the Gradle plugin code, you might be able to find it:

...run each verification (download + test against this version) in parallel (if it's not already the case)...

I think doing download+test in an unbounded parallel way would certainly increase the likelihood of running into this issue, no? I'm thinking of it this way: if there are, say, 100 versions being tested, and the average size of each is, say, 2GB, then when all 100 are being downloaded at once, the CI machine is going to try to download ~200GB of data, which I suspect would fill up the CI machine (it certainly does for GHA CI machines), and would also mean ~100 network requests to the same artifact server in a very brief period of time. I think it might make more sense if the number of download+test jobs run in parallel were bounded, say to 10; that way there's only ~20GB of disk (in my example) being used at any given time.
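The arithmetic in this example, plus one way of bounding the parallelism, can be sketched as follows. This is an illustration only: `peak_disk_gb` and `run_bounded` are made-up helper names, the 100 versions and 2GB figures are the hypothetical numbers from the comment above, and the estimate assumes each IDE is deleted as soon as its check finishes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def peak_disk_gb(version_count, avg_size_gb, max_parallel):
    """Worst-case disk use, assuming each IDE is deleted right after its check."""
    return min(version_count, max_parallel) * avg_size_gb


def run_bounded(jobs, max_parallel):
    """Run zero-argument jobs with at most max_parallel in flight.

    Returns the highest number of jobs observed running at the same time.
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def tracked(job):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            job()
        finally:
            with lock:
                state["active"] -= 1

    # The pool size is the bound: no more than max_parallel jobs run at once.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        list(pool.map(tracked, jobs))  # list() surfaces any job exception
    return state["peak"]


print(peak_disk_gb(100, 2, 100))  # unbounded: 200
print(peak_disk_gb(100, 2, 10))   # bounded to 10: 20
```

A fixed-size thread pool is the simplest way to get the bound; a semaphore around the download step would work as well if the verification itself should stay unbounded.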

...try again 1 or 2 times in case of download failure...

This (retries) would be ideal to have, IMO. FWIW, I don't personally believe it's the code in the gradle task being flaky, but more likely the artifact server (or, the network connections to) that's providing the IDEs having some network issues (or, it just being some other network issue). I'm the author of a GitHub action (ChrisCarini/intellij-platform-plugin-verifier-action) that does plugin verification (created before the gradle task existed, I believe) which also downloads whichever IDEs the user specifies, and I have run into this issue myself downloading IDEs countless times (I have https://github.com/ChrisCarini/intellij-platform-plugin-verifier-action/issues/68 open for myself, a branch I've been testing in my 11 OSS IntelliJ plugins for the past month, and I don't believe I've had my CI in any of the 11 projects fail in that time period for IDE downloading, which has been stellar 🎉 ).
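For illustration, a retry wrapper of the kind discussed here could look like the sketch below. It is not taken from the Gradle plugin or from the GitHub Action; `with_retries` and its parameters are hypothetical. Following the original suggestion, it retries download-style errors but gives up immediately when the error is "no space left on device", since retrying cannot fix a full disk.

```python
import errno
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(), retrying OSError failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError as exc:
            # A full disk will not heal by itself, so do not retry it;
            # also re-raise once the last attempt has failed.
            if exc.errno == errno.ENOSPC or attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise ValueError("attempts must be at least 1")
```

`OSError` is used because Python's connection and URL errors derive from it; a real implementation would likely catch whatever exception type its HTTP client raises.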

WarningImHack3r commented 4 months ago

If you hunt through the Gradle plugin code, you might be able to find it:

Thanks for that!!

I'm thinking of it this way, if there are say 100 versions that are being tested, and the average size of each is say 2GB, when all 100 are being downloaded, the CI machine is going to try and download ~200GB of data which I suspect would fill up the CI machine

Yes of course some sort of bound will have to be set as you describe; I'd say between 3 and 5 at any given time, depending on the right balance between CI perf, speed and reliability.

FWIW, I don't personally believe it's the code in the gradle task being flaky

I agree: download code isn't hard to get right and is usually reliable, and JetBrains is more than capable of doing it well. I'd assume that without even looking at the source!

but more likely the artifact server (or, the network connections to) that's providing the IDEs having some network issues

I think so too, but maybe the errors are misleading and 99-100% of the failures are actually due to disk space rather than network issues, despite what the error states.
TBH, fixing the disk issue would already be a very good improvement; handling potential additional network issues would just be a bonus at this point.

ChrisCarini commented 4 months ago

I'm indeed thinking so too, but maybe the errors are misleading and 99/100% of the failures are instead due to disk space more than network issue despite what the error states.

🤷 maybeee they are misleading, maybe not. IME with the GitHub Action I authored, debugged, actively maintain, and personally use in 11 repos for ~4 years, when the runner runs out of disk space, it's pretty clear (you'd see a write: No space left on device in the logs; check https://github.com/ChrisCarini/intellij-platform-plugin-verifier-action/issues/2 as an old historical example). Now that you know where the source is, you could take a look and see what might be logged in each condition (if it'd differ or be the same), and if you wanted, I'm sure it'd take <10min to mock up a test that intentionally runs out of disk space on the GitHub runners and see if what happens aligns with your current expectations or not.
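A quick way to check which of the two situations a given run hit is to scan its log for the disk-full message and to print the free space before the verification step. The sketch below is only a rough triage helper: the function names are made up, and the two marker strings are simply the ones that appear in the logs quoted in this thread.

```python
import shutil

DISK_FULL_MARKER = "No space left on device"  # as seen in runner logs
DOWNLOAD_MARKER = "Cannot download"           # as seen in the runPluginVerifier output


def classify_failure(log_text: str) -> str:
    """Roughly classify a failed run from its log text."""
    if DISK_FULL_MARKER in log_text:
        return "disk-full"
    if DOWNLOAD_MARKER in log_text:
        return "download-failed"
    return "unknown"


def free_gib(path: str = ".") -> float:
    """Free space, in GiB, on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 2**30
```

Logging `free_gib()` right before and after the verification step would show whether the "Cannot download" failures coincide with the disk running low.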

YannCebron commented 4 months ago

Thanks for raising this issue.

https://github.com/WarningImHack3r/intellij-shadcn-plugin uses IntelliJ Ultimate

pluginSinceBuild = 213
pluginUntilBuild = (open ended)

so a total of currently 9-10 GA releases
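The count can be reproduced from the branch numbers, which encode the two-digit year followed by the release within that year (213 is 2021.3, 221 is 2022.1, and so on, three releases per year). In the sketch below, the helper name is made up and the upper bound of 242 is an assumption about the newest branch at the time of this comment.

```python
from typing import List


def release_branches(since: int, until: int) -> List[int]:
    """List branch numbers from since to until, three releases per year."""
    branches = []
    year, release = divmod(since, 10)  # 213 -> year 21, release 3
    while year * 10 + release <= until:
        branches.append(year * 10 + release)
        release += 1
        if release > 3:  # roll over to the first release of the next year
            year, release = year + 1, 1
    return branches


# Assumed upper bound: 242 as the newest branch being verified.
print(release_branches(213, 242))
# [213, 221, 222, 223, 231, 232, 233, 241, 242]
```

That is nine branches; counting one more upcoming release gives ten, which matches the "9-10" above.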

WarningImHack3r commented 4 months ago

https://github.com/WarningImHack3r/intellij-shadcn-plugin uses IntelliJ Ultimate [...] so a total of currently 9-10 GA releases

(same thing for https://github.com/WarningImHack3r/npm-update-dependencies)

WarningImHack3r commented 3 months ago

Update: @YannCebron / @novotnyr it seems even worse with Plugin Template 2.0: with the same plugins, and even after dropping one whole major version and two minor versions respectively, I now always get the error. @ChrisCarini, did you try it yourself?

ChrisCarini commented 2 months ago

@WarningImHack3r I haven't had issues, but I also don't test as wide of a range as you have listed.

rosuH commented 2 weeks ago

I have encountered a similar issue.

Before upgrading to plugin template 2.0

Prior to upgrading to plugin template 2.0, I encountered an out-of-space error on one occasion. At that time, the issue was resolved by adding the jlumbroso/free-disk-space action. This solution worked effectively, and I did not encounter any related problems for at least a year. At that time, my plugin configuration was: pluginSinceBuild >= 213.

After upgrading to plugin template 2.0

The build has been continuously failing. The failure occurs at the "Verify plugin" step. The error message is as follows:

System.IO.IOException: No space left on device : '/home/runner/runners/2.320.0/_diag/Worker_20241026-140347-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
System.IO.IOException: No space left on device : '/home/runner/runners/2.320.0/_diag/Worker_20241026-140347-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Common.Tracing.Error(Exception exception)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
Unhandled exception. System.IO.IOException: No space left on device : '/home/runner/runners/2.320.0/_diag/Worker_20241026-140347-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at System.Diagnostics.TraceSource.Flush()
   at GitHub.Runner.Common.TraceManager.Dispose(Boolean disposing)
   at GitHub.Runner.Common.TraceManager.Dispose()
   at GitHub.Runner.Common.HostContext.Dispose(Boolean disposing)
   at GitHub.Runner.Common.HostContext.Dispose()
   at GitHub.Runner.Worker.Program.Main(String[] args)

The current plugin support has been modified to: pluginSinceBuild >= 223. However, this change has had no effect.

@WarningImHack3r , has this problem been resolved on your end?

WarningImHack3r commented 2 weeks ago

@WarningImHack3r , has this problem been resolved on your end?

No it’s not, I’ve simply disabled this step on both my plugins unfortunately… We kinda get "punished" for supporting a wide range of versions, that’s a shame

rosuH commented 2 weeks ago

@WarningImHack3r Thanks. We are also planning to disable the verify step...

ramonvermeulen commented 21 hours ago

I have encountered a similar issue.

I recently ran into a similar issue in the CI pipelines of my plugin, dbt-toolkit. However, in my case it already throws an IOException during the build phase. Is there anything I can do to resolve this issue? I guess I could use custom GitHub runners with more provisioned disk space, but ideally I'd like another solution that works.

Because I am not certain (since it is in another step) if the cause is related, I opened a separate issue https://github.com/JetBrains/intellij-platform-plugin-template/issues/488.