JetBrains / intellij-platform-gradle-plugin

Gradle plugin for building plugins for IntelliJ-based IDEs
https://plugins.jetbrains.com/docs/intellij/gradle-prerequisites.html
Apache License 2.0

Multiple copies of IDE distribution in Gradle cache + reindexing problem #1601

Open chashnikov opened 3 months ago

chashnikov commented 3 months ago

What happened?

I migrated a project to IntelliJ Platform Gradle Plugin 2.0 and specified intellijIdeaUltimate("2024.1") in build.gradle.kts and didn't change it. However, after a few hours of work, I found that the Gradle cache contains six copies of IDEA Ultimate distribution:

nik@nik-workstation:~/.gradle/caches/transforms-3$ find -name ideaIU-2024.1
./e0d45b10a0ea56c67b8eef1a3248a586/transformed/ideaIU-2024.1
./d89f9fc4990ceec8ab9fc252d20afe96/transformed/ideaIU-2024.1
./80050e3b4d7632fe5e49c5042913887a/transformed/ideaIU-2024.1
./7167bc64f2f585d0ea26534a74c92e9a/transformed/ideaIU-2024.1
./b91bb87b6350df68a77c91dc42a6215e/transformed/ideaIU-2024.1
./172cec470fbf4c2fa3c0a0f57e6b823c/transformed/ideaIU-2024.1

Since each distribution occupies 3.3 GB, that adds up to roughly 20 GB of disk space.

Steps to reproduce

Not sure when exactly new copies are created.

Gradle IntelliJ Plugin version

2.0.0-SNAPSHOT

Gradle version

8.5

Operating System

Linux

hsz commented 3 months ago

I'm aware of this problem and am investigating the root cause. The expected behavior is that Gradle reuses the already transformed artifact even if the build configuration changes.

JojOatXGME commented 1 month ago

I encountered the same problem. I ended up with 197 GiB worth of JetBrains IDEs in ~/.gradle/caches/transforms-4 after only a few days of experimenting with 2.0.

PS: #1639 looks like a duplicate of this issue.

I am using convention plugins, maybe that is part of the problem? I could imagine that Gradle uses the classpath of the plugin which defines the transformer as part of the hash. After all, the transformer might have changed between two versions of the plugin. With convention plugins, the transformer effectively becomes part of the convention plugin, which would mean Gradle has to re-run the transformer every time I change my build logic in the convention plugin. But that is just a guess.

Undin commented 1 month ago

@hsz any updates on this (I've read https://github.com/JetBrains/intellij-platform-gradle-plugin/issues/1639#issuecomment-2149097095)?

hsz commented 1 week ago

The Kotlin Script I've created for purging IntelliJ Platform extracted archives from the Gradle Transformers Cache:

https://gist.github.com/hsz/0fc45e1a6fc9ef73d4e4f5960058bded
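For readers who only want to see how many extracted copies they have, here is a minimal Kotlin sketch. It is not the linked gist. It assumes the `<hash>/transformed/<ide-dir>` layout shown in the `find` output at the top of this issue and the `ideaIU-` directory prefix; `findExtractedIdes` is a hypothetical helper name. It only lists directories and deliberately does not delete anything.

```kotlin
import java.nio.file.Path
import kotlin.io.path.isDirectory
import kotlin.io.path.listDirectoryEntries

// Hypothetical helper (not the linked gist): returns directories matching
// <cacheRoot>/<hash>/transformed/<prefix>*, e.g. .../transformed/ideaIU-2024.1
fun findExtractedIdes(cacheRoot: Path, prefix: String = "ideaIU-"): List<Path> {
    if (!cacheRoot.isDirectory()) return emptyList()
    return cacheRoot.listDirectoryEntries()
        .map { it.resolve("transformed") }
        .filter { it.isDirectory() }
        .flatMap { it.listDirectoryEntries("$prefix*") }
        .filter { it.isDirectory() }
        .sorted()
}

fun main() {
    // Assumption: the cache lives in transforms-4; older Gradle versions use transforms-3.
    val cacheRoot = Path.of(System.getProperty("user.home"), ".gradle", "caches", "transforms-4")
    findExtractedIdes(cacheRoot).forEach(::println)
}
```

Other IDE types would need a different prefix, which is why it is a parameter.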

hsz commented 6 days ago

Hello, Jonathan! I am very well aware of this problem. Let me explain to you the state of the current implementation:

Adding the IntelliJ Platform dependency to the project resolves it from the IJ Maven Repository or CDN (download.jetbrains.com). Both sources provide the IntelliJ Platform archive — Maven gives ZIP and CDN, depending on your OS: DMG, ZIP, or TAR.GZ. Gradle fetches such an archive into the cache directory, like ~/.gradle/caches/modules-2/files-2.1/....

The Gradle IntelliJ Plugin 1.x is extracting content next to the archive, polluting the cache, but it works. With IntelliJ Platform Gradle Plugin 2.0, I decided to do it correctly and involve the artifact transformers mechanism provided by Gradle. The extracted content goes into a dedicated ~/.gradle/caches/transforms-4/[HASH] location, and the dependency can be correctly formed using native Gradle features.

After the implementation, it turned out that Gradle calculates [HASH] using the project build classpath. This means that this hash changes whenever you update the IntelliJ Platform Gradle Plugin or any other Gradle plugin in your project (or have a local buildSrc setup that changes). As a side effect, the IntelliJ Platform archive gets extracted again, the cache grows, and the IDE reindexes the IntelliJ Platform dependency again.

I already had a chat with Gradle folks about this issue, but there's no solution possible to keep relying on the artifact transformers feature. I'm currently trying to figure out another solution, and most likely, we'll introduce a custom cache location, such as ~/.intellijPlatform/ides/, to keep just a single copy of the extracted IDE. Keep your fingers crossed!

https://github.com/JetBrains/intellij-platform-gradle-plugin/issues/1639#issuecomment-2149097095
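The effect described in the quoted explanation can be illustrated with a toy model. This is not Gradle's actual hashing code: `workspaceKey` is a hypothetical function and the classpath entries are made-up labels. It only shows that a key derived from the build classpath changes as soon as any plugin version changes, even though the IDE archive is the same.

```kotlin
import java.security.MessageDigest

// Toy model only, NOT Gradle's implementation: derives a directory key
// from the (sorted) list of build classpath entries.
fun workspaceKey(buildClasspath: List<String>): String {
    val digest = MessageDigest.getInstance("SHA-256")
    buildClasspath.sorted().forEach { digest.update((it + "\n").toByteArray()) }
    return digest.digest().joinToString("") { "%02x".format(it) }.take(32)
}

fun main() {
    // Hypothetical labels standing in for the plugins applied in a build script.
    val before = listOf("intellij-platform-plugin:2.0.0-rc1", "kotlin-jvm-plugin")
    val after = listOf("intellij-platform-plugin:2.0.0-rc2", "kotlin-jvm-plugin")
    println(workspaceKey(before)) // key of the first extracted copy
    println(workspaceKey(after))  // a different key, so a second copy is extracted
}
```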

hsz commented 6 days ago

Unfortunately, it is impossible to extract the resolved IntelliJ Platform artifact (no matter whether it is an installer or a ZIP archive from the IntelliJ Maven repository) to a custom directory and reuse it later with the Gradle dependency mechanism without breaking some of Gradle's foundations.

I consider implementing the fix in the Gradle build system as the only possible solution.

Vanco commented 5 days ago

Maybe use the hash of the file (zip, dmg, tar.gz) instead of the one Gradle calculates. @hsz

hsz commented 5 days ago

I have no impact on what is taken as an input:

https://github.com/gradle/gradle/blob/7457e89eed50aa0b9dbab3a521141f6b8ce4a073/platforms/software/dependency-management/src/main/java/org/gradle/api/internal/artifacts/transform/DefaultTransform.java#L683

Gradle considers the whole implementation classpath of the build script. The idea is to limit the input classpath and isolate it by specifying dependencies that affect the transformer output, such as:

dependencies {
    registerTransform(MyTransform::class.java) {
        from...
        to...
        isolated {
             classpath("transform:dependency-1:1.0")
             classpath("transform:dependency-2:1.0")
        }
    }
}

Having that implemented, I could pass the IntelliJ Platform dependency input, and Gradle will calculate the hash of the archive you mentioned. But that requires changes in the build system.
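For comparison, a key derived only from the archive content would stay stable across plugin updates. The sketch below shows how such a content hash could be computed. It is an illustration of the idea discussed here, not something Gradle or the plugin exposes today, and `archiveSha256` is a hypothetical helper name.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.security.MessageDigest

// Hypothetical helper: streams the archive through SHA-256 and returns the
// hex digest, which could serve as a stable directory name for the extraction.
fun archiveSha256(archive: Path): String {
    val digest = MessageDigest.getInstance("SHA-256")
    Files.newInputStream(archive).use { input ->
        val buffer = ByteArray(64 * 1024)
        while (true) {
            val read = input.read(buffer)
            if (read < 0) break
            digest.update(buffer, 0, read)
        }
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}
```

The same archive always yields the same digest, regardless of which plugins are on the build classpath.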

JojOatXGME commented 4 days ago

The idea is to limit the input classpath and isolate it by specifying dependencies that affect the transformer output

Not sure if this is the right place to discuss that, but I have a few ideas about that.

  1. When the transformer is part of a JPMS module, the module information could be used to discover all relevant modules. This way, Gradle could avoid using too broad inputs for the hash without adding any additional configuration. However, as soon as only one dependency of the transformer does not have a module-info, this stops working. So, it is probably not the best solution right now. Not even sure if Gradle's API itself has a module-info yet.

  2. Technically, a similar solution would be to recursively scan and hash all the classes used by the transformer. Not sure how well that performs. Extracting the dependencies in the form of classes from a class file should be fast since they are all listed in the constant pool. You don't have to scan the whole class file. Gradle also already reads a lot of class files for different purposes, so it is not a completely new concept. However, scanning all the classes recursively may still take more time than desired. The implementation would also not be 100% reliable when the transformer (indirectly) uses reflection or service discovery to resolve classes. On the other hand, the current implementation of calculating the hash from the class path effectively means that Gradle already scans over all the class files (in compressed form) once. So, calculating the hash for one transform has the same (or maybe even better) scaling properties, but with a negative constant factor. The hash also cannot be reused for multiple transforms in one module.

  3. Alternatively, in contrast to your suggestion above, I am wondering if it would make sense to define transformers as modules.

    dependencies {
        registerTransform("transform:transform:1.0") {
            from...
            to...
        }
    }

    The module would then have to provide the transformer class using service discovery or some config file in the JAR. This way, you avoid the risk of having a mismatch between the transformer modules and the modules declared in the isolated block.
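As a rough illustration of idea 2 above, the classes a class file refers to can be read from its constant pool without parsing the rest of the file. The sketch below is not Gradle code, and `referencedClasses` is a hypothetical helper name. It collects the names of all CONSTANT_Class entries of a single class file, following the JVM class file format. A recursive scan would repeat this for every returned name.

```kotlin
import java.io.DataInputStream
import java.io.InputStream

// Hypothetical helper: returns the internal names (e.g. "java/util/AbstractList")
// of all CONSTANT_Class entries in one class file. Array types show up as
// descriptors such as "[Ljava/lang/Object;".
fun referencedClasses(classFile: InputStream): Set<String> {
    val input = DataInputStream(classFile.buffered())
    require(input.readInt() == 0xCAFEBABE.toInt()) { "Not a class file" }
    input.readUnsignedShort() // minor version
    input.readUnsignedShort() // major version
    val count = input.readUnsignedShort()
    val utf8 = HashMap<Int, String>()
    val classNameIndexes = ArrayList<Int>()
    fun skip(n: Int) = input.readFully(ByteArray(n))
    var index = 1 // constant pool indexes start at 1
    while (index < count) {
        when (val tag = input.readUnsignedByte()) {
            1 -> utf8[index] = input.readUTF()                  // Utf8
            7 -> classNameIndexes += input.readUnsignedShort()  // Class -> Utf8 index
            8, 16, 19, 20 -> skip(2)  // String, MethodType, Module, Package
            15 -> skip(3)             // MethodHandle
            3, 4, 9, 10, 11, 12, 17, 18 -> skip(4) // Integer, Float, refs, NameAndType, (Invoke)Dynamic
            5, 6 -> { skip(8); index++ } // Long and Double occupy two slots
            else -> error("Unknown constant pool tag $tag")
        }
        index++
    }
    return classNameIndexes.mapNotNull { utf8[it] }.toSet()
}
```

As noted in the list, classes reached only through reflection or service discovery do not appear in the constant pool, so this approach would miss them.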

hsz commented 4 days ago

Thank you for your input, @JojOatXGME!

  1. Unfortunately, IntelliJ Platform dependencies have no module-info at all. I could try creating something on-the-fly, but...
  2. I'm not sure if we're on the same page here. Gradle creates a transformer output directory using the hash of the build script classpath; the transformed artifact itself is completely ignored at this point. By the build script classpath I mean, for example, all plugins you have applied in the build script (java, org.jetbrains.kotlin.jvm, org.jetbrains.intellij.platform, etc.). If you update build script dependencies (2.0.0-rc1 -> 2.0.0-rc2), the hash of the classpath changes, and this leads to running the dependency transformer again. In my post above, I incorrectly mentioned that the IntelliJ Platform dependency archive is used for hashing.
  3. This sounds intriguing, but how could we create such a module?

JojOatXGME commented 4 days ago

2. I'm not sure if we're on the same page here. Gradle creates a transformer output directory using the hash of the build script classpath

Yes, I understood that. My thought was that technically, Gradle could hash the classes recursively referenced by the transform, instead of hashing the whole classpath. If someone uses registerTransform(MyTransform::class.java), Gradle could hash only the classes used by MyTransform instead of hashing the whole class path. Anyway, there are a few caveats as mentioned in my previous comment, so I am not yet convinced by this solution myself. However, it might be worth further investigating the feasibility.

3. This sounds intriguing — but how could we create such a module?

That is left to Gradle to define. (My ideas were about how Gradle could refine its transform feature, similar to how you mentioned the introduction of the isolated block as an example.) One solution may be that such a module must contain exactly one service provider of type TransformAction. Gradle would then load the module and use ServiceLoader to load the transform. Alternatively, Gradle could define that modules used for registerTransform must put the name of the transform class into META-INF/MANIFEST.MF. My biggest concern about this solution is that there is the potential for a recursive cycle. Transforms are used while resolving modules, but now transforms are modules themselves. However, it is probably better to resolve the transforms from the plugin repositories anyway and ignore the local transforms while doing so, in which case there would be no cycle.
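The "exactly one service provider" rule could look roughly like the sketch below. It is purely hypothetical: Gradle has no such lookup, both function names are made up, and a real design would also have to account for how Gradle instantiates transform actions itself. The sketch only shows the standard `java.util.ServiceLoader` call plus the uniqueness check.

```kotlin
import java.util.ServiceLoader

// Picks the single provider from a set of candidates; fails on zero or several.
fun <T : Any> requireSingleProvider(candidates: Iterable<T>, what: String): T {
    val found = candidates.toList()
    check(found.size == 1) { "Expected exactly one $what, found ${found.size}" }
    return found.single()
}

// Hypothetical lookup: `service` would be the transform type and
// `moduleClassLoader` a class loader for the resolved transform module.
fun <T : Any> loadSingleProvider(service: Class<T>, moduleClassLoader: ClassLoader): T =
    requireSingleProvider(ServiceLoader.load(service, moduleClassLoader), service.name)
```

Providers are discovered through `META-INF/services/<service class name>` entries in the module's JAR, which is the "service discovery" variant mentioned above.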