chashnikov opened this issue 3 months ago
I'm aware of this problem and am investigating the root cause. The expected behavior is that Gradle reuses the already transformed artifact even if the build configuration changes.
I encountered the same problem. I ended up with 197 GiB worth of JetBrains IDEs in ~/.gradle/caches/transforms-4 after only a few days of experimenting with 2.0.
PS: #1639 looks like a duplicate of this issue.
I am using convention plugins; maybe that is part of the problem? I could imagine that Gradle uses the classpath of the plugin that defines the transformer as part of the hash. After all, the transformer might have changed between two versions of the plugin. With convention plugins, the transformer effectively becomes part of the convention plugin, which would mean Gradle has to re-run the transformer every time I change my build logic in the convention plugin. But that is just a guess.
@hsz any updates on this (I've read https://github.com/JetBrains/intellij-platform-gradle-plugin/issues/1639#issuecomment-2149097095)?
Here is the Kotlin script I've created for purging extracted IntelliJ Platform archives from the Gradle transforms cache:
https://gist.github.com/hsz/0fc45e1a6fc9ef73d4e4f5960058bded
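The gist itself is not reproduced here. As a rough, hypothetical sketch of the same idea, the Kotlin script below only reports (it does not delete) directories in the transforms cache that look like an extracted IDE. It assumes an extracted IDE can be recognized by a product-info.json file at its root, or under Contents/Resources for a macOS app bundle; that heuristic, the search depth and all function names are assumptions, not the gist's logic.

```kotlin
import java.io.File

// Assumed heuristic: an extracted IDE has product-info.json at its root, or under
// Contents/Resources when it is a macOS app bundle.
fun looksLikeExtractedIde(dir: File): Boolean =
    File(dir, "product-info.json").isFile || File(dir, "Contents/Resources/product-info.json").isFile

// Walks the cache down to maxDepth levels; once a directory is recognized,
// its subdirectories are not visited.
fun findExtractedIdes(cacheRoot: File, maxDepth: Int = 4): List<File> {
    val found = mutableListOf<File>()
    fun visit(dir: File, depth: Int) {
        if (looksLikeExtractedIde(dir)) {
            found += dir
            return
        }
        if (depth >= maxDepth) return
        dir.listFiles()?.filter { it.isDirectory }?.sortedBy { it.name }?.forEach { visit(it, depth + 1) }
    }
    visit(cacheRoot, 0)
    return found
}

// Total size in bytes of all regular files below dir.
fun sizeOf(dir: File): Long = dir.walkTopDown().filter { it.isFile }.sumOf { it.length() }

// Dry run: prints each candidate with its size and deletes nothing.
fun report(cacheRoot: File) {
    val ides = findExtractedIdes(cacheRoot)
    ides.forEach { println("%,d bytes  %s".format(sizeOf(it), it)) }
    println("${ides.size} candidate(s), %,d bytes in total".format(ides.sumOf { sizeOf(it) }))
}
```

A typical call would be report(File(System.getProperty("user.home"), ".gradle/caches/transforms-4")); reviewing the list before deleting anything (for example with deleteRecursively()) keeps the sketch on the safe side.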
Hello, Jonathan! I am very well aware of this problem. Let me explain the state of the current implementation:
Adding the IntelliJ Platform dependency to the project resolves it from the IntelliJ Maven Repository or the CDN (download.jetbrains.com). Both sources provide the IntelliJ Platform archive: Maven provides a ZIP, while the CDN provides a DMG, ZIP, or TAR.GZ, depending on your OS. Gradle fetches the archive into its cache directory, like ~/.gradle/caches/modules-2/files-2.1/....
The Gradle IntelliJ Plugin 1.x extracts the content next to the archive, which pollutes the cache, but it works. With IntelliJ Platform Gradle Plugin 2.0, I decided to do it correctly and use the artifact transforms mechanism provided by Gradle. The extracted content goes into a dedicated ~/.gradle/caches/transforms-4/[HASH] location, and the dependency can be formed correctly using native Gradle features.
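In the plugin, the extraction runs inside a Gradle artifact transform (a TransformAction). The sketch below leaves Gradle out and takes the output directory as a plain parameter, so it only illustrates extracting into a dedicated location rather than next to the archive. It assumes a ZIP archive (DMG and TAR.GZ would need other handling), uses only java.util.zip, and the function name is hypothetical; rejecting entries that escape the output directory is an added precaution, not something taken from the plugin.

```kotlin
import java.io.File
import java.util.zip.ZipInputStream

// Extracts every entry of a ZIP archive into outputDir, i.e. into a dedicated
// location instead of next to the archive in Gradle's module cache.
fun extractZip(archive: File, outputDir: File) {
    val root = outputDir.canonicalFile
    ZipInputStream(archive.inputStream().buffered()).use { zip ->
        while (true) {
            val entry = zip.nextEntry ?: break
            val target = File(root, entry.name).canonicalFile
            // Added precaution: reject entries such as "../x" that would escape outputDir.
            require(target.toPath().startsWith(root.toPath())) { "entry escapes output dir: ${entry.name}" }
            if (entry.isDirectory) {
                target.mkdirs()
            } else {
                target.parentFile.mkdirs()
                target.outputStream().use { zip.copyTo(it) }
            }
        }
    }
}
```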
After implementing it, it turned out that Gradle calculates [HASH] using the project build classpath. This means that the hash changes whenever you update the IntelliJ Platform Gradle Plugin or any other Gradle plugin in your project (or whenever a local buildSrc setup changes). As a side effect, the IntelliJ Platform archive gets extracted again, the cache grows, and the IDE reindexes the IntelliJ Platform dependency.
I already had a chat with the Gradle folks about this issue, but no solution is possible as long as we rely on the artifact transforms feature. I'm currently trying to figure out another approach, and most likely, we'll introduce a custom cache location, such as ~/.intellijPlatform/ides/, to keep just a single copy of the extracted IDE. Keep your fingers crossed!
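A minimal sketch of such a shared cache, assuming one directory per product code and version under ~/.intellijPlatform/ides/. The directory layout, the class name and the .complete marker file are hypothetical, not the plugin's design, and the sketch does not lock against concurrent builds, which a real implementation would have to handle.

```kotlin
import java.io.File

// Hypothetical shared cache: one extracted copy per product code and version,
// independent of the Gradle build script classpath.
class SharedIdeCache(private val root: File) {
    fun directoryFor(productCode: String, version: String): File = File(root, "$productCode-$version")

    // Runs extract only when no complete copy exists yet. The marker file is written
    // last, so an interrupted extraction is cleaned up and redone on the next call.
    fun getOrExtract(productCode: String, version: String, extract: (File) -> Unit): File {
        val dir = directoryFor(productCode, version)
        val marker = File(dir, ".complete")
        if (!marker.isFile) {
            dir.deleteRecursively()
            dir.mkdirs()
            extract(dir)
            marker.writeText("")
        }
        return dir
    }
}

// Default location suggested in the comment above.
fun defaultIdeCache(): SharedIdeCache =
    SharedIdeCache(File(System.getProperty("user.home"), ".intellijPlatform/ides"))
```

Because the key is only the product code and version, updating any Gradle plugin leaves the directory untouched, which is exactly the property the transforms cache lacks.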
https://github.com/JetBrains/intellij-platform-gradle-plugin/issues/1639#issuecomment-2149097095
Unfortunately, it is impossible to extract the resolved IntelliJ Platform artifact (whether it is an installer or a ZIP archive from the IntelliJ Maven repository) to a custom directory and reuse it later with the Gradle dependencies mechanism without breaking some of Gradle's foundations.
I consider a fix in the Gradle build system to be the only possible solution.
Maybe use the hash of the file (ZIP, DMG, TAR.GZ) instead of the one Gradle calculates? @hsz
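For reference, a streaming SHA-256 of the downloaded archive could be computed as below (a sketch; the function name is hypothetical). Streaming matters because the archives are several gigabytes. Note that Gradle, not the plugin, decides which inputs feed the transform's hash, so a plugin cannot simply substitute this value.

```kotlin
import java.io.File
import java.security.MessageDigest

// Streams the archive through SHA-256 so multi-gigabyte files are never loaded into memory.
fun sha256Of(file: File): String {
    val digest = MessageDigest.getInstance("SHA-256")
    file.inputStream().buffered().use { input ->
        val buffer = ByteArray(64 * 1024)
        while (true) {
            val read = input.read(buffer)
            if (read < 0) break
            digest.update(buffer, 0, read)
        }
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}
```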
I have no control over what is taken as an input:
Gradle considers the whole implementation classpath of the build script. The idea is to limit the input classpath and isolate it by specifying dependencies that affect the transformer output, such as:
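dependencies {
    registerTransform(MyTransform::class.java) {
        from...
        to...
        // Proposed API (does not exist in Gradle today): declare only the
        // dependencies that affect the transform output.
        isolated {
            classpath("transform:dependency-1:1.0")
            classpath("transform:dependency-2:1.0")
        }
    }
}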
With that implemented, I could pass the IntelliJ Platform dependency as an input, and Gradle would calculate the hash of the archive you mentioned. But that requires changes in the build system.
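A toy model of why the proposal helps: if the cache key covers the whole build script classpath, bumping any plugin produces a new key (and a new extraction), while a key over only the isolated classpath plus the archive survives the bump. This is not how Gradle actually computes its keys; the BuildScript class, the functions and all coordinates other than the two transform:dependency entries are placeholders.

```kotlin
import java.security.MessageDigest

// Toy cache key: SHA-256 over the sorted classpath entries plus the input archive hash.
fun cacheKey(classpath: List<String>, inputArchiveHash: String): String {
    val digest = MessageDigest.getInstance("SHA-256")
    (classpath.sorted() + inputArchiveHash).forEach {
        digest.update(it.toByteArray())
        digest.update(0.toByte()) // separator so ["ab", "c"] and ["a", "bc"] differ
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}

// Hypothetical model of a build script: its full classpath and the proposed isolated part.
data class BuildScript(val classpath: List<String>, val isolated: List<String>)

fun keyToday(build: BuildScript, archiveHash: String) = cacheKey(build.classpath, archiveHash)
fun keyIsolated(build: BuildScript, archiveHash: String) = cacheKey(build.isolated, archiveHash)

val transformDeps = listOf("transform:dependency-1:1.0", "transform:dependency-2:1.0")
val buildRc1 = BuildScript(transformDeps + "intellij-platform-gradle-plugin:2.0.0-rc1", transformDeps)
val buildRc2 = BuildScript(transformDeps + "intellij-platform-gradle-plugin:2.0.0-rc2", transformDeps)
val archive = "hash-of-ideaIU-2024.1.zip"

println(keyToday(buildRc1, archive) == keyToday(buildRc2, archive))       // false: re-extraction
println(keyIsolated(buildRc1, archive) == keyIsolated(buildRc2, archive)) // true: copy reused
println(keyIsolated(buildRc1, archive) == keyIsolated(buildRc1, "hash-of-ideaIU-2024.2.zip")) // false
```

The last line shows the isolated key still reacts to the one input that should matter: a different IntelliJ Platform archive.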
The idea is to limit the input classpath and isolate it by specifying dependencies that affect the transformer output
Not sure if this is the right place to discuss that, but I have a few ideas about that.
When the transformer is part of a JPMS module, the module information could be used to discover all relevant modules. This way, Gradle could avoid using overly broad inputs for the hash without any additional configuration. However, as soon as a single dependency of the transformer lacks a module-info, this stops working, so it is probably not the best solution right now. I'm not even sure whether Gradle's API itself has a module-info yet.
Technically, a similar solution would be to recursively scan and hash all the classes used by the transformer. I'm not sure how well that performs. Extracting the class dependencies from a class file should be fast, since they are all listed in the constant pool; you don't have to scan the whole class file. Gradle also already reads a lot of class files for different purposes, so it is not a completely new concept. However, scanning all the classes recursively may still take more time than desired. The implementation would also not be 100% reliable when the transformer (indirectly) uses reflection or service discovery to resolve classes. On the other hand, calculating the hash from the classpath, as Gradle does today, effectively means that Gradle already scans over all the class files (in compressed form) once. So, calculating the hash for one transform has the same (or maybe even better) scaling properties, but with a worse constant factor. The hash also cannot be reused for multiple transforms in one module.
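To make the constant-pool point concrete, here is a sketch that reads a class file's constant pool with DataInputStream (the layout is defined in chapter 4 of the JVM specification) and follows CONSTANT_Class references transitively. The function names are hypothetical. As noted above, it cannot see classes reached through reflection or ServiceLoader, and it also skips types that appear only in field or method descriptors, so a real implementation would need more work.

```kotlin
import java.io.DataInputStream
import java.io.InputStream

// Returns the names of all CONSTANT_Class entries (internal form, e.g. "java/lang/Object").
// Types that appear only inside field or method descriptors are NOT included.
fun referencedClassNames(classFile: InputStream): Set<String> {
    val input = DataInputStream(classFile.buffered())
    require(input.readInt() == 0xCAFEBABE.toInt()) { "not a class file" }
    input.readUnsignedShort() // minor_version
    input.readUnsignedShort() // major_version
    val poolCount = input.readUnsignedShort()
    val utf8 = HashMap<Int, String>()
    val classNameIndices = mutableListOf<Int>()
    var index = 1 // constant pool indices start at 1
    while (index < poolCount) {
        when (val tag = input.readUnsignedByte()) {
            1 -> utf8[index] = input.readUTF() // Utf8 uses modified UTF-8, as readUTF expects
            3, 4 -> input.readInt() // Integer, Float
            5, 6 -> { input.readLong(); index++ } // Long, Double occupy two slots
            7 -> classNameIndices += input.readUnsignedShort() // Class -> name_index
            8, 16, 19, 20 -> input.readUnsignedShort() // String, MethodType, Module, Package
            9, 10, 11, 12, 17, 18 -> input.readInt() // entries made of two u2 indices
            15 -> { input.readUnsignedByte(); input.readUnsignedShort() } // MethodHandle
            else -> error("unsupported constant pool tag $tag")
        }
        index++
    }
    return classNameIndices.mapNotNull { utf8[it] }.toSet()
}

// Follows CONSTANT_Class references breadth-first from rootClass. load returns the class
// file for a name, or null for classes outside the scanned classpath (e.g. the JDK).
fun transitiveClasses(rootClass: String, load: (String) -> InputStream?): Set<String> {
    val visited = linkedSetOf<String>()
    val queue = ArrayDeque(listOf(rootClass))
    while (queue.isNotEmpty()) {
        val name = queue.removeFirst()
        if (name.startsWith("[") || name in visited) continue // array types, already seen
        val stream = load(name) ?: continue
        visited += name
        queue.addAll(stream.use { referencedClassNames(it) })
    }
    return visited
}
```

For example, passing ClassLoader.getSystemResourceAsStream("java/lang/String.class") to referencedClassNames yields a set that includes java/lang/Object, the superclass. The visited set is what would then be hashed.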
Alternatively, in contrast to your suggestion above, I am wondering if it would make sense to define transformers as modules.
dependencies {
    // Proposed API (does not exist in Gradle today): register a transform by module coordinates.
    registerTransform("transform:transform:1.0") {
        from...
        to...
    }
}
The module would then have to provide the transformer class using service discovery or some config file in the JAR. This way, you avoid the risk of having a mismatch between the transformer modules and the modules declared in the isolated block.
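One way the service-discovery variant could look, as a hypothetical sketch: the transform module's JAR carries a standard provider-configuration file named after Gradle's TransformAction interface, and the resolver requires exactly one provider in it. Gradle does not resolve transforms like this today; the entry location convention and the function names are assumptions, while the file format follows java.util.ServiceLoader.

```kotlin
import java.io.File
import java.util.jar.JarFile

// Hypothetical convention: the provider-configuration file for TransformAction.
val transformServicesEntry = "META-INF/services/org.gradle.api.artifacts.transform.TransformAction"

// ServiceLoader file format: one class name per line, '#' starts a comment,
// surrounding whitespace and blank lines are ignored, duplicates are dropped.
fun parseProviders(text: String): List<String> =
    text.lineSequence().map { it.substringBefore('#').trim() }.filter { it.isNotEmpty() }.distinct().toList()

// Returns the single transform class declared by the JAR, or fails if there is none or more than one.
fun transformClassOf(jar: File): String = JarFile(jar).use { jarFile ->
    val entry = jarFile.getJarEntry(transformServicesEntry) ?: error("$jar declares no transform")
    val text = jarFile.getInputStream(entry).bufferedReader().use { it.readText() }
    val providers = parseProviders(text)
    require(providers.size == 1) { "$jar must declare exactly one transform, found $providers" }
    providers.single()
}
```

Instantiating the class would then be ServiceLoader's job; the sketch stops at finding the name, which is the part the proposal changes.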
Thank you for your input, @JojOatXGME!
module-info at all. I could try creating something on-the-fly, but...
java, org.jetbrains.kotlin.jvm, org.jetbrains.intellij.platform, etc.). If you update build script dependencies (2.0.0-rc1 -> 2.0.0-rc2), the hash of the classpath changes, and this leads to running the dependency transformer again. In my post above, I incorrectly mentioned that the IntelliJ Platform dependency archive is used for hashing.
2. I'm not sure if we're on the same page here. Gradle creates a transformer output directory using the hash of the build script classpath
Yes, I understood that. My thought was that, technically, Gradle could hash only the classes recursively referenced by the transform. If someone uses registerTransform(MyTransform::class.java), Gradle could hash just the classes used by MyTransform instead of the whole classpath. Anyway, there are a few caveats, as mentioned in my previous comment, so I am not yet convinced by this solution myself. However, it might be worth investigating its feasibility further.
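A toy sketch of that idea, assuming the reference graph has already been extracted (for instance from each class file's constant pool): it hashes only the classes reachable from the transform class, so editing an unrelated class on the build classpath leaves the hash unchanged. The ClassEntry type, the function and the class names are made up for illustration.

```kotlin
import java.security.MessageDigest

// Toy classpath entry: class file bytes plus the names of the classes it references.
class ClassEntry(val bytes: ByteArray, val references: Set<String>)

// Collects every class reachable from root, then hashes name + bytes in sorted order
// so the result does not depend on traversal order.
fun reachableHash(root: String, classpath: Map<String, ClassEntry>): String {
    val seen = sortedSetOf<String>()
    val pending = ArrayDeque(listOf(root))
    while (pending.isNotEmpty()) {
        val name = pending.removeFirst()
        val entry = classpath[name] ?: continue // e.g. JDK or Gradle API classes, not hashed here
        if (seen.add(name)) pending.addAll(entry.references)
    }
    val digest = MessageDigest.getInstance("SHA-256")
    for (name in seen) {
        digest.update(name.toByteArray())
        digest.update(0.toByte()) // separator between the name and the bytes
        digest.update(classpath.getValue(name).bytes)
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}
```

Cycles (MyTransform referencing a helper that references MyTransform back) are handled by the seen set, and the sorted set keeps the hash deterministic.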
3. This sounds intriguing — but how could we create such a module?
That is left to Gradle to define. (My ideas were about how Gradle could refine its transform feature, similar to how you mentioned the introduction of the isolated block as an example.) One solution may be that such a module must contain exactly one service provider of type TransformAction. Gradle would then load the module and use ServiceLoader to load the transform. Alternatively, Gradle could require that modules used with registerTransform put the name of the transform class into META-INF/MANIFEST.MF. My biggest concern about this solution is the potential for a recursive cycle: transforms are used while resolving modules, but now transforms would themselves be modules. However, it is probably better to resolve the transforms from the plugin repositories anyway and ignore the local transforms while doing so, in which case there would be no cycle.
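The manifest alternative is even simpler to read. A hypothetical sketch, assuming an attribute named Gradle-Transform-Class (Gradle defines no such attribute; the name and the function are placeholders) and using only java.util.jar:

```kotlin
import java.io.File
import java.util.jar.Attributes
import java.util.jar.JarFile

// Hypothetical manifest attribute; Gradle defines no such attribute today.
val transformClassAttribute = Attributes.Name("Gradle-Transform-Class")

// Reads the transform class name from META-INF/MANIFEST.MF, or returns null if absent.
fun transformClassFromManifest(jar: File): String? =
    JarFile(jar).use { it.manifest?.mainAttributes?.getValue(transformClassAttribute) }
```

Compared with the services file, a manifest attribute naturally allows only one value per JAR, so the "exactly one transform" rule comes for free.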
What happened?
I migrated a project to IntelliJ Platform Gradle Plugin 2.0, specified intellijIdeaUltimate("2024.1") in build.gradle.kts, and didn't change it afterwards. However, after a few hours of work, I found that the Gradle cache contains six copies of the IDEA Ultimate distribution. Since each distribution occupies 3.3 GB, that's quite a lot of space.
Steps to reproduce
Not sure when exactly new copies are created.
Gradle IntelliJ Plugin version
2.0.0-SNAPSHOT
Gradle version
8.5
Operating System
Linux