FabricMC / fabric-loom

Gradle build system plugin used to automate the setup of a minecraft mod development environment.
MIT License
221 stars 193 forks source link

The `runDatagen` task can never be cached by gradle, as it depends on its own outputs #1120

Open solonovamax opened 1 month ago

solonovamax commented 1 month ago

The run datagen task can never be cached by gradle, as its inputs will always depend on its outputs.

Reproducing:

  1. Create simple fabric mod with data generation enabled
  2. Add the following code (kotlin; too lazy to port to groovy. the steps to reproduce still match up without this code, this code just helps understand it)

    tasks {
       val runDatagen by named("runDatagen") {
           doFirst {
               println(inputs.files.files)
           }
       }
    
       processResources {
           doFirst {
               println(inputs.files.files)
               println(outputs.files.files)
           }
       }
    }
  3. Run the runDatagen task several times without changing anything.

Expected:

All runs after the first are cached.

Actual:

None of the runs are cached.


Using the print statements, you can see that the runDatagen task depends on its own outputs in the following manner:

Since processResources is never cached and always re-run, this causes runDatagen to always be re-run. This is quite annoying for people who don't want to check src/main/generated/* into VCS, and instead have their jar task depend on runDatagen (to ensure the generated resources always remain up to date), as it will cause runDatagen to constantly be re-run, even when no changes have been made.

modmuss50 commented 1 month ago

Do you have an idea on how to fix this? I dont think there is a solution as unfortunately there is a funimental circular dependency issue with datagen.

Datagen requires the mod to be built to run, but the mod requires datagen to be ran to build. By default we suggest mod devs manually run datagen to work around this.

solonovamax commented 1 month ago

I was going to suggest manually adding the resources to the classpath of the required Jar and RunGameTask tasks, however I realized that you're setting the resources via SourceSet#resources, so this isn't particularly the best idea, for as long as the intent is to have the code for generating the resources be in the same source set as the code for the mod.

In the past, I believe I've seen similar things where a new source set is created, named smth like datagen, so new directories src/datagen/resources/, src/datagen/java/, src/datagen/kotlin/, etc. are introduced.

Looking at how ksp deals with smth similar, it seems that what they do is they filter the inputs to exclude any of the output directories. I would recommend probably doing something similar.

although, do keep in mind that technically this can lead to invalid results, if, for whatever reason, the classes that do the data generation depend on some of the files they generate. (eg. attempt to load them and then use that to determine what is generated), then this can lead to incorrectly marking it as not needing to be re-run, when it does actually need to be re-run.

Ampflower commented 1 week ago

I have noticed that inputs also depends on arguments containing a practically guaranteed to be random field, /tmp/loom-classpath17467358589055212817.args. I believe this also causes caching to be practically invalid as the hash will never match, even if you otherwise attempt to remove the reciprocal dependency. It'd be ideal if at all possible to make the classpath's filename a stable hash dependent on inputs and just hold a lock from deletion in the case of competing Looms.

This is shown by --debug as

[INFO] [org.gradle.internal.execution.steps.SkipUpToDateStep] Task ':xplat:runDatagen' is not up-to-date because:
  Value of input property 'jvmArgs' has changed for task ':xplat:runDatagen'

Disclaimer; I have found this through archloom, although if it is present in upstream (here), it would be contributing to the cache invalidation, which it seems to be here: https://github.com/FabricMC/fabric-loom/blob/097fd98fc91fdbe9ec42f2304991b68bb29751e4/src/main/java/net/fabricmc/loom/task/AbstractRunTask.java#L127-L133

modmuss50 commented 1 week ago

I have noticed that inputs also depends on arguments containing a practically guaranteed to be random field, /tmp/loom-classpath17467358589055212817.args. I believe this also causes caching to be practically invalid as the hash will never match, even if you otherwise attempt to remove the reciprocal dependency. It'd be ideal if at all possible to make the classpath's filename a stable hash dependent on inputs and just hold a lock from deletion in the case of competing Looms.

This is shown by --debug as

[INFO] [org.gradle.internal.execution.steps.SkipUpToDateStep] Task ':xplat:runDatagen' is not up-to-date because:
  Value of input property 'jvmArgs' has changed for task ':xplat:runDatagen'

Disclaimer; I have found this through archloom, although if it is present in upstream (here), it would be contributing to the cache invalidation, which it seems to be here:

https://github.com/FabricMC/fabric-loom/blob/097fd98fc91fdbe9ec42f2304991b68bb29751e4/src/main/java/net/fabricmc/loom/task/AbstractRunTask.java#L127-L133

Oh wow, nice job spotting that. That can indeed be fixed! Would you mind opening an issue on the loom repo? One thing to keep in mind is we wouldn't want to start skipping the normal run game tasks if their inputs havent changed, only the data generation tasks.