Kotlin / dokka

API documentation engine for Kotlin
https://kotl.in/dokka
Apache License 2.0

Use Gradle Worker API #2903

Open vmishenev opened 1 year ago

vmishenev commented 1 year ago

WIP

Motivation

  1. Currently, Dokka reloads a large classpath for every task. It also relies on reflection tricks to support different versions of plugins.
  2. Dokka parallelizes poorly. For example, the run time on a project with 100 tasks is 8 minutes.

Proposal

The Gradle Worker API can help Dokka avoid both problems. There are 2 options for using it:

  1. Use a cached classpath and noIsolation mode (keeping the classpath in a static variable that is shared between tasks). This approach is used in Kapt.
  2. Use processIsolation mode. Dokka tasks will be executed in worker daemon processes. Running processes with the same classpath can be reused for other tasks, so the classpath is loaded once per process.
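
The first option can be pictured with a plain-JVM sketch: a static map that hands out one class loader per distinct classpath, so tasks running in the same JVM reuse it instead of reloading. This is only a model of the idea, not Dokka's or Kapt's actual code; `ClasspathCache` and `loaderFor` are hypothetical names.

```java
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Dokka or Gradle.
final class ClasspathCache {
    // Static, so every task running in the same JVM shares the loaders.
    private static final Map<List<Path>, ClassLoader> LOADERS = new ConcurrentHashMap<>();

    private ClasspathCache() {}

    // Returns the cached loader for this classpath, creating it on first use.
    static ClassLoader loaderFor(List<Path> classpath) {
        // Normalize so that equivalent spellings of a path map to the same key.
        List<Path> key = classpath.stream()
                .map(p -> p.toAbsolutePath().normalize())
                .collect(Collectors.toUnmodifiableList());
        return LOADERS.computeIfAbsent(key, ClasspathCache::newLoader);
    }

    private static ClassLoader newLoader(List<Path> jars) {
        URL[] urls = jars.stream().map(ClasspathCache::toUrl).toArray(URL[]::new);
        // Parent is the platform loader, so the build's own classes do not leak in.
        return new URLClassLoader(urls, ClassLoader.getPlatformClassLoader());
    }

    private static URL toUrl(Path path) {
        try {
            return path.toUri().toURL();
        } catch (MalformedURLException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the loaders are never closed or evicted, so they live as long as the JVM does. The second option avoids a hand-written cache because Gradle itself decides when a worker daemon with a matching classpath can be reused.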

We need to choose only one option. We already have prototypes. @aSemy created one for the second option and there is a prototype for the cached classpath here.

Pros & cons

There are other points for choosing an approach. Everybody can share their opinion about it here.

To sum up, I personally vote for the first option. In the case of Dokka, it will increase build time a little, but stability is more important than performance.

aSemy commented 1 year ago

There's another isolation mode too, classLoaderIsolation(). It's a halfway point between the other two, and I think it's worth considering. I strongly suspect that it would be the most performant of the 3 options.

One benefit of processIsolation is that it prevents problems with coroutines, so hopefully it would make the finalizeCoroutines option obsolete.

https://github.com/Kotlin/dokka/blob/14c05d70b52814fe48e930b3f61fed5e8586718c/core/src/main/kotlin/configuration.kt#L139-L156

I think that whatever option is chosen, #2740 will be broadly the same.

TWiStErRob commented 1 year ago

@aSemy What is the reason for Dokka tasks executing one by one, one after the other in a Gradle parallel build? When I do a publish I can see compilation and all other tasks complete very quickly, then all workers (28+) are busy with different Dokka tasks from submodules. The console logs are suggesting that one needs to complete before the next one can start.

aSemy commented 1 year ago

DGP (Dokka Gradle Plugin) is not compatible with many Gradle features like project isolation, build cache, configuration cache - see #2700 - so tasks run sequentially, even across different subprojects. And DGP does not use the Worker API at the moment (hence this issue and my PR #2740).

If you want to generate Dokka docs faster, then look at Dokkatoo. Dokkatoo is a re-implemented DGP that supports all the speedy Gradle features. Dokkatoo is not a drop-in replacement for DGP, but it's pretty similar, and you can add both DGP and Dokkatoo to the same project to verify that the output of both is identical.

(Funny that you've pinged me (just a contributor) rather than the actual maintainers @IgnatBeresnev and @vmishenev 😄)

TWiStErRob commented 1 year ago

I pinged you exactly because of all those issues, PRs and repo you linked. I know DGP is not compatible with fancy new features, but org.gradle.parallel is a pretty old feature. I'm wondering what is locking inside Dokka that doesn't allow basic parallelism.

aSemy commented 1 year ago

Ah okay. Hmm, I'm not sure I can give a definitive answer because I'm not exactly sure how --parallel works and what the requirements are, and precisely what DGP is doing that would prevent it, but these items in particular are related:

TWiStErRob commented 1 year ago

Thanks for the pointers!

martinbonnin commented 7 months ago

+1 for classLoaderIsolation or at least making the isolation configurable. In addition to performance, classLoaderIsolation makes it way easier to debug the builds.
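
Making the isolation configurable could be as small as mapping a user-facing setting to one of the three Worker API entry points (`noIsolation()`, `classLoaderIsolation()` and `processIsolation()` on Gradle's `WorkerExecutor`). The sketch below only models the parsing side. The `IsolationMode` enum, the accepted names and the fallback handling are assumptions made for illustration, not an existing Dokka option.

```java
import java.util.Locale;

// Hypothetical setting; each constant names the WorkerExecutor method it would select.
enum IsolationMode {
    NONE("noIsolation"),
    CLASSLOADER("classLoaderIsolation"),
    PROCESS("processIsolation");

    final String workerExecutorMethod;

    IsolationMode(String workerExecutorMethod) {
        this.workerExecutorMethod = workerExecutorMethod;
    }

    // Parses a property value such as "process"; null or blank yields the fallback.
    static IsolationMode parse(String value, IsolationMode fallback) {
        if (value == null || value.isBlank()) {
            return fallback;
        }
        String wanted = value.trim().toLowerCase(Locale.ROOT);
        for (IsolationMode mode : values()) {
            // Accept either the short name or the Worker API method name.
            if (mode.name().toLowerCase(Locale.ROOT).equals(wanted)
                    || mode.workerExecutorMethod.toLowerCase(Locale.ROOT).equals(wanted)) {
                return mode;
            }
        }
        throw new IllegalArgumentException("Unknown isolation mode: " + value);
    }
}
```

A task could then switch on the parsed value to pick the work queue, which would let users select classLoaderIsolation when debugging and another mode otherwise.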

adam-enko commented 3 months ago

One potential benefit of process isolation is that it would allow for class data sharing. With a multimodule project, Dokka Generator has to run multiple times with the same classpath. The Dokka classpath can be quite large (the analyzer component is ~80MB). Using CDS would mean that the classes could be 'cached' between generations, improving startup time and reducing memory usage.
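
With process isolation the worker JVM's arguments are under the build's control, so CDS could be switched on there. The sketch below builds the HotSpot flags for a dynamic application CDS archive: `-XX:ArchiveClassesAtExit` records one and `-XX:SharedArchiveFile` reuses it (both need JDK 13 or newer). `CdsArgs` and the archive location are hypothetical, and whether Dokka would benefit in practice has not been measured here.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of Dokka or Gradle.
final class CdsArgs {
    private CdsArgs() {}

    // Chooses the JVM flag for the given archive file, depending on whether it exists yet.
    static List<String> forArchive(Path archive) {
        if (Files.isRegularFile(archive)) {
            // Archive already exists: ask the JVM to map it at startup.
            return List.of("-XX:SharedArchiveFile=" + archive);
        }
        // No archive yet: record the loaded classes when the JVM exits.
        return List.of("-XX:ArchiveClassesAtExit=" + archive);
    }
}
```

These arguments would be passed to the worker through the fork options of processIsolation, for example via `JavaForkOptions.jvmArgs`. An archive is tied to the JDK and classpath it was recorded with, so it would need to be regenerated whenever either changes.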