Please consider adding `-j` to generate_source.py

ShabbyX commented 6 months ago

VVL does a lot of autogen, which takes quite a bit of time. This time can be drastically reduced if autogen is done in parallel as all(?) targets are independently generated. A -j argument could be given a number of threads, or otherwise default to multiprocessing.cpu_count().

Context: working on a release script for gitlab CI, all other repos finish their autogen quite quickly, but VVL's autogen adds ~35s to runtime. Making this faster can reduce CI time.

spencer-lunarg commented 6 months ago

I added this a while ago (but haven't had time to test it so its off by default) https://github.com/KhronosGroup/Vulkan-ValidationLayers/pull/7613

if you add --caching to the end of generate_source.py it will turn on

spencer-lunarg commented 6 months ago

Also I have looked into this before the 3 main bottle necks are

Parsing the vk.xml each time (which is where the --caching above helps
Clang-format at the end
File I/O

ShabbyX commented 6 months ago

Good to know about --caching, but I'll probably defer using that until you can verify it doesn't mess things up.

Am I right that generating each file is independent? Basically that is the body of the for loop in generate_source.py could just be moved to threads and there are no races? That'll be a quick win even if each thread is doing redundant work (which can later be optimized).

spencer-lunarg commented 6 months ago

yes, they are independent

spencer-lunarg commented 6 months ago

Good to know about --caching, but I'll probably defer using that until you can verify it doesn't mess things up.

I plan to do it this week, wanted to get the SDK out first

spencer-lunarg commented 5 months ago

So I tried to add this simple patch

diff --git a/scripts/generate_source.py b/scripts/generate_source.py
index 1e1ba67b6..4b82f7db5 100755
--- a/scripts/generate_source.py
+++ b/scripts/generate_source.py
@@ -28,6 +28,7 @@ import difflib
 import json
 import common_ci
 import pickle
+import concurrent.futures
 from xml.etree import ElementTree
 from generate_spec_error_message import GenerateSpecErrorMessage

@@ -323,7 +324,7 @@ def RunGenerators(api: str, registry: str, grammar: str, directory: str, styleFi
     if caching:
         EnableCaching()

-    for index, target in enumerate(targets, start=1):
+    def process_target(index, target):
         print(f'[{index}|{len(targets)}] Generating {target}')

         # First grab a class contructor object and create an instance
@@ -376,6 +377,9 @@ def RunGenerators(api: str, registry: str, grammar: str, directory: str, styleFi
         if has_clang_format:
             common_ci.RunShellCmd(f'clang-format -i --style=file:{styleFile} {os.path.join(directory, target)}')

+    with concurrent.futures.ThreadPoolExecutor() as executor:
+        executor.map(process_target, range(1, len(targets) + 1), targets)
+
     if os.path.isfile(cachePath):
         os.remove(cachePath)

and results

caching single thread = 21 seconds
caching multi-thread = 20 seconds (doesn't print out warnings!)
no-caching multi thread = 54 seconds

@ShabbyX I think this shows threading doesn't really buy us much, the caching has already reduced the time and adding threading on top of it for a 1 second save is not worth it as we rely on the warning messages for people so they know why some new header might have broke a certain file

ShabbyX commented 5 months ago

:+1:

For me, the numbers look like this (beefy work computer):

Base: 40s
Caching: 24s~25s
Caching and parallel: 17s~19s

But lack of logs is a deal breaker of course, and the gains are not that significant probably because of GIL (besides, the CI machines won't be beefy anyway). So, agreed, the way things are currently, parallel does not provide much benefit.

KhronosGroup / Vulkan-ValidationLayers

Please consider adding `-j` to generate_source.py #7701