KhronosGroup / Vulkan-ValidationLayers

Vulkan Validation Layers (VVL)
https://vulkan.lunarg.com/doc/sdk/latest/linux/khronos_validation_layer.html
Other
748 stars 400 forks source link

Please consider adding `-j` to generate_source.py #7701

Closed ShabbyX closed 5 months ago

ShabbyX commented 6 months ago

VVL does a lot of autogen, which takes quite a bit of time. This time can be drastically reduced if autogen is done in parallel as all(?) targets are independently generated. A -j argument could be given a number of threads, or otherwise default to multiprocessing.cpu_count().

Context: working on a release script for gitlab CI, all other repos finish their autogen quite quickly, but VVL's autogen adds ~35s to runtime. Making this faster can reduce CI time.

spencer-lunarg commented 6 months ago

I added this a while ago (but haven't had time to test it so its off by default) https://github.com/KhronosGroup/Vulkan-ValidationLayers/pull/7613

if you add --caching to the end of generate_source.py it will turn on

spencer-lunarg commented 6 months ago

Also I have looked into this before the 3 main bottle necks are

  1. Parsing the vk.xml each time (which is where the --caching above helps
  2. Clang-format at the end
  3. File I/O
ShabbyX commented 6 months ago

Good to know about --caching, but I'll probably defer using that until you can verify it doesn't mess things up.

Am I right that generating each file is independent? Basically that is the body of the for loop in generate_source.py could just be moved to threads and there are no races? That'll be a quick win even if each thread is doing redundant work (which can later be optimized).

spencer-lunarg commented 6 months ago

yes, they are independent

spencer-lunarg commented 6 months ago

Good to know about --caching, but I'll probably defer using that until you can verify it doesn't mess things up.

I plan to do it this week, wanted to get the SDK out first

spencer-lunarg commented 5 months ago

So I tried to add this simple patch

diff --git a/scripts/generate_source.py b/scripts/generate_source.py
index 1e1ba67b6..4b82f7db5 100755
--- a/scripts/generate_source.py
+++ b/scripts/generate_source.py
@@ -28,6 +28,7 @@ import difflib
 import json
 import common_ci
 import pickle
+import concurrent.futures
 from xml.etree import ElementTree
 from generate_spec_error_message import GenerateSpecErrorMessage

@@ -323,7 +324,7 @@ def RunGenerators(api: str, registry: str, grammar: str, directory: str, styleFi
     if caching:
         EnableCaching()

-    for index, target in enumerate(targets, start=1):
+    def process_target(index, target):
         print(f'[{index}|{len(targets)}] Generating {target}')

         # First grab a class contructor object and create an instance
@@ -376,6 +377,9 @@ def RunGenerators(api: str, registry: str, grammar: str, directory: str, styleFi
         if has_clang_format:
             common_ci.RunShellCmd(f'clang-format -i --style=file:{styleFile} {os.path.join(directory, target)}')

+    with concurrent.futures.ThreadPoolExecutor() as executor:
+        executor.map(process_target, range(1, len(targets) + 1), targets)
+
     if os.path.isfile(cachePath):
         os.remove(cachePath)

and results


ShabbyX commented 5 months ago

:+1:

For me, the numbers look like this (beefy work computer):

But lack of logs is a deal breaker of course, and the gains are not that significant probably because of GIL (besides, the CI machines won't be beefy anyway). So, agreed, the way things are currently, parallel does not provide much benefit.