Closed: ricetwice closed this issue 3 months ago
Can you provide an example of the code output from two runs, one showing a cold start (Warp cache empty) and one from a subsequent run that demonstrates the issue? Ideally, we would like an example script we could run that illustrates the issue.
This function gives an overview of how the module hash is calculated: https://github.com/NVIDIA/warp/blob/main/warp/context.py#L1554-L1683 Structs, kernels, functions, and wp.constants all go into computing a module hash, as well as the hashes of any modules that the current module references.
For 2, a simple example I can think of is if you define a wp.constant using a random number generator that changes value on every run (e.g. no fixed seed). Since the values of wp.constant variables get added to the module hash, the hash would change every time you run, forcing the recompilation of both this module and any modules that reference it. This problem used to be a lot worse in older versions of Warp, in which every module loaded at runtime would have its hash affected by the wp.constant variables declared in the program, which resulted in a lot of unnecessary recompilation if the set of wp.constant variables changed between runs.
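To make the effect concrete, here is a minimal pure-Python sketch of a content-based module hash. It does not use Warp and is not Warp's actual implementation: `module_hash` and `simulate_run` are hypothetical names, and the hashing scheme is an assumption chosen only to show how a constant's value, and the hash of a referenced module, can feed into a module's hash.

```python
import hashlib
import random


def module_hash(sources, constants, referenced_hashes=()):
    """Hypothetical content hash: sources, constant values, and referenced module hashes."""
    ch = hashlib.sha256()
    for src in sources:
        ch.update(src.encode("utf-8"))
    # Constant *values* are part of the hash, so a changed value changes the hash.
    for name in sorted(constants):
        ch.update(f"{name}={constants[name]!r}".encode("utf-8"))
    # Hashes of referenced modules are folded in, so changes propagate to dependents.
    for h in referenced_hashes:
        ch.update(h.encode("utf-8"))
    return ch.hexdigest()


def simulate_run(seed=None):
    """Simulate one program run; seed=None mimics an unseeded random generator."""
    rng = random.Random(seed)
    constants = {"JITTER": rng.random()}  # stands in for a wp.constant set from an RNG
    base = module_hash(["def helper(x): return x * JITTER"], constants)
    dependent = module_hash(["def kernel(a): a[0] = helper(a[0])"], {}, [base])
    return base, dependent


if __name__ == "__main__":
    # Same seed: same constant value, so both hashes are stable across "runs".
    print(simulate_run(0) == simulate_run(0))
    # Different constant values: both the module and its dependent get new hashes.
    print(simulate_run(1) != simulate_run(2))
```

Under these assumptions, fixing the seed (or otherwise making the constant deterministic) keeps both hashes stable between runs.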
Another thing that used to affect 2 was the declaration of additional kernels at runtime or inside functions. However, this issue was also addressed a few releases ago by maintaining multiple cache directories for the same module name. Previously, we would only keep a single set of files for a module in the cache directory, so if the module hash changed, we would delete the files associated with the old hash and regenerate the files for the new hash.
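The idea of keeping one cache directory per module hash can be sketched as follows. This is an illustration only, not Warp's cache code: the directory naming scheme, the `module.bin` placeholder file, and the helpers `cache_dir_for` and `load_or_compile` are all hypothetical.

```python
import tempfile
from pathlib import Path


def cache_dir_for(cache_root, module_name, module_hash):
    """Hypothetical layout: one directory per (module name, hash prefix) pair."""
    return Path(cache_root) / f"{module_name}_{module_hash[:7]}"


def load_or_compile(cache_root, module_name, module_hash):
    """Return 'cache hit' if a directory for this exact hash exists, else 'compiled'."""
    d = cache_dir_for(cache_root, module_name, module_hash)
    if d.is_dir():
        return "cache hit"
    d.mkdir(parents=True)
    (d / "module.bin").write_text("compiled")  # placeholder for real build output
    return "compiled"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(load_or_compile(root, "sim", "aaaaaaa111"))  # compiled
        print(load_or_compile(root, "sim", "bbbbbbb222"))  # compiled (hash changed)
        # The files for the first hash were not deleted, so this is a hit.
        print(load_or_compile(root, "sim", "aaaaaaa111"))  # cache hit
```

With a single set of files per module name, the third call would instead have triggered another compilation, because the files for the first hash would have been deleted when the second hash was built.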
Hi @shi-eric,
Thanks for your reply! After stepping through the hash calculation for a module in a debugger, I found an issue in the part of the hash update that processes kernels, specifically in the following lines of code:
for arg, arg_type in kernel.adj.arg_types.items():
    s = f"{arg}: {get_type_name(arg_type)}"
    ch.update(bytes(s, "utf-8"))
If one of the arguments in the kernel is an array of user-defined Structs, the string s representing the argument takes the form:
'arg: array<warp.codegen.Struct object at 0xXXXXXXX>'
The address at the end of this string changes with each run, even when the code remains unmodified. Consequently, the hash of a module with certain kernels having arguments of this type changes every run, which triggers unnecessary recompilations.
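The behavior can be reproduced without Warp. In the sketch below, the `Struct` class is a hypothetical stand-in for `warp.codegen.Struct`, and `unstable_type_name`, `stable_type_name`, and `hash_args` are illustrative helpers, not Warp functions. Two separate `Struct` objects with the same key play the role of the same struct in two different runs. The stable variant shows one possible remedy, which is not necessarily the fix that was applied in Warp.

```python
import hashlib


class Struct:
    """Hypothetical stand-in for a struct descriptor; it defines no custom __repr__."""

    def __init__(self, key):
        self.key = key


def unstable_type_name(t):
    # Default repr embeds the object's memory address: "<... object at 0x...>".
    return f"array<{t!r}>"


def stable_type_name(t):
    # Uses a name that depends only on the struct definition, not on its address.
    return f"array<{t.key}>"


def hash_args(arg_types, type_name):
    """Mirror of the loop quoted above, with the type-name function passed in."""
    ch = hashlib.sha256()
    for arg, arg_type in arg_types.items():
        s = f"{arg}: {type_name(arg_type)}"
        ch.update(bytes(s, "utf-8"))
    return ch.hexdigest()


if __name__ == "__main__":
    run_a = {"arg": Struct("Particle")}  # the struct as seen in one run
    run_b = {"arg": Struct("Particle")}  # same definition, different address
    print(hash_args(run_a, unstable_type_name) != hash_args(run_b, unstable_type_name))
    print(hash_args(run_a, stable_type_name) == hash_args(run_b, stable_type_name))
```

Both comparisons print `True`: the address-based name yields different hashes for identical definitions, while the key-based name yields the same hash.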
I believe this is a bug and should be addressed.
Thank you @ricetwice for isolating the problem! I'll update you when we have a fix.
Hey @ricetwice, a fix has been pushed to the main branch for this issue. Thanks again for reporting it!
I am encountering an issue with the Warp framework where certain modules are recompiled every time I run my program, even though the code has not changed. Specifically, these modules show a different hash value in the module loading output on each run, while other modules are loaded from the cache successfully.
This recompilation process is quite time-consuming and significantly impacts the startup time of my program. I am seeking clarification on the following points:
Any guidance or insights you could provide would be greatly appreciated.