Open thomasfrosio opened 1 year ago
After looking at it more closely, this may be dangerous with the --cuda-std
option on.
In this scenario, passing <cuda/std/*>
headers in header_sources
will call the patch_cuda_source
function on these headers, which I'm pretty sure, is not something we want.
This happens here: https://github.com/NVIDIA/jitify/blob/2a7d7543933fe48a82fdb1ef7ff2eac3fdd96d86/jitify2.hpp#L5973-L5983
These two lines seem problematic, or at least I don't understand it:
Is that supposed to detect <cuda/std/*>
headers?
However, when loading new headers, everything seem okay and <cuda/std/*>
headers are not patched thanks to these lines:
https://github.com/NVIDIA/jitify/blob/2a7d7543933fe48a82fdb1ef7ff2eac3fdd96d86/jitify2.hpp#L6119-L6123
Not sure if I'm misusing the library or if it's a bug.
I like this idea, but I think you're right that we need to be careful to avoid patching cuda/std headers or double-patching other headers.
bool is_cuda_std_header =
detail::get_workaround_system_headers().count(header_name);
I think the logic here is correct (because the workaround system headers are added via header_sources
), but the variable name is wrong (I probably made a copy-paste mistake). If cuda/std headers are also passed via header_sources
then we will need to somehow detect them here too. Double-patching is also a potential problem, though it may already be idempotent so it wouldn't really matter besides the performance impact.
Maybe you could make this op-in via a command line flag like --experimental-reuse-headers
or similar, and also only enable it if --cuda-std
is not enabled.
Note that #131 will dramatically speed up preprocessing (orthogonal to this PR) once it lands. I will think about how we can enhance the new preprocessing logic to detect headers that we don't want to (re-)patch.
Instead of preprocessing every source file from scratch, why not use the already loaded headers?
Please correct me if I'm wrong.
Currently,
jitify
loads and preprocesses the source files in a distributed approach, i.e. each file is preprocessed independently from the others, and the loaded headers are then added to theall_header_sources
map.std::unordered_map::insert()
will automatically filter redundant entries on a first-in first-out basis.This is all fine, but since the files are preprocessed one after the other, why not instead add the already known headers from the previous files to the list of headers that
nvrtc
can use for the next file? As I'm sure you've noticed, lettingjitify
collecting the headers fromnvrtc
's error messages quickly adds up when there's a lot of headers. For files that include a lot of the same headers (which I assume is a very common scenario), this simple change makes a big difference performance-wise and should produce exactly the same output, right? I use--shared-headers
option and it seems to work fine.HTH, Thomas