Open aws-jiadingg opened 2 months ago
Add -DCUTLASS_NVCC_KEEP=1 to your cmake command line.
Thanks a lot! It appears that -DCUTLASS_NVCC_KEEP=1 only directs nvcc to keep the output files of its intermediate steps. I want to dig deeper into what cudafe++ (invoked by nvcc) does during compilation. Given that cudafe++ parses the CUTLASS templates when it is passed the --parse_templates option, is there a way to make cudafe++ dump its intermediate representation (IR) immediately after parsing these templates? cudafe++ --help did not print any usage instructions. The cudafe++ invocation looks like this:
cudafe++ --c++17 --gnu_version=130200 --display_error_number --orig_src_file_name flash-attention/csrc/flash_attn/src/flash_fwd_hdim96_bf16_causal_sm80.cu --orig_src_path_name flash-attention/csrc/flash_attn/src/flash_fwd_hdim96_bf16_causal_sm80.cu --allow_managed --extended-lambda --relaxed_constexpr --m64 --parse_templates --gen_c_file_name flash_fwd_hdim96_bf16_causal_sm80.compute_90.cudafe1.cpp --stub_file_name flash_fwd_hdim96_bf16_causal_sm80.compute_90.cudafe1.stub.c --gen_module_id_file --module_id_file_name flash_fwd_hdim96_bf16_causal_sm80.module_id flash_fwd_hdim96_bf16_causal_sm80.cpp4.ii
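If it helps anyone reading along, here is a minimal sketch for pulling a sub-command like the one above out of nvcc's log. It assumes the log comes from nvcc's --dryrun option, which lists the compilation sub-commands without running them and prefixes each with "#$ ". The function name find_tool_commands and the SAMPLE text are made up for illustration; SAMPLE is a hand-shortened stand-in, not real nvcc output.

```python
import shlex

# Hypothetical, hand-shortened stand-in for `nvcc --dryrun` output.
# The cudafe++ flags are taken from the command quoted in this thread;
# the file names (k.*) are placeholders.
SAMPLE = """#$ _NVVM_BRANCH_=nvvm
#$ cudafe++ --c++17 --m64 --parse_templates --gen_c_file_name k.cudafe1.cpp k.cpp4.ii
#$ cicc <arguments omitted>
"""


def find_tool_commands(dryrun_output: str, tool: str = "cudafe++") -> list[list[str]]:
    """Return one argv list per logged sub-command whose executable is `tool`."""
    commands = []
    for line in dryrun_output.splitlines():
        # Only lines carrying the "#$ " prefix describe sub-commands.
        if not line.startswith("#$ "):
            continue
        try:
            argv = shlex.split(line[3:])
        except ValueError:
            # Skip lines that shlex cannot tokenize (e.g. unbalanced quotes).
            continue
        # Compare on the basename so absolute tool paths also match.
        if argv and argv[0].rsplit("/", 1)[-1] == tool:
            commands.append(argv)
    return commands


for argv in find_tool_commands(SAMPLE):
    print(" ".join(argv))
```

The returned argv lists can be inspected for options such as --parse_templates or replayed by hand against the kept .cpp4.ii file.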
@hwu36 do you know the answer?
I talked with the fe team. If you turn on -DCUTLASS_NVCC_KEEP, nvcc will generate the .cpp4.ii and .cudafe1.cpp files; that is likely the best you can do.
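To locate those two kinds of files after a build, something like the following may be useful. It is only a sketch: the helper name find_kept_intermediates is hypothetical, and it assumes the kept files end up somewhere under the build directory you pass in, which depends on how the build is configured.

```python
from pathlib import Path

# The two suffixes named in the reply above.
KEPT_SUFFIXES = (".cpp4.ii", ".cudafe1.cpp")


def find_kept_intermediates(build_dir):
    """Map each suffix to the sorted list of matching files under build_dir."""
    root = Path(build_dir)
    found = {suffix: [] for suffix in KEPT_SUFFIXES}
    # Walk the whole tree, since the location of kept files is an assumption.
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        for suffix in KEPT_SUFFIXES:
            # Match on the full compound suffix, e.g. "x.compute_90.cudafe1.cpp".
            if path.name.endswith(suffix):
                found[suffix].append(path)
    return found
```

Calling find_kept_intermediates("build") returns a dictionary keyed by suffix, so the preprocessed input (.cpp4.ii) and the cudafe++ output (.cudafe1.cpp) can be compared side by side.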
What is your question? Hi, is there a way to dump all the IRs after each pass of cudafe++ and cicc? For example, the IR generated right after cicc parses the CUTLASS templates. Thanks!