daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Apache License 2.0

Split generated kernels.cpp for faster builds #516

Open pdamme opened 1 year ago

pdamme commented 1 year ago

tba

philipportner commented 4 months ago

FYI: I'm working on this one.

philipportner commented 4 months ago

I'll post this as an information dump for now:

ninja provides a log file which can be converted to Chrome tracing format with the tool ninjatracing. The resulting traces can then be viewed in the browser using ui.perfetto. In this trace, the biggest offender is the kernels.cpp file mentioned in this issue; the second-largest contribution to compile time is the compilation of DaphneDialect.cpp.
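
For a quick look without a trace viewer, the ninja log can also be ranked directly. The following is only a minimal sketch: it assumes the tab-separated log layout (start ms, end ms, mtime, output, command hash), and the function name `slowest_steps` is made up for illustration.

```python
from typing import Dict, List, Tuple


def slowest_steps(log_text: str, top: int = 5) -> List[Tuple[int, str]]:
    """Return the `top` longest build steps as (duration_ms, output) pairs.

    Assumes each non-comment line is tab-separated:
    start_ms, end_ms, mtime, output, command_hash.
    """
    durations: Dict[str, int] = {}
    for line in log_text.splitlines():
        # Skip empty lines and the "# ninja log vN" header.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not a line in the assumed layout
        start_ms, end_ms, output = int(fields[0]), int(fields[1]), fields[3]
        # An output can be logged several times across rebuilds;
        # the last entry in the file wins.
        durations[output] = end_ms - start_ms
    ranked = sorted(((d, o) for o, d in durations.items()), reverse=True)
    return ranked[:top]


# Usage (path is an assumption about the build directory):
# print(slowest_steps(open("build/.ninja_log").read()))
```

Note that this reports wall-clock time per step, so steps that ran in parallel with many others can look slower than they would in isolation.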

As the granularity of the ninja log is not that helpful, I tried clang's -ftime-trace. Compilation with clang mostly works, even though execution fails, so the time trace can still be extracted. With that, we have some profiling information about why individual compilation units take so long.

For kernels.cpp, the frontend takes roughly 25% of the time, with the backend taking the other 75%. Roughly half of the frontend's time is spent on template instantiations. Here the biggest offender is Eigen::EigenSolver<Eigen::Matrix<T with two types, double and float. The backend does not appear to have a single dominant source of time; rather, many function-level optimization passes and codegen passes are performed.
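
Numbers like these can be pulled out of a -ftime-trace file with a few lines of Python. This is a rough sketch, assuming the file is Chrome trace JSON with a top-level `traceEvents` list and complete events (`"ph": "X"`) whose `dur` is in microseconds; `total_time_by_name` is a made-up helper name.

```python
import json
from collections import defaultdict
from typing import Dict


def total_time_by_name(trace_json: str) -> Dict[str, float]:
    """Sum the durations (in ms) of complete ('X') events per event name."""
    data = json.loads(trace_json)
    # Chrome trace files are either an object with "traceEvents" or a bare list.
    events = data["traceEvents"] if isinstance(data, dict) else data
    totals: Dict[str, float] = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev["name"]] += ev["dur"] / 1000.0  # microseconds to ms
    return dict(totals)


# Usage (file name taken from the attachments in this thread):
# totals = total_time_by_name(open("kernels.cpp.json").read())
# for name, ms in sorted(totals.items(), key=lambda kv: -kv[1])[:10]:
#     print(f"{ms:10.1f} ms  {name}")
```

Events in such a trace can be nested, so sums for different names overlap and should not be added up to a total; they are only useful for comparing categories against each other.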

Overall, it looks like splitting kernels.cpp into multiple compilation units should do the trick.
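
To illustrate the idea only: if the generator has the kernel instantiations as a list of code snippets, they could be distributed over N output files. The sketch below is hypothetical, the function name `shard_round_robin` and the snippet strings are invented, and it says nothing about how the actual generator is structured.

```python
from typing import List


def shard_round_robin(items: List[str], num_shards: int) -> List[List[str]]:
    """Distribute items over `num_shards` lists in round-robin order."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    for i, item in enumerate(items):
        shards[i % num_shards].append(item)
    return shards


# Usage with placeholder snippets; each shard would be written to its own
# .cpp file and compiled as a separate compilation unit.
# shards = shard_round_robin(["kernel_a", "kernel_b", "kernel_c"], 2)
```

Round-robin balances the number of kernels per file, not their compile cost, so a few expensive instantiations could still dominate a single shard.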

Looking at the fine-grained -ftime-trace of DaphneDialect.cpp, I believe we should do the same there.

To reproduce the traces:

ninja_log_trace.json DaphneDialect.cpp.json kernels.cpp.json