Is your feature request related to a problem? Please describe.
We have some CPU and GPU event logs from Databricks where the SQL IDs do not line up, which makes the two runs hard to compare. After investigating, I found that most of the mismatches were caused by Delta log metadata reads. These include reads of Delta checkpoint files, the `_delta_log` JSON files, and queries related to Delta caching.
This adds a new option to the profiler tool, `--output-sql-ids-aligned`, which makes the tool output a new table (and optionally a CSV file) with the SQL IDs of all Delta log related queries stripped out. The table simply contains the appId and the remaining sqlIds in sorted order.
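A minimal Python sketch of the filtering idea, for illustration only. The `SqlExec` record and the `is_delta_log_read`, `aligned_sql_ids` and `to_csv` helpers are hypothetical names, not the profiler tool's actual code, and treating any query whose description mentions `_delta_log` as a metadata read is an assumed heuristic. Delta checkpoint files and JSON commit files both live under a table's `_delta_log/` directory, so one path check covers both; the caching-related queries would need extra rules that are not modeled here.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SqlExec:
    """Hypothetical, simplified view of one SQL execution from an event log."""
    app_id: str
    sql_id: int
    description: str  # assumed to contain the query text or the paths it reads


def is_delta_log_read(description: str) -> bool:
    # Assumed heuristic: checkpoint files and JSON commit files both sit under
    # _delta_log/, so one substring check covers both. Caching queries are not modeled.
    return "_delta_log" in description


def aligned_sql_ids(execs: Iterable[SqlExec]) -> List[Tuple[str, int]]:
    """Drop Delta log metadata reads and return (appId, sqlId) rows in sorted order."""
    return sorted(
        (e.app_id, e.sql_id) for e in execs if not is_delta_log_read(e.description)
    )


def to_csv(rows: List[Tuple[str, int]]) -> str:
    """Render the aligned table as CSV text with an appId,sqlId header."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["appId", "sqlId"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    cpu = [
        SqlExec("app-cpu", 0, "s3://bkt/sales/_delta_log/00000000000000000010.checkpoint.parquet"),
        SqlExec("app-cpu", 1, "SELECT region, sum(amount) FROM sales GROUP BY region"),
        SqlExec("app-cpu", 2, "s3://bkt/sales/_delta_log/00000000000000000011.json"),
        SqlExec("app-cpu", 3, "SELECT * FROM sales WHERE amount > 100"),
    ]
    gpu = [
        SqlExec("app-gpu", 0, "s3://bkt/sales/_delta_log/00000000000000000010.checkpoint.parquet"),
        SqlExec("app-gpu", 1, "s3://bkt/sales/_delta_log/00000000000000000011.json"),
        SqlExec("app-gpu", 2, "SELECT region, sum(amount) FROM sales GROUP BY region"),
        SqlExec("app-gpu", 3, "SELECT * FROM sales WHERE amount > 100"),
    ]
    # Assumes both runs executed the same user queries in the same order,
    # so the n-th remaining CPU query pairs with the n-th remaining GPU query.
    for (_, cpu_id), (_, gpu_id) in zip(aligned_sql_ids(cpu), aligned_sql_ids(gpu)):
        print(f"cpu sqlId {cpu_id} <-> gpu sqlId {gpu_id}")
    print(to_csv(aligned_sql_ids(cpu)), end="")
```

Because the metadata reads are dropped before sorting, the example pairs CPU sqlId 1 with GPU sqlId 2 and CPU sqlId 3 with GPU sqlId 3, even though the raw IDs differ between the two logs.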