[FEA] Tools should identify the Delta log operations and generate views for non-Delta logs

Is your feature request related to a problem? Please describe.

We have CPU and GPU event logs from Databricks whose SQL IDs do not line up, which makes the two runs hard to compare. After investigating, I found that most of the mismatches come from Delta log metadata reads. These include reads of Delta checkpoint files, reads of the `_delta_log` JSON files, and operations that deal with Delta caching.
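One way a tool could flag these queries is to look for Delta log markers in each SQL execution's description or plan text. The sketch below is a heuristic only, not how the profiler actually detects them. All patterns are assumptions: `_delta_log/` for the JSON commit files, `.checkpoint[.N.M].parquet` for checkpoint files, `_last_checkpoint` for the checkpoint pointer, and a hypothetical "Delta Table State" name for cached snapshot state.

```python
import re

# Hypothetical markers (assumptions, not taken from spark-rapids-tools): substrings
# suggesting a SQL execution reads Delta transaction-log metadata, not user data.
DELTA_LOG_PATTERNS = [
    re.compile(r"_delta_log/"),                       # JSON commit files under _delta_log
    re.compile(r"\.checkpoint(\.\d+)*\.parquet"),     # single- or multi-part checkpoint files
    re.compile(r"_last_checkpoint"),                  # checkpoint pointer file
    re.compile(r"delta table state", re.IGNORECASE),  # assumed name of cached snapshot state
]


def is_delta_log_query(text: str) -> bool:
    """Return True if a SQL description or plan string matches any assumed marker."""
    return any(pattern.search(text) for pattern in DELTA_LOG_PATTERNS)
```

Substring matching keeps the check cheap, but it can misclassify a user query that happens to mention one of these paths, so real detection would likely need to inspect the plan nodes themselves.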

This adds a new option to the profiler tool, `--output-sql-ids-aligned`, which causes the tool to output a new table (and optionally a CSV file) that excludes the SQL IDs of all Delta log-related operations. The table simply lists the appId and its remaining SQL IDs in sorted order.
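The filtering step behind the aligned table could look roughly like the sketch below. It is illustrative only and not the profiler's implementation. The input shape `(appId, sqlId, description)`, the `is_delta_log` predicate, the function names `aligned_sql_ids` and `to_csv`, and the `;`-joined CSV column are all hypothetical.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[str, List[int]]


def aligned_sql_ids(
    queries: Iterable[Tuple[str, int, str]],
    is_delta_log: Callable[[str], bool],
) -> List[Row]:
    """Drop Delta log SQL IDs; return (appId, sorted remaining SQL IDs) per app."""
    kept: Dict[str, List[int]] = {}
    for app_id, sql_id, description in queries:
        ids = kept.setdefault(app_id, [])  # keep apps even if every query is filtered
        if not is_delta_log(description):
            ids.append(sql_id)
    return [(app_id, sorted(ids)) for app_id, ids in sorted(kept.items())]


def to_csv(rows: List[Row]) -> str:
    """Render the table as CSV text with hypothetical columns appId and sqlIds."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["appId", "sqlIds"])
    for app_id, ids in rows:
        writer.writerow([app_id, ";".join(str(i) for i in ids)])
    return buf.getvalue()
```

After filtering, the CPU and GPU lists can be paired by position (for example with `zip`) because the Delta log reads that shifted the numbering are gone.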

NVIDIA/spark-rapids-tools issue #1023