What is the problem the feature request solves?

Running TPC-H locally, I see >3% of on-CPU time spent in comet::execution::metrics::utils::update_comet_metric. This function appears to be called whenever native execution wakes up in the polling loop, typically to produce a batch. Starting from the root of the plan, the preorder traversal behavior is:
For every metric in the node:
- JNI call to allocate a string for the metric's name
- JNI call to update the metric, using the string as the key

For every child of the node:
- JNI call to fetch the child metric node
- Recursive call to update_comet_metric
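The traversal above can be sketched as follows. This is an illustrative mock, not Comet's actual code: `MockJni`, `MetricNode`, and the method names are hypothetical stand-ins for the real JNI helpers, and the counter simply shows how boundary crossings scale with the plan shape.

```rust
use std::cell::Cell;

struct MetricNode {
    metrics: Vec<(String, u64)>,
    children: Vec<MetricNode>,
}

// Each method on MockJni represents one crossing of the JNI boundary.
#[derive(Default)]
struct MockJni {
    crossings: Cell<usize>,
}

impl MockJni {
    fn new_string(&self, _name: &str) {
        self.crossings.set(self.crossings.get() + 1); // allocate metric name
    }
    fn set_metric(&self, _value: u64) {
        self.crossings.set(self.crossings.get() + 1); // update by string key
    }
    fn get_child_node(&self) {
        self.crossings.set(self.crossings.get() + 1); // fetch child node
    }
}

// Mirrors the preorder traversal described in the issue: two JNI calls per
// metric, plus one fetch and a recursive visit per child.
fn update_comet_metric(jni: &MockJni, node: &MetricNode) {
    for (name, value) in &node.metrics {
        jni.new_string(name);
        jni.set_metric(*value);
    }
    for child in &node.children {
        jni.get_child_node();
        update_comet_metric(jni, child);
    }
}

fn main() {
    let leaf = MetricNode {
        metrics: vec![("output_rows".into(), 0), ("elapsed_compute".into(), 0)],
        children: vec![],
    };
    let root = MetricNode {
        metrics: vec![("output_rows".into(), 0)],
        children: vec![leaf],
    };
    let jni = MockJni::default();
    update_comet_metric(&jni, &root);
    // 2 (root metric) + 1 (child fetch) + 4 (child metrics) = 7 crossings,
    // repeated on every wakeup of the polling loop.
    println!("{}", jni.crossings.get());
}
```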
Describe the potential solution
There are a few things to explore:
- Does reducing the granularity of metric updates affect the correctness of these metrics? If not, we could update metrics less frequently.
- Can we eliminate the overhead of repeatedly allocating strings via JNI for every metric? Addressed in #1029.
- Can we update an entire node's metrics with a single JNI call, rather than one JNI call per metric?
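The last question can be sketched by counting boundary crossings under the two schemes. This is an assumption about how batching might work (e.g. passing parallel name/value arrays in one call), not Comet's actual API; the function names below are hypothetical.

```rust
struct MetricNode {
    metrics: Vec<(String, u64)>,
    children: Vec<MetricNode>,
}

// Current scheme: 2 JNI calls per metric (name string + update), plus one
// call per child to fetch its metric node.
fn jni_calls_per_metric(node: &MetricNode) -> usize {
    2 * node.metrics.len()
        + node
            .children
            .iter()
            .map(|c| 1 + jni_calls_per_metric(c))
            .sum::<usize>()
}

// Batched scheme: one call carries every metric in the node at once (e.g. as
// parallel name/value arrays), plus one call per child to fetch its node.
fn jni_calls_batched(node: &MetricNode) -> usize {
    1 + node
        .children
        .iter()
        .map(|c| 1 + jni_calls_batched(c))
        .sum::<usize>()
}

fn main() {
    let leaf = |n: usize| MetricNode {
        metrics: (0..n).map(|i| (format!("m{i}"), 0)).collect(),
        children: vec![],
    };
    let root = MetricNode {
        metrics: (0..3).map(|i| (format!("m{i}"), 0)).collect(),
        children: vec![leaf(2), leaf(2)],
    };
    println!("per-metric: {}", jni_calls_per_metric(&root)); // 16
    println!("batched:    {}", jni_calls_batched(&root)); // 5
}
```

A complementary option, per the first question above, is to throttle: only push updates every N poll wakeups (plus once at plan completion), which keeps final metric values exact while amortizing the per-update cost.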
Additional context
No response