Open tgravescs opened 1 year ago
Note, if I try this now I get:
23/06/01 13:22:31 WARN Profiler: Exception occurred processing file: eventlog-2023-06-01--12-00
java.lang.NullPointerException
at com.nvidia.spark.rapids.tool.profiling.CollectInformation.$anonfun$getAppInfo$1(CollectInformation.scala:38)
at scala.collection.immutable.List.map(List.scala:293)
at com.nvidia.spark.rapids.tool.profiling.CollectInformation.getAppInfo(CollectInformation.scala:36)
at com.nvidia.spark.rapids.tool.profiling.Profiler.com$nvidia$spark$rapids$tool$profiling$Profiler$$processApps(Profiler.scala:288)
at com.nvidia.spark.rapids.tool.profiling.Profiler$ProfileProcessThread$1.run(Profiler.scala:230)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
I expect that to be tricky.
There are some issues that could be related to that as well:
Is your feature request related to a problem? Please describe. For 24/7 type clusters, the event logs can be huge, so loading everything into the Profiling tool is impossible. It would be nice to allow it to work with partial event logs that cover a smaller period of time, such as an hour.
This is usually combined with event log rolling, for example every hour or so.
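The trace above fails inside a `map` in `getAppInfo`. One plausible but unconfirmed cause is that a partial event log does not contain the application start event, so a field the tool expects is null. Below is a minimal sketch of guarding against that with `Option`. The types `AppStart`, `ParsedApp`, and `AppInfoRow`, and the `"unknown"` placeholder, are hypothetical stand-ins for illustration, not the tool's real classes or behavior.

```scala
// Hypothetical stand-ins for illustration; these are NOT the profiling tool's real classes.
final case class AppStart(appName: String, startTime: Long)

// Assumption: `start` is null when the log slice does not include the start event.
final case class ParsedApp(index: Int, start: AppStart)

final case class AppInfoRow(index: Int, appName: String, startTime: Option[Long])

object PartialLogSketch {
  // Wrap the possibly-null start event in Option so a missing event
  // produces a placeholder row instead of a NullPointerException.
  def getAppInfo(apps: List[ParsedApp]): List[AppInfoRow] =
    apps.map { app =>
      val start = Option(app.start) // Option(null) evaluates to None
      AppInfoRow(
        index = app.index,
        appName = start.map(_.appName).getOrElse("unknown"),
        startTime = start.map(_.startTime)
      )
    }
}
```

With this shape, a log that begins mid-application still yields a row, and downstream code can decide how to report the missing start time rather than aborting the whole file.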