NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

[BUG] Handle different exception thrown by incomplete eventlogs #1122

Closed amahussein closed 3 months ago

amahussein commented 3 months ago

Describe the bug

With https://github.com/NVIDIA/spark-rapids-tools/pull/686, incomplete eventlogs are expected to throw a JsonEOFException. However, there is a sample incomplete eventlog for which Spark instead throws a JsonParseException, causing the entire analysis to be skipped.

Upon investigation, I found that some cases cause a different failure.

To reproduce, I truncated a TaskEnd event line and ran the Profiler tool.

24/06/14 11:55:09 ERROR EventUtils: Log parse exception
com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input within/between Object entries
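The truncation step can be sketched as follows. This is only an illustration in Python, not the tool's own Scala code: the helper `truncate_last_line`, the `keep_fraction` parameter, and the two-event sample log are hypothetical, and the events carry only a few made-up fields rather than a full Spark event schema.

```python
import json


def truncate_last_line(text: str, keep_fraction: float = 0.5) -> str:
    """Return `text` with its final line cut short (hypothetical helper)."""
    lines = text.splitlines()
    last = lines[-1]
    # Keep at least one character, but never the whole line.
    cut = max(1, min(len(last) - 1, int(len(last) * keep_fraction)))
    return "\n".join(lines[:-1] + [last[:cut]])


# A tiny stand-in for an eventlog: one JSON object per line.
sample_log = "\n".join([
    json.dumps({"Event": "SparkListenerTaskStart", "Stage ID": 0}),
    json.dumps({"Event": "SparkListenerTaskEnd", "Stage ID": 0,
                "Task Info": {"Task ID": 1}}),
])

# The last (TaskEnd) line is now an incomplete JSON object.
truncated_log = truncate_last_line(sample_log)
```

Writing `truncated_log` to a file gives an input whose final event line ends partway through a JSON object, which is the shape of input described above.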

As a result, the Profiler reports a JsonParseException and sets the status to UNKNOWN. Instead, we can handle that corner case to avoid skipping all incomplete files.
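One possible shape for that handling, sketched in Python rather than the tool's Scala code: a parse failure on the final line is treated as a truncated log, and the events read so far are kept. The function name `parse_event_lines` and the returned `incomplete` flag are assumptions for illustration. Python's `json` module raises a single `JSONDecodeError`, so this mirrors only the control flow, not the Jackson exception types.

```python
import json


def parse_event_lines(lines):
    """Parse one JSON event per line (hypothetical sketch).

    Returns (events, incomplete). A malformed final line marks the log
    as incomplete; a malformed line anywhere else is still an error.
    """
    non_empty = [ln for ln in lines if ln.strip()]
    events = []
    for idx, line in enumerate(non_empty):
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            if idx == len(non_empty) - 1:
                # Truncated tail: keep what was parsed so far.
                return events, True
            raise
    return events, False
```

For example, `parse_event_lines(['{"Event": "A"}', '{"Event": "B"'])` keeps the first event and flags the log as incomplete instead of discarding the whole file.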