AbsaOSS / ABRiS

Avro SerDe for Apache Spark structured APIs.
Apache License 2.0

Container exited with a non-zero exit code 137 | Out of memory #348

Closed Vivek-Merugu closed 10 months ago

Vivek-Merugu commented 1 year ago

I am encountering an issue while using the ABRiS library version 6.3.0 in my Spark Streaming Scala code.

I have attempted to address the issue by increasing the driver memory and executor memory, but unfortunately, the problem persists.

Any guidance or assistance you can provide in resolving this issue would be greatly appreciated.

cerveada commented 12 months ago

What makes you believe that this issue is caused by Abris?

Vivek-Merugu commented 12 months ago

Thank you for your response. I appreciate your assistance in addressing this issue. Here's a breakdown of the situation:

Old Build Overview:

New Build Overview:

Error in New Build:

After the initial working period, the streaming application encountered the following error:

ERROR Client: Application diagnostics message: Application application_1696171462852_0173 failed 2 times due to AM Container for appattempt_1696171462852_0173_000002 exited with exitCode: 137
Failing this attempt.Diagnostics: Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137.
Killed by external signal
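The exit code itself is informative: codes above 128 encode a fatal signal as 128 + the signal number, so 137 means the process received SIGKILL (signal 9) from outside, which matches "Killed by external signal" and is what YARN (or the kernel OOM killer) does to a container that exceeds its memory limit. A quick check in a shell:

```shell
# Exit code = 128 + signal number; 137 = 128 + 9 (SIGKILL).
echo $((128 + 9))   # 137
kill -l 9           # KILL
```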

Investigation Steps:

Specific Queries:

Your insights and suggestions on resolving this matter would be greatly appreciated.

cerveada commented 12 months ago

Abris doesn't really work with JSON; it converts from Avro to a Spark DataFrame. So the issue could also be somewhere in your JSON handling.

The best approach would be to use a profiler to see what is really happening in memory.
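One low-effort way to profile a running executor is with the standard JDK tools on the worker node; `<executor_pid>` below is a placeholder for the executor JVM's process id:

```shell
# List Java processes to find the executor JVM's pid.
jps -lm

# Histogram of live heap objects by class; taking repeated snapshots
# shows which classes keep growing over time.
jmap -histo:live <executor_pid> | head -n 20

# Full heap dump for offline analysis in Eclipse MAT or VisualVM.
jcmd <executor_pid> GC.heap_dump /tmp/executor.hprof
```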

One idea I have is the registryConfig. Does your application use Avro with many different registryConfig maps? That could cause a memory leak, since a registry client is cached for each distinct config.
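The caching behavior described above can be illustrated with a toy sketch. This is not Abris's actual code; `Client` and the cache are hypothetical stand-ins. The point is that a cache keyed by the whole config map only stays bounded if the same map is reused, while building a slightly different map per stream or per batch adds an entry every time:

```scala
import scala.collection.mutable

// Hypothetical stand-in for a schema-registry client.
final case class Client(config: Map[String, String])

object RegistryClientCache {
  // One cached client per distinct config map, as in the caching
  // pattern described above.
  private val cache = mutable.Map.empty[Map[String, String], Client]

  def clientFor(config: Map[String, String]): Client =
    cache.getOrElseUpdate(config, Client(config))

  def size: Int = cache.size
}

val base = Map("schema.registry.url" -> "http://registry:8081")

// Reusing the same config map keeps the cache at one entry.
RegistryClientCache.clientFor(base)
RegistryClientCache.clientFor(base)
println(RegistryClientCache.size) // 1

// A unique key per stream makes the cache grow without bound.
(1 to 100).foreach { i =>
  RegistryClientCache.clientFor(base + ("client.id" -> s"stream-$i"))
}
println(RegistryClientCache.size) // 101
```

If this is the cause, the fix is to build the registryConfig map once and share it across all queries that talk to the same registry.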

Vivek-Merugu commented 12 months ago

ABRiS Library Usage:

RegistryConfig in Use:

Memory Monitoring:

Vivek-Merugu commented 10 months ago

The problem happened because there wasn't enough memory allocated to the Spark executor. After increasing the Spark executor memory, the issue was resolved.
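For readers hitting the same error, the relevant knobs look roughly like this (the values are hypothetical and must be tuned to the workload; on YARN, `spark.executor.memoryOverhead` matters in particular, because the container limit is executor memory plus overhead, and exceeding that limit is exactly what triggers the exit-code-137 kill):

```shell
spark-submit \
  --master yarn \
  --conf spark.driver.memory=4g \
  --conf spark.executor.memory=6g \
  --conf spark.executor.memoryOverhead=1g \
  your-streaming-app.jar
```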

Hence, closing this issue.