criteo / babar

Profiler for large-scale distributed java applications (Spark, Scalding, MapReduce, Hive,...) on YARN.
Apache License 2.0

Can't aggregate babar.log to yarn logs #17

Open andres-lago opened 6 years ago

andres-lago commented 6 years ago

Hello, I can't get babar.log aggregated into the YARN logs. When I run the command to fetch the logs: yarn logs --applicationId application_XXXX_YYYY > myAppLog.log

the resulting myAppLog.log doesn't contain the contents of babar.log. It contains only my application and YARN log messages.

This is not a problem when I launch the application from the Linux command line (calling spark2-submit directly), because the file babar.log is created in the same directory. But when I launch it from an Oozie workflow (production environment), babar.log disappears when the container terminates and its content is not aggregated.

I realized that the system properties yarn.app.container.log.dir and spark.yarn.app.container.log.dir are null, so babar falls back to a local directory, ./log, to store the log. Could this be the reason? Has anyone else observed the same problem?
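The fallback described above can be sketched in plain Java. This is a hypothetical illustration of the resolution order (property lookup, then ./log), not babar's actual code; the class and method names are invented:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class LogDirResolver {

    // Hypothetical sketch of the fallback described above: prefer the YARN
    // container log directory properties; if neither is set (as in the Oozie
    // case), fall back to a local ./log directory that YARN never aggregates.
    static Path resolveLogDir() {
        String dir = System.getProperty("yarn.app.container.log.dir");
        if (dir == null) {
            dir = System.getProperty("spark.yarn.app.container.log.dir");
        }
        if (dir == null) {
            dir = "./log"; // local fallback: lost when the container terminates
        }
        return Paths.get(dir);
    }

    public static void main(String[] args) {
        System.out.println("babar.log would be written under: " + resolveLogDir());
    }
}
```

When launched via Oozie neither property is set, so the resolver lands on the ./log branch, which matches the observed behaviour.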

BenoitHanotte commented 6 years ago

Hello! Yes, that could be the problem: the agent uses the environment properties to find where to store the log file. If the properties are not found, it stores the log in a local folder named log. If you replaced the environment property lookup by reading the value from an initialized Hadoop configuration, would you get a correctly set value? Something like:

Configuration conf = new Configuration();
String logDir = conf.get("yarn.app.container.log.dir");

You could also try to specify a custom log directory via the agent parameters that you add to your Java options.
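For a Spark driver, agent parameters go through the driver's extra Java options. A hedged sketch follows; the logDir=/tmp/babar agent parameter name and the agent jar path are assumptions for illustration, so check the babar README for the actual parameter names:

```shell
# Hypothetical sketch: pass a custom log directory to the babar agent
# through the Spark driver's Java options. "logDir=/tmp/babar" and the
# jar path are assumed names, not confirmed babar options.
spark2-submit \
  --conf "spark.driver.extraJavaOptions=-javaagent:/path/to/babar-agent.jar=StackTraceProfiler,logDir=/tmp/babar" \
  --class com.example.MyApp \
  my-app.jar
```

Pointing the agent at a directory outside the container avoids relying on the yarn.app.container.log.dir property being set.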

andres-lago commented 6 years ago

Hi Benoit, thanks for your support. We've chosen the easiest solution: write babar.log to a local directory outside the container (/tmp) and collect it manually afterwards from the relevant server (the driver's server in our case). I couldn't find an easy way to get the property value (the directory of the YARN containers' logs) and pass it to babar before launching spark2-submit from an Oozie workflow.
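The manual collection step from that workaround can be sketched as below; the host name and the /tmp path are assumptions taken from the description above:

```shell
# Hypothetical sketch: after the run, copy babar.log from the driver's
# server. "driver-host" and /tmp/babar.log are assumed placeholders.
scp driver-host:/tmp/babar.log ./babar.log
```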

We activated babar only for the driver; we didn't manage to launch it in the executors. But since we're currently working on a problem with the driver, that's enough for now.