TresAmigosSD / SMV

Spark Modularized View
Apache License 2.0

Switch to a Python library for logging #1512

Closed: laneb closed this issue 5 years ago

laneb commented 5 years ago

When I added logging to SMV earlier this year, I used log4j because it was readily available and could be accessed both from Scala and from Python (via Py4J). After discussing performance issues and optimizations for graph generation with @AliTajeldin, I wanted to find out where our bottleneck(s) are. It turns out that logging through the JVM is a big bottleneck: replacing the Py4J calls to log4j with a standard Python logger on the Python side of the application improved introspection performance by an order of magnitude. Running a trivial module in a 1000-module project takes about 15s, and that time is dominated by introspection. After replacing the logging, the running time drops to about 3s. So dropping log4j for Python-side logging will significantly speed up introspection-dominated tasks.
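A minimal sketch of what Python-side logging with the standard `logging` module might look like, assuming every log call stays in the Python process instead of crossing the Py4J bridge into log4j. The logger names (`smv`, `smv.introspection`) and the helpers `configure_logging` and `introspect_modules` are hypothetical illustrations, not SMV's actual code.

```python
import io
import logging

# Hypothetical logger name; SMV's real logger naming is not shown in this issue.
logger = logging.getLogger("smv.introspection")


def configure_logging(stream=None, level=logging.INFO):
    """Attach a single stream handler to the hypothetical 'smv' logger hierarchy."""
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    smv_logger = logging.getLogger("smv")
    smv_logger.handlers[:] = [handler]  # replace any previously attached handlers
    smv_logger.setLevel(level)
    smv_logger.propagate = False  # keep output out of the root logger
    return handler


def introspect_modules(module_names):
    """Hypothetical stand-in for an introspection pass that logs once per module.

    Each call is handled entirely in Python; with the log4j approach, every call
    would instead be a Py4J round trip into the JVM.
    """
    found = []
    for name in module_names:
        # %-style args are formatted lazily, so this is cheap when DEBUG is off.
        logger.debug("Inspecting module %s", name)
        found.append(name)
    logger.info("Discovered %d modules", len(found))
    return found
```

With the level left at `INFO`, the thousand per-module `debug` calls in a 1000-module project are filtered out inside Python before any formatting happens, which is the kind of per-call overhead the Py4J path could not avoid.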