broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk
Other
1.64k stars 581 forks source link

Question: could it be possible to change the logging to sfl4j? #2176

Closed magicDGS closed 7 years ago

magicDGS commented 7 years ago

As explain in the documentation. for logging GATK is using org.apache.logging.log4j.Logger.

Nevertheless, because I would like to use GATK4 as a framework, I think that API users could benefit from using SLF4J as a plugin system for allow users to decide which system use.

Because it exists a wrapper for log4j, I think that this could be done easily without changing any behaviour or configuration. In addition, I believe that the usage of SLF4J is similar as the one used in log4j.

magicDGS commented 7 years ago

As I commented in #2190 (and @tomwhite agrees), this could help on the serialization problems in Spark. In addition, it will be interesting from the point of view that I described in the original comment: more flexibility of GATK as a framework.

What do you think, @droazen?

droazen commented 7 years ago

@lbergelson Opinion on this?

lbergelson commented 7 years ago

@droazen I don't really have much of an opinion about the loggers. I think we chose log4j2 pretty arbitrarily and because it seemed popular. We don't have very complex logging needs so any logger pretty much satisfied our needs.

If there's a good reason to switch to SLF4J with a log4j backend that seems fine to me. It makes sense to use the more general solution so that people can use whatever backend they want.

I do know that we've encountered a lot of issues with having multiple copies of different logging frameworks included as transitive dependencies on spark. So there may be some hassle switching over, we'd have to run tests on google cloud to make sure we don't start crashing all of a sudden for logger related classpath issues.

@magicDGS I don't think anyone here cares that much about which logger we're using as long as it doesn't get in our way. If you want to do the switchover I think we'd be happy to accept a pull request as long as it didn't cause problems with our spark tools. Are you able to run our spark tools on gcloud dataproc?

magicDGS commented 7 years ago

I tried to do it, and I'm afraid that it won't be trivial as I expected: because it is a facade, there is not accesibility to Logger.setLogLevel() and this is required to set the verbosity level in the command line.

After explore a bit the code, it seems that LoggingUtils is the only place where a concrete implementation should be used.

My suggestion is to move this class to a package that could be excluded by the backend user (because it contains methods to change the logging of log4j, I suggest org.apache.logging.log4j), which implements a simple interface/abstract class org.broadinstitute.hellbender.utils.LoggingUtils to set the log level (LoggingUtils.setLoggingLevel(final Log.LogLevel verbosity). The default implementation (that could be used by final users callidsuper.setLoggingLevel(final Log.LogLevel verbosity)`) could setup the htsjdk and the java.util.logger.Logger.

This implementation requires to change the CommandLineProgram to have a setter for the LoggingUtils to use, that could be set in Main (as in my PR for improve the extensibility of this class). The only pronblem is that it requires to be initialize with a simple implementation class of LoggingUtils, which should use the default.

I think that this design does not break the behaviour of GATK, but introduce more complexity in the code. If you think that this is worthy, I could implement it today. @lbergelson, I'm not able to run Spark tools in a cluster yet, neither in gcloud dataproc, sorry. I'll wait for your answers on this.

lbergelson commented 7 years ago

@magicDGS That sounds like a solution that could work, but it does add complexity.

If it's going to be a more complicated change it seems like there have to be some concrete advantages, and I don't know enough about it to know what those are. Could you describe some of the advantages of switching to SLF4J? Are there specific features that you want to use that aren't available in the current logger?

magicDGS commented 7 years ago

The advantage of using SLF4J is that it is a general facade, so it makes simpler to change for one logging system to other if the bound is implemented. For the most common logging systems (log4j, jul, JLC, etc.), there are this implementation and even no-op logging. One of the nice things from slf4j is that it allows to use the logging format set by the software to every library dependency, controlling the verbosity of other libraries too.

After having a look to the gradle dependencies, it seems that ADAM and Spark use slf4j. This will allow better integration with the two libraries: now the slf4-jdk is completely removed, and I don't know if this will blow up at some point if some of the ADAM/Spark classes try to load them. In addition, it will make GATK4 more general.

Regarding features, I'm not using more that what log4j is providing, but I'm quite familiar with logback and I have a bias to use it if possible, but the GATK framework as it is implemented now "force" to use log4j.

But anyway, I'm happy also with using log4j and I was only suggesting this for make GATK4 more general (and to come back in my work to logback, but that is just personal taste).

@lbergelson, feel free to close the issue.

heuermh commented 7 years ago

While a minor issue, moving to slf4j-api as the logging API and slf4j-log4jxx as the implementation would help with interoperability.

For example, I'm working on a proposal for a separate library for conversions between formats, and currently the API passes in a slf4j Logger and an enum similar to ValidationStringency.

lbergelson commented 7 years ago

@magicDGS @heuermh It sounds like there are definite advantages to switching to SLF4j. Our fear is that the switch will end up resulting in confusing class loading issues in different spark environments. We're just beginning a round of spark performance testing and evaluation under a pretty tight deadline, and we don't want to introduce any surprises.

We'd be happy to look at a pull request, but we might delay until the end of quarter to fully test / merge it.

lbergelson commented 7 years ago

The only real requirement we have for logging is the ability to control the verbosity of the logging from the command line. We don't typically use any more advanced features.

droazen commented 7 years ago

Closing -- we're not changing our logger.