apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0
2.59k stars 1.41k forks source link

Deprecate Log and move to SLF4J Logger #1460

Open asfimport opened 8 years ago

asfimport commented 8 years ago

The current Log class is intended to allow swapping out logger back-ends, but SLF4J already does this. It also doesn't expose as nice of an API as SLF4J, which can handle formatting to avoid the cost of building log messages that won't be used. I think we should deprecate the org.apache.parquet.Log class and move to using SLF4J directly, instead of wrapping SLF4J (PARQUET-305).

This will require deprecating the current Log class and replacing the current uses of it with SLF4J.

Reporter: Ryan Blue / @rdblue

Related issues:

Note: This issue was originally created as PARQUET-401. Please see the migration documentation for further details.

asfimport commented 8 years ago

Michal Turek / @mixalturek: Hi, our libraries use generic dependency slf4j-api and applications concrete logback. I started using parquet-avro library in our application and found a possibly related issue that is not so obvious. So am adding a comment for considering during the refactoring.

        <dependency>
            <groupId>org.apache.parquet</groupId>
            <artifactId>parquet-avro</artifactId>
            <version>1.8.1</version>
        </dependency>

The following lines are appearing in stderr on close of Parquet file (ParquetWriter<GenericRecord>).

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

After a few hours of debugging and frustration I found the following... parquet-avro depends on org.apache.parquet:parquet-format:2.3.0-incubating which is shadowing slf4j library. This means that Java CLASSPATH contains everything from slf4j twice - for example standard one org.slf4j.LoggerFactory and shadow parquet.org.slf4j.LoggerFactory.

The standard implementation is able to find logback and properly configure logging for the application code using logback.xml, the second one fails and displays the error log above (including improper package name).

After finding this it was easy to put a breakpoint to the correct place and find the caller, which is parquet.org.apache.thrift.transport.TIOStreamTransport.

I found commit https://github.com/apache/parquet-format/commit/145f4df56bacd388b77ac9ff62d0b7c3c1b4eab9 and PARQUET-369 that already tries to solve this issue, but I don't think that adding shaded dependency to slf4j-nop is correct. This only disables logging for this library, including warning/error log message from TIOStreamTransport.close(). Very important information is effectively lost in such case.

Please consider to implement logging properly and depend only on non-shadowed slf4j-api and let the users of your libraries to decide which log messages to see and which not. I can attach an example source code if you want, but I think the issue is quite obvious. Please let me know in case of any questions.

asfimport commented 8 years ago

Ryan Blue / @rdblue: [~turek@avast.com], thanks for bringing this up. It sounds like the issue you're raising about the move to use the SLF4J API, but is about PARQUET-369. I just want to make sure that the discussion doesn't really affect how or when Parquet MR moves.

For the issue you brought up with PARQUET-369, I'm going to go ahead and respond on that issue so it's easy to follow later when I completely forget everything. :)

asfimport commented 8 years ago

Liwei Lin(Inactive) / @lw-lin: hi @julienledem, @rdblue, and [~liancheng]: Now that Parquet-305 has been merged, maybe we should consider replacing all Log.java usages with slf4j? Should anyone hasn't started it yet, I'd like to do this.

Will remove the if (Log.DEBUG) condition, and place the original LOG.debug("msg is" + msg) with the slfj4 parameterized form LOG.debug("msg is {}", msg), leaving it for slf4j to judge if the certain log level is enabled or not.

asfimport commented 8 years ago

Liwei Lin(Inactive) / @lw-lin: Looked through the code base and have found hundreds of Log.xxx() usages within 90+ classes. Should take 3 days to replace all of them. Do we want to get this in 1.9.0? I think it'd better not delay the release.

asfimport commented 8 years ago

Cheng Lian / @liancheng: Fix of this issue is nice to have but probably shouldn't block 1.9.0.

asfimport commented 8 years ago

Paul Praet: Is this planned to be fixed in the next release ?

asfimport commented 8 years ago

Ryan Blue / @rdblue: This may not be, but logging will happen though SLF4J. We made the Log class use SLF4J instead of java.util.Logging already, this is just needed to clean up internally.

asfimport commented 7 years ago

Steve Scaffidi: Just in case it helps anybody... when using Hive with Parquet in CDH 5.x, I found a work-around for unwanted log output from Parquet in the container logs: https://issues.apache.org/jira/browse/SPARK-4412?focusedCommentId=16118403