apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0
612 stars 118 forks source link

Docker logging handler #576

Open echarles opened 6 years ago

echarles commented 6 years ago

What changes were proposed in this pull request?

To better view the progress of the integration test when the docker image are build, this custom handler prints progress.

How was this patch tested?

Run integration test and check you see output like this:

...
* DONE (SparkR)
Building docker image spark-base from Dockerfile dockerfiles/spark-base/Dockerfile
Docker build: Step 1 : FROM openjdk:8-alpine
...
foxish commented 6 years ago

I think the println is adding a scalastyle warning. Change LGTM otherwise - but please note that we are moving the docker image building step to a separate step which may not necessarily be in scala code in the near future.

echarles commented 6 years ago

Thx for the review @foxish. Any link to the Docker image building change?

foxish commented 6 years ago

That change isn't in yet - but we did have a discussion about it in today's meeting. It'll likely be started by the work in https://github.com/apache-spark-on-k8s/spark/pull/521

echarles commented 6 years ago

Reading the meeting notes, I realize I had to leave just before you have talked about the PR I have submitted. too bad... I will follow-up also the point about the docker images.

For this PR, if the other way to build docker images is implemented quickly, no need to merge, otherwise I think this can help to have a better view (or at least to be less worried as a manual tester) about the long silent pause while running the integration tests.

kimoonkim commented 6 years ago

@echarles wrote:

I think this can help to have a better view (or at least to be less worried as a manual tester) about the long silent pause while running the integration tests.

Thanks for looking into this. I also observed the same issue, which tests developer patience :-)

We recently fixed this in the integration-test repo in a different way. The stock LoggingBuildHandler used by the SparkDockerImageBuilder uses slf4j. And our integration-test uses log4j. So the solution was simply to include a slf4j-to-log4j bridge jar as maven dependency:

   <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>${slf4j-log4j12.version}</version>
      <scope>test</scope>
    </dependency>

Can you try this approach in this PR? If it works, then we would not need the custom logging handler nor println.

echarles commented 6 years ago

Hello @kimoonkim, thx for the review and suggestion. I have pushed an update which now uses slf4j-log4j without custom handler. In my previous tests, I remember I tried that and came with verbose output like:

2017-12-24 11:21:45 INFO  LoggingBuildHandler:36 - build: ProgressMessage{id=2fdfe1cd78c2, status=Extracting, stream=null, error=null, progress=[==================================================>] 2.065 MB/2.065 MB, progressDetail=ProgressDetail{current=2064911, start=0, total=2064911}}
2017-12-24 11:21:45 INFO  LoggingBuildHandler:36 - build: ProgressMessage{id=2fdfe1cd78c2, status=Pull complete, stream=null, error=null, progress=null, progressDetail=ProgressDetail{current=0, start=0, total=0}}
2017-12-24 11:21:45 INFO  LoggingBuildHandler:36 - build: ProgressMessage{id=82630fd6e5ba, status=Extracting, stream=null, error=null, progress=[==================================================>]    239 B/239 B, progressDetail=ProgressDetail{current=239, start=0, total=239}}
2017-12-24 11:21:45 INFO  LoggingBuildHandler:36 - build: ProgressMessage{id=001511eb3437, status=Downloading, stream=null, error=null, progress=[>                                                  ] 1.065 MB/70.23 MB, progressDetail=ProgressDetail{current=1065296, start=0, total=70227876}}
2017-12-24 11:21:45 INFO  LoggingBuildHandler:36 - build: ProgressMessage{id=82630fd6e5ba, status=Extracting, stream=null, error=null, progress=[==================================================>]    239 B/239 B, progressDetail=ProgressDetail{current=239, start=0, total=239}}
2017-12-24 11:21:45 INFO  LoggingBuildHandler:36 - build: ProgressMessage{id=001511eb3437, status=Downloading, stream=null, error=null, progress=[=>                                                 ] 1.606 MB/70.23 MB, progressDetail=ProgressDetail{current=1605968, start=0, total=70227876}}
2017-12-24 11:21:45 INFO  LoggingBuildHandler:36 - build: ProgressMessage{id=82630fd6e5ba, status=Pull complete, stream=null, error=null, progress=null, progressDetail=ProgressDetail{current=0, start=0, total=0}}
2017-12-24 11:21:46 INFO  LoggingBuildHandler:36 - build: ProgressMessage{id=001511eb3437, status=Downloading, stream=null, error=null, progress=[=>                                                 ] 2.147 MB/70.23 MB, progressDetail=ProgressDetail{current=2146640, start=0, total=70227876}}
2017-12-24 11:21:47 INFO  LoggingBuildHandler:36 - build: ProgressMessage{id=001511eb3437, status=Downloading, stream=null, error=null, progress=[=>                                                 ] 2.687 MB/70.23 MB, progressDetail=ProgressDetail{current=2687312, start=0, total=70227876}}
2017-12-24 11:21:48 INFO  LoggingBuildHandler:36 - build: ProgressMessage{id=001511eb3437, status=Downloading, stream=null, error=null, progress=[==>                                                ] 3.227 MB/70.23 MB, progressDetail=ProgressDetail{current=3226992, start=0, total=70227876}}
..

Seeing nothing is not fine, seeing too much is not better...

An combined solution would be to use log4j (not println) with the custom handler that prints only what we want.

kimoonkim commented 6 years ago

Seeing nothing is not fine, seeing too much is not better...

I would say this is actually better because the long silence is much worse. These messages at least help us debug which image has failed to build.

An combined solution would be to use log4j (not println) with the custom handler that prints only what we want.

That sounds good to me, but @foxish were saying we want to retire this scala docker image builder code.

we are moving the docker image building step to a separate step which may not necessarily be in scala code in the near future.

So maybe we should first agree on what we want to do next, and this PR is good enough as is?

kimoonkim commented 6 years ago

Hello @kimoonkim, thx for the review and suggestion. I have pushed an update which now uses slf4j-log4j without custom handler.

@echarles Thanks for the change!

echarles commented 6 years ago

@kimoonkim I just made a new push reintroducing the custom LoggingBuildHandler to have a medium level of messages on the console.

echarles commented 6 years ago

The last push takes into account all @kimoonkim comments (not logging anymore to console but to file + added a small scaladoc to make this information available to newcomers).