Closed: nmldiegues closed this 5 years ago
The original problem was:
TaskSetManager.logWarning:66 Lost task 0.0 in stage 0.0 (TID 0, ip-10-52-126-136.eu-west-2.compute.internal, executor 1): java.lang.NoSuchFieldError: INSTANCE
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:146)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.getPreferredSocketFactory(ApacheConnectionManagerFactory.java:86)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:63)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:56)
    at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:50)
    at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:38)
    at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:315)
    at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:299)
    at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:169)
    at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:579)
    at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:559)
    at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:537)
    at org.apache.hadoop.fs.s3a.S3ClientFactory$DefaultS3ClientFactory.createAmazonS3Client(S3ClientFactory.java:202)
    at org.apache.hadoop.fs.s3a.S3ClientFactory$DefaultS3ClientFactory.createS3Client(S3ClientFactory.java:78)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:186)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2859)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2896)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2878)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:392)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at com.feedzai.pulse.datascience.datasource.csv.CsvDataSourceStringSplitReader.getDataSplitData(CsvDataSourceStringSplitReader.java:226)
    at com.feedzai.pulse.datascience.datasource.csv.CsvDataSourceSplitReader.getDataSplitData(CsvDataSourceSplitReader.java:79)
    at com.feedzai.distributed.job.backend.spark.rdd.DataReaderRDD.compute(DataReaderRDD.java:69)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Validated manually that the class AllowAllHostnameVerifier.class appeared in the openml-h2o jar before this change and no longer appears after it.
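For context, a NoSuchFieldError: INSTANCE raised from SSLConnectionSocketFactory.<clinit> is the classic symptom of an httpclient version conflict: the AWS SDK is compiled against a newer httpclient whose verifier classes expose a static INSTANCE field, while an older copy bundled into a fat jar wins at class-loading time. A minimal diagnostic sketch (not part of this PR; the class name WhichJar is hypothetical) that prints which jar the JVM actually resolved the class from:

import org.apache.http.conn.ssl.SSLConnectionSocketFactory;

// Hypothetical diagnostic: prints the location of the jar from which the JVM
// loaded SSLConnectionSocketFactory, to confirm whether a stale httpclient
// copy is shadowing the one the AWS SDK was compiled against.
public class WhichJar {
    public static void main(String[] args) {
        System.out.println(SSLConnectionSocketFactory.class
                .getProtectionDomain()
                .getCodeSource()
                .getLocation());
    }
}

If this prints the application's fat jar rather than a system-provided httpclient jar, the packaged copy is the likely culprit.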
Hey @nmldiegues,
Your changes look good to me!
Merging #5 into master will not change coverage. The diff coverage is n/a.
@@            Coverage Diff            @@
##             master       #5   +/-   ##
=========================================
  Coverage     75.43%   75.43%
  Complexity      220      220
=========================================
  Files            22       22
  Lines           753      753
  Branches         70       70
=========================================
  Hits            568      568
  Misses          147      147
  Partials         38       38
This passed our automated tests on embedded Spark and on Spark on Cloudera and Standalone deployments. Therefore, proceeding to merge, back-port, and release a 0.5 hotfix.
The "httpclient" dependency is very popular and used across many projects. In this case, H2O depends on it and it conflicted with AWS dependencies when we use H2O OpenML in Spark jobs on AWS EMR.
This way we make it provided so that it can be decided by users.
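As a rough sketch of what that change looks like in a Maven POM (the coordinates are the real httpclient ones; the version property is a hypothetical placeholder, not the value used in this PR):

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <!-- hypothetical placeholder version -->
    <version>${httpclient.version}</version>
    <scope>provided</scope>
</dependency>

With provided scope, httpclient stays on the compile classpath but is excluded from the packaged artifact, so the runtime environment (for example, the AWS SDK stack on EMR) supplies whichever version it needs.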