aws-samples / emr-bootstrap-actions

This repository holds the Amazon Elastic MapReduce sample bootstrap actions.

Cannot use native BLAS #124

Closed codesuki closed 1 year ago

codesuki commented 9 years ago

When running Spark I get the following warnings:

15/06/15 11:17:36 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
15/06/15 11:17:36 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
15/06/15 11:17:36 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
15/06/15 11:17:36 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK

I noticed that the workers don't have libgfortran; that might be the cause.
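To check whether a node actually has the shared libraries that netlib-java's native backends need, a small search over the candidate directories can help. This is a minimal sketch under stated assumptions: the library names (libblas.so.3, liblapack.so.3, and a libgfortran.so.* runtime whose version suffix varies by distribution) are based on the netlib-java README and the libgfortran remark above, and check_native_libs is a hypothetical helper name.

```shell
# check_native_libs DIR...   (hypothetical helper)
# For each library pattern, prints "found <pattern> -> <path>" for the first
# match in the given directories (searched in order), or "missing <pattern>".
check_native_libs() {
  local pattern dir f hit
  # Assumed names: libblas.so.3 / liblapack.so.3 for the system backend,
  # plus the gfortran runtime those libraries usually link against.
  for pattern in 'libblas.so.3' 'liblapack.so.3' 'libgfortran.so.*'; do
    hit=""
    for dir in "$@"; do
      # $pattern is left unquoted so the glob can expand.
      for f in "$dir"/$pattern; do
        if [ -e "$f" ]; then hit="$f"; break 2; fi
      done
    done
    if [ -n "$hit" ]; then
      echo "found $pattern -> $hit"
    else
      echo "missing $pattern"
    fi
  done
}
```

For example, running check_native_libs /usr/lib64/atlas /usr/lib64 on a worker node shows which of these files the dynamic loader could possibly pick up from those directories.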

My sbt build file includes:

libraryDependencies += "com.github.fommil.netlib" % "all" % "1.1.2"

Anyone else getting this warning?

RoiViber commented 9 years ago

Hi, have you figured this out already?

codesuki commented 9 years ago

Oh great, you reminded me; I wanted to write a new comment here :)

TL;DR: Step 1: Put your dependencies, including native netlib, in a separate JAR and pass its S3 location to the AWS Spark setup script with -u <s3://bucket/path_to_find_jars/>. The parameter's description reads "Add the jars in the given S3 path to spark classpath in the user-provided directory (ahead of all other dependencies)"; the important part is "ahead of all other dependencies". Alternatively, you could rebuild Spark with native netlib support and replace the spark.jar during bootstrap. Note that although the description says it adds the jars in the given S3 path, it actually adds all files there, so be careful what you put in that path.
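Because everything in the -u path gets copied onto the classpath directory, it may be worth checking the local staging directory before uploading it. A minimal sketch, assuming the dependency JAR is first collected in a local directory; non_jar_files is a hypothetical helper:

```shell
# non_jar_files DIR   (hypothetical helper)
# Prints every regular file under DIR whose name does not end in .jar.
# Returns 1 if any such file exists, 0 if DIR holds only JARs.
non_jar_files() {
  local dir="$1" bad
  bad=$(find "$dir" -type f ! -name '*.jar')
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 1
  fi
  return 0
}
```

For example: non_jar_files ./deps && aws s3 sync ./deps s3://bucket/path_to_find_jars/ uploads only when nothing but JARs is present (hidden files such as .DS_Store are flagged too).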

Step 2: Put the following script somewhere on S3:

#!/bin/bash
# Single quotes keep $LD_LIBRARY_PATH literal, so it is expanded when these files are sourced, not at bootstrap time.
printf '\n# NATIVE BLAS\nexport LD_LIBRARY_PATH=/usr/lib64/atlas:$LD_LIBRARY_PATH\n' >> /home/hadoop/spark/conf/spark-env.sh
printf '\n# NATIVE BLAS\nexport LD_LIBRARY_PATH=/usr/lib64/atlas:$LD_LIBRARY_PATH\n' >> /home/hadoop/.bashrc

Then add it to your cluster with --bootstrap-actions Path="s3://host/path/to/native-blas-bootstrap-action.sh",Name="Add ATLAS / BLAS Library Path"

Now it should work.
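The Step 2 bootstrap can also be written as a reusable function that skips files already containing the "# NATIVE BLAS" marker, so re-running it does not append duplicate lines. This is a sketch, not the original script: add_atlas_path is a hypothetical name, and it keeps $LD_LIBRARY_PATH literal so the value is resolved when the file is sourced.

```shell
# add_atlas_path FILE...   (hypothetical helper)
# Appends an LD_LIBRARY_PATH export for /usr/lib64/atlas to each FILE,
# skipping files that already contain the marker line.
add_atlas_path() {
  local marker='# NATIVE BLAS' file
  for file in "$@"; do
    if [ -f "$file" ] && grep -qxF "$marker" "$file"; then
      continue   # already added on an earlier run
    fi
    # Single-quoted format: $LD_LIBRARY_PATH is written literally and is
    # expanded later, when the file is sourced.
    printf '\n%s\nexport LD_LIBRARY_PATH=/usr/lib64/atlas:$LD_LIBRARY_PATH\n' "$marker" >> "$file"
  done
}
```

In the bootstrap script itself, this would be followed by a call such as add_atlas_path /home/hadoop/spark/conf/spark-env.sh /home/hadoop/.bashrc (the paths from the thread, which assume the old AMI layout).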

Long story:

So I tried "all the settings" and ended up finding that none of them worked. Maybe I misunderstand them, but my guess is that Spark doesn't propagate some settings correctly, e.g. LD_LIBRARY_PATH not getting set even though the documentation says it should be.

Anyway, the first problem, as written above, was that netlib was not found on the classpath even though it was included in the assembly JAR. To fix that, I set the classpath to include the assembly JAR. It has something to do with classloader JAR ordering: netlib is already in the Spark JAR, and that gets loaded before your application JAR. In the end I solved it by building a JAR containing only the project's dependencies, uploading it to S3, and then using the -u parameter of the Spark setup script to download all files in that S3 path into the user-dependencies folder, which is on the classpath.

--bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark,Name="Install Spark",Args=[-v,1.4.0.b,-u,s3://host/path/to/deps]

The next error you will hit is that, even though all the needed libraries are in /usr/lib64/atlas and there is an ld.so.conf entry for that directory, Spark still can't find them and crashes trying to load them.

To fix that, I wrote this script, which runs during bootstrap:

#!/bin/bash
# Single quotes keep $LD_LIBRARY_PATH literal, so it is expanded when these files are sourced, not at bootstrap time.
printf '\n# NATIVE BLAS\nexport LD_LIBRARY_PATH=/usr/lib64/atlas:$LD_LIBRARY_PATH\n' >> /home/hadoop/spark/conf/spark-env.sh
printf '\n# NATIVE BLAS\nexport LD_LIBRARY_PATH=/usr/lib64/atlas:$LD_LIBRARY_PATH\n' >> /home/hadoop/.bashrc

Then add it to your cluster with --bootstrap-actions Path="s3://host/path/to/native-blas-bootstrap-action.sh",Name="Add ATLAS / BLAS Library Path"

I may have gotten confused while testing all the different Spark settings, but it only worked when I wrote the line to both .bashrc (which should only matter if you log in to that server) AND spark-env.sh. Writing it to only one of them did not help.

The Spark settings spark.driver.extraLibraryPath, spark.executor.extraLibraryPath, and spark.yarn.am.extraLibraryPath had no effect.

RoiViber commented 9 years ago

@codesuki I don't understand something: what do you mean by "ahead of all other dependencies"?

codesuki commented 9 years ago

Say you have two .jar files that each bundle different libraries. If both contain a library in common then, as far as I understand it, the classloader loads the copy that comes first on the classpath and ignores the others.
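The "first one wins" lookup described above can be illustrated by walking a classpath string in order and reporting the first entry that provides a given class file. This sketch only models that simple ordered search, not Spark's or YARN's real classloader hierarchy (spark.driver.userClassPathFirst, for instance, changes the delegation order). first_provider is a hypothetical helper, and checking JAR entries assumes unzip is installed.

```shell
# first_provider CLASSPATH RESOURCE   (hypothetical helper)
# Prints the first colon-separated classpath entry that contains RESOURCE,
# e.g. com/github/fommil/netlib/BLAS.class. Returns 1 if none does.
first_provider() {
  local cp="$1" resource="$2" entry
  local -a entries
  IFS=':' read -r -a entries <<< "$cp"
  for entry in "${entries[@]}"; do
    if [ -d "$entry" ]; then
      # Directory entry: the class file sits at its package path.
      if [ -e "$entry/$resource" ]; then echo "$entry"; return 0; fi
    elif [ -f "$entry" ]; then
      # JAR entry: unzip reports a non-zero status when no member matches.
      if unzip -l "$entry" "$resource" >/dev/null 2>&1; then echo "$entry"; return 0; fi
    fi
  done
  return 1
}
```

Reordering the entries in CLASSPATH changes the answer, which is the whole point of putting the netlib dependencies "ahead of all other dependencies".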

RoiViber commented 9 years ago

OK, one final question, if I may. I managed to get rid of the "WARN BLAS: Failed to load implementation from..." warning by following your steps. Now Spark (MLlib's ALS) runs a bit slower than it did without ATLAS. Is it even possible that ATLAS is slower than the default BLAS?

codesuki commented 9 years ago

I am not an expert, but I seriously doubt that. You would have to check exactly which implementation was loaded, but ATLAS should be faster. Did ANYTHING else change? You can find some (maybe biased) benchmarks here: https://github.com/fommil/netlib-java

Just a shot in the dark, but performance might also depend on how ATLAS is installed on the nodes: is it tuned for that hardware, or is it just a generic build? Sorry, I can't help with that.
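One rough way to see which implementation was picked is to read the warnings themselves. Assuming netlib-java's default fallback order (NativeSystemBLAS, then NativeRefBLAS, then the pure-Java F2jBLAS), the sketch below infers the BLAS backend from a driver or executor log. blas_backend_from_log is a hypothetical helper; a log with no warnings is only suggestive, since the log level may hide them or BLAS may never have been initialized.

```shell
# blas_backend_from_log LOGFILE   (hypothetical helper)
# Infers the netlib-java BLAS backend from the "Failed to load implementation"
# warnings quoted in this thread, assuming the default fallback order.
blas_backend_from_log() {
  local log="$1"
  local prefix='WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.'
  if grep -qF "${prefix}NativeRefBLAS" "$log"; then
    # NativeRefBLAS is only tried after NativeSystemBLAS fails.
    echo "F2jBLAS (pure Java fallback)"
  elif grep -qF "${prefix}NativeSystemBLAS" "$log"; then
    echo "NativeRefBLAS"
  else
    echo "NativeSystemBLAS (no BLAS load failures logged)"
  fi
}
```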

avulanov commented 8 years ago

Just came across the same issue. One can set spark.driver.userClassPathFirst=true in Spark conf and supply Spark with all netlib-java dependencies.
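A sketch of that approach with spark-submit, assuming YARN cluster mode and placeholder names (app.jar, com.example.Main and netlib-deps.jar are hypothetical). Both userClassPathFirst settings are marked experimental in the Spark configuration documentation. SPARK_SUBMIT is a hypothetical override so the command can be dry-run with echo.

```shell
# submit_user_first APP_JAR MAIN_CLASS DEPS_JAR   (hypothetical helper)
# Runs spark-submit with user-supplied JARs placed ahead of Spark's own
# classes on both the driver and the executors.
submit_user_first() {
  local app_jar="$1" main_class="$2" deps_jar="$3"
  # spark.driver.userClassPathFirst is documented as cluster-mode only,
  # hence --deploy-mode cluster.
  "${SPARK_SUBMIT:-spark-submit}" \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.driver.userClassPathFirst=true \
    --conf spark.executor.userClassPathFirst=true \
    --jars "$deps_jar" \
    --class "$main_class" \
    "$app_jar"
}
```

For a dry run, SPARK_SUBMIT=echo submit_user_first app.jar com.example.Main netlib-deps.jar prints the arguments instead of submitting.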

codesuki commented 8 years ago

Yes. In my case I couldn't use it because the documentation says "This is used in cluster mode only.", and for some reason I couldn't use cluster mode. (It failed, complaining that some Hadoop JAR was not on HDFS. When I investigated, the JAR had actually been copied to the correct location and then immediately deleted again.)

zachliu commented 7 years ago

Same issue when fitting a logistic regression model on EMR 5.8.0 with PySpark.

17/09/01 14:29:01 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
17/09/01 14:29:01 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS

@avulanov could you please elaborate on how to obtain those netlib-java dependencies? Thanks!

elgalu commented 6 years ago

How did you guys solve this WARN in the end?

codesuki commented 6 years ago

It should be solved once you follow the steps in my answer above. I am not using AWS anymore. Since you are asking, it seems they haven't fixed it yet? After switching to Google Cloud, everything worked like a charm without extra settings. Even the Spark cluster starts up much faster.

christopherbozeman commented 6 years ago

AWS EMR releases emr-5.14.0 and later have native BLAS support included with Spark.
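The release cutoff mentioned above can be encoded in a small check, for example in a bootstrap script that only applies the workaround on older releases. This is a sketch: it assumes GNU sort's -V (version sort) is available and simply encodes the emr-5.14.0 claim from this comment; emr_has_native_blas is a hypothetical helper.

```shell
# emr_has_native_blas RELEASE_LABEL   (hypothetical helper)
# Succeeds if RELEASE_LABEL (e.g. emr-5.8.0) is at or above emr-5.14.0.
emr_has_native_blas() {
  local version="${1#emr-}" min="5.14.0"
  # sort -V compares dotted versions component by component; if the minimum
  # sorts first (or the two are equal), VERSION >= MIN.
  [ "$(printf '%s\n%s\n' "$min" "$version" | sort -V | head -n 1)" = "$min" ]
}
```

For example, emr_has_native_blas emr-5.8.0 || echo "apply the bootstrap workaround" prints the reminder for the release used earlier in this thread.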

khan008 commented 4 years ago

@codesuki, @elgalu, @zachliu, @avulanov, @RoiViber Please, please, please help me get out of this problem. I have been trying for a long time, but I could not figure it out.
20/02/22 18:53:02 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
20/02/22 18:53:02 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS

dacort commented 1 year ago

Hi there - thanks for your contribution. We're updating this repository to include more relevant and recent information.

As such, we're cleaning up and closing old issues and PRs.

Feel free to open an issue if you still use EMR and would like to see an example of something!