@jmabuin Please, I need help to solve this problem.
Hi @Asmaa-Ali
SparkBWA creates temporary .sam files on the computing nodes (executors) while the program is running. Can you connect to one of your nodes and check whether these temporary files have any content?
These files are stored in the Spark temporary directory or in /tmp/; you can find them with the find command:
find /tmp/ -name "*.sam"
Also, what is the content of your /Data/HumanBase/ dir? Is this content available on all nodes?
I ran into the same issue, and my temporary .sam files inside /tmp also have no content.
Can you please check if it works with SparkBWA 0.2?
Hi, I have tried it with SparkBWA 0.2 and I ran into the same issue. It appears to run fine, but the produced SAM files are empty! The content of the /Data/HumanBase/ dir is available on all my worker nodes. Any suggestions, please?
Do you have permissions to write in the /tmp folder? Also, which versions of Hadoop and Spark are you using?
Yes, I do have the rights to write in /tmp. I am using Spark 2.0 and HDFS 2.7.3. Any ideas welcome!
Have you tried using yarn-cluster instead of yarn-client?
Actually, in newer versions of Spark it should be --master yarn --deploy-mode cluster.
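For example, an invocation in cluster mode might look like the following (a sketch only: the jar name and the trailing arguments are placeholders, and the main class name is assumed from the package name that appears later in this thread):
spark-submit --class com.github.sparkbwa.SparkBWA --master yarn --deploy-mode cluster SparkBWA.jar <SparkBWA arguments>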
Same issue here. I also tried running with the local scheduler (no YARN). Could that be the reason for the missing/empty SAM files? Does all input data need to go through HDFS? It also complains that it can't find the index file, and I have set up proper permissions on all locations. Thank you.
Update: Obviously, one really needs a running Hadoop cluster so that the code can work on the data in HDFS; hence the empty SAM files in Spark standalone cluster mode. It would be nice if there were an option for running with a Spark standalone instance. Hadoop can be a real pain under Torque/PBS job schedulers.
I fixed it. Now SparkBWA 0.2 can run on YARN or standalone and outputs the SAM file on my local cluster.
@xubo245 Thanks a lot - the fix works. I can confirm that it also runs in Spark standalone mode (no Hadoop FS).
You are welcome.
@xubo245 It is still not working for me in standalone mode. Have you made any more specific changes?
Yes, I have a temporary change for standalone, but it is not the best solution...
In com.github.sparkbwa.BwaAlignmentBase#copyResults:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://Master:9000/");
FileSystem fs = FileSystem.get(new URI("hdfs://Master:9000/"), conf);
Master should be replaced with your cluster's hostname.
@xubo245 We are running in a non-HDFS environment using GPFS. How can we make it work with a general filesystem? The filesystem is available on every node, similar to HDFS.
You would have to replace the HDFS code with the GPFS API, but I do not know GPFS...
SparkBWA has a lot of HDFS code...
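An alternative to rewriting against the GPFS API might be to lean on Hadoop's FileSystem abstraction, which resolves the concrete filesystem from the path or from the configured default (file://, hdfs://, or a GPFS Hadoop connector, if one is available). Below is a minimal sketch of that idea; the class, the method copyLocalSam, and its parameters are hypothetical illustrations, not SparkBWA's actual code:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyResultsSketch {
    // Hypothetical replacement for the hardcoded HDFS logic in
    // BwaAlignmentBase#copyResults: derive the filesystem from the output
    // path instead of from a fixed hdfs://Master:9000/ URI. With
    // fs.defaultFS pointing at file:/// (or at a GPFS connector), the same
    // code runs without an HDFS cluster.
    public static void copyLocalSam(String localSamPath, String outputDir)
            throws IOException {
        Configuration conf = new Configuration(); // reads core-site.xml, including fs.defaultFS
        Path dest = new Path(outputDir);
        FileSystem fs = dest.getFileSystem(conf); // resolves hdfs://, file://, etc. from the path
        fs.copyFromLocalFile(new Path(localSamPath), dest);
    }
}

Whether this alone would be enough depends on how many other places in SparkBWA assume HDFS, as noted above.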