Open SungMinHong opened 7 years ago
I solved that problem. I had simply forgotten the python script file name in the command. :(
"spark-submit --packages databricks:tensorframes:0.2.9-s_2.11 --master yarn --deploy-mode cluster <python_script_name>"
I also ran into a new problem: the workers (data nodes) can't import tensorflow, probably because every node in my cluster has tensorflow installed inside a Virtualenv.
Does anyone know a solution to that problem? If you know the answer, please let me know. :)
Spark on YARN does not support Virtualenv mode as of now.
So I reinstalled tensorflow with pip, and I am testing tensorframes on a single local node like this:
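As a possible workaround (a sketch, not something I have verified on this cluster): Spark lets you point the driver and executors at a specific Python interpreter through `spark.yarn.appMasterEnv.PYSPARK_PYTHON` and `spark.executorEnv.PYSPARK_PYTHON`. If tensorflow is installed system-wide on every node (or inside a virtualenv at the same path on every node), something like this should let the workers import it; the interpreter path here is an assumption you would adapt:

```shell
# Assumption: /opt/tf-venv/bin/python exists on every node and has tensorflow installed.
spark-submit \
  --packages databricks:tensorframes:0.2.9-s_2.11 \
  --master yarn --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/tf-venv/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=/opt/tf-venv/bin/python \
  test_tfs.py
```

This is an environment/config fragment, so treat the paths as placeholders for your own setup.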
$ spark-submit --packages databricks:tensorframes:0.2.9-s_2.11 --master local --deploy-mode client test_tfs.py > output
test_tfs.py
import tensorflow as tf
import tensorframes as tfs
from pyspark.sql import Row
from pyspark.sql.functions import *
from pyspark.sql import SQLContext
from pyspark import SparkContext
sc = SparkContext("local", "tfs single node mode test")
sc.setLogLevel("ERROR")
sqlContext = SQLContext(sc)
# tensorframes example
data = [Row(x=float(x)) for x in range(5)]
df = sqlContext.createDataFrame(data)
with tf.Graph().as_default() as g:
    # The placeholder that corresponds to column 'x'
    x = tf.placeholder(tf.double, shape=[None], name="x")
    # The output that adds 3 to x
    z = tf.add(x, 3, name='z')
    # The resulting dataframe
    df2 = tfs.map_blocks(z, df)
    df2.show()
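For reference, the job above should just compute z = x + 3 for each row of the dataframe; a plain-Python sketch of the same computation (no Spark or TensorFlow needed) shows what df2 ought to contain:

```python
# Plain-Python sketch of what the TensorFrames graph computes:
# column x holds 0.0 .. 4.0, and z = x + 3 for each row.
data = [float(x) for x in range(5)]
z = [x + 3 for x in data]
print(list(zip(data, z)))  # [(0.0, 3.0), (1.0, 4.0), (2.0, 5.0), (3.0, 6.0), (4.0, 7.0)]
```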
But I hit a problem with an AttributeError:
Traceback (most recent call last):
File "/home/hong/test_tfs.py", line 19, in <module>
df2 = tfs.map_blocks(z, df)
File "/home/hong/.ivy2/jars/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 312, in map_blocks
File "/home/hong/.ivy2/jars/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 146, in _map
File "/home/hong/.ivy2/jars/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 101, in _get_graph
File "/home/hong/.ivy2/jars/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 43, in _initialize_variables
AttributeError: 'module' object has no attribute 'global_variables'
If someone else hits this error, please let me know.
=> It was caused by an old tensorflow version. Solution: update tensorflow.
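To expand on that: tf.global_variables was introduced in TensorFlow 0.12 as the replacement for the older tf.all_variables, so any TensorFlow older than 0.12 raises exactly this AttributeError from tensorframes. Upgrading inside the same environment the workers use, and sanity-checking the attribute, looks roughly like this (an environment fix, so adapt pip/python to your virtualenv):

```shell
# Upgrade tensorflow in the environment the job actually runs with,
# then confirm the attribute tensorframes needs is present.
pip install --upgrade tensorflow
python -c "import tensorflow as tf; print(tf.__version__, hasattr(tf, 'global_variables'))"
```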
Hi, everyone. TensorFrames is interesting to me, so I want to test it on my Spark cluster, but I have a couple of questions.
- I am wondering whether TensorFrames only needs to be installed on the master node, or whether it needs to be installed on every worker as well.
- This command works fine: "pyspark --packages databricks:tensorframes:0.2.9-s_2.11". But I want to use my Spark cluster, so I tried "spark-submit --packages databricks:tensorframes:0.2.9-s_2.11", and it fails with this error:
Exception in thread "main" java.lang.IllegalArgumentException: Missing application resource.
	at org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241)
	at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160)
	at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:274)
	at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151)
	at org.apache.spark.launcher.Main.main(Main.java:86)
Has anyone seen this error?
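For what it's worth, "Missing application resource" from spark-submit usually just means no application file was passed: unlike pyspark, spark-submit requires the script (or jar) as a positional argument after the options, along the lines of:

```shell
# spark-submit needs an application resource (your script) after the options;
# your_script.py is a placeholder for the actual file.
spark-submit --packages databricks:tensorframes:0.2.9-s_2.11 \
  --master yarn --deploy-mode cluster your_script.py
```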
- My Spark cluster has 3 nodes (YARN on a Hadoop cluster: 1 master, 2 workers)
- I used Virtualenv for tensorflow (CPU version)
- I installed pandas 0.20.3
I would be grateful if you could answer my questions.
@SungMinHong Same question!