FS1360472174 / spark-knowledge

notes for learning spark

Spark terminology #7

FS1360472174 opened this issue 7 years ago

FS1360472174 commented 7 years ago

http://spark.apache.org/docs/latest/rdd-programming-guide.html
http://blog.csdn.net/bluishglc/article/details/50715879

FS1360472174 commented 7 years ago
  1. Understanding closures
    
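    int counter = 0;
    JavaRDD<Integer> rdd = sc.parallelize(data);

    // Wrong: Don't do this!! Each executor increments its own deserialized
    // copy of counter, so the driver's counter is not updated. (In Java this
    // line does not even compile: a lambda cannot assign to a local variable
    // that is not effectively final.)
    rdd.foreach(x -> counter += x);

    System.out.println("Counter value: " + counter);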


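A closure and the variables it captures are only valid within a single process (node): Spark serializes the closure and sends each executor its own copy of those variables.
So in cluster mode, rdd.foreach(x -> counter += x) updates the executors' copies, and the counter on the driver does not change the way one might expect.
Scope is not globally shared across the cluster; each task runs in its own context.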
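The mechanism can be imitated in plain Java without Spark. The sketch below is an illustration, not Spark's internal code: it assumes that "shipping a task" can be modeled as serializing the captured state and deserializing it into an independent copy, and the names ClosureCopyDemo, Counter, shipToExecutor and runForeach are hypothetical. The loop then updates only the copy, so the driver-side value stays at 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

public class ClosureCopyDemo {
    // Mutable state captured by the "closure"; stands in for the driver's counter.
    static class Counter implements Serializable {
        int value = 0;
    }

    // Hypothetical stand-in for task shipping: serialize the captured object on the
    // "driver" and deserialize it on the "executor", which yields an independent copy.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T shipToExecutor(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs a "foreach" against the executor-side copy.
    // Returns {driver-side value, executor-side value}.
    static int[] runForeach(List<Integer> data) {
        Counter driverCounter = new Counter();
        Counter executorCounter = shipToExecutor(driverCounter);
        for (int x : data) {
            executorCounter.value += x; // only the copy changes
        }
        return new int[] {driverCounter.value, executorCounter.value};
    }

    public static void main(String[] args) {
        int[] result = runForeach(List.of(1, 2, 3, 4));
        System.out.println("Counter value on driver: " + result[0]);   // 0
        System.out.println("Counter value on executor: " + result[1]); // 10
    }
}
```

In real Spark code, the programming guide's recommended fix is an accumulator: for example a LongAccumulator created with SparkContext.longAccumulator(), updated with add() inside foreach, and read back on the driver with value().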
FS1360472174 commented 7 years ago

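Broadcast sends data from one node (the driver) to every other node. A broadcast variable is read-only, and it is cached on each machine instead of being shipped with every task.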

Put simply, it is a shared read-only variable.
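A minimal plain-Java sketch of the idea, again without Spark: a hypothetical Broadcast holder wraps an unmodifiable map (so tasks cannot change it), and each hypothetical Executor fetches the value once and then serves later tasks from its local cache. The round-robin scheduling and the "fetch counter" are simplifying assumptions for illustration, not how Spark actually distributes broadcast data; the point is that six tasks on two executors cause only two transfers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BroadcastSketch {
    // Hypothetical stand-in for a broadcast variable: an id plus a read-only value.
    static final class Broadcast<T> {
        final int id;
        private final T value;

        Broadcast(int id, T value) {
            this.id = id;
            this.value = value;
        }

        T value() {
            return value;
        }
    }

    // Hypothetical executor: fetches each broadcast value at most once,
    // then answers every later task from its local cache.
    static final class Executor {
        private final Map<Integer, Object> cache = new HashMap<>();
        int fetches = 0;

        @SuppressWarnings("unchecked")
        <T> T read(Broadcast<T> b) {
            return (T) cache.computeIfAbsent(b.id, key -> {
                fetches++; // simulates one transfer from the driver
                return b.value();
            });
        }
    }

    // Runs one task per code, assigned round-robin to numExecutors executors.
    // Each task looks its code up in the broadcast table and appends the result to out.
    // Returns the total number of transfers to executors.
    static int runTasks(List<String> codes, int numExecutors, List<String> out) {
        // Map.of returns an unmodifiable map, which enforces read-only use.
        Broadcast<Map<String, String>> countries =
                new Broadcast<>(0, Map.of("CN", "China", "US", "United States"));
        List<Executor> executors = new ArrayList<>();
        for (int i = 0; i < numExecutors; i++) {
            executors.add(new Executor());
        }
        for (int task = 0; task < codes.size(); task++) {
            Executor ex = executors.get(task % numExecutors);
            out.add(ex.read(countries).get(codes.get(task)));
        }
        return executors.stream().mapToInt(e -> e.fetches).sum();
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        int fetches = runTasks(List.of("CN", "US", "CN", "US", "CN", "US"), 2, names);
        System.out.println(names);   // [China, United States, China, United States, China, United States]
        System.out.println(fetches); // 2
    }
}
```

In Spark's Java API the real calls are JavaSparkContext.broadcast(value), which returns a Broadcast<T>, and value() inside the task to read it.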

FS1360472174 commented 7 years ago

http://www.infoq.com/cn/articles/scala-for-java-devs