Closed: tsailiming closed this issue 9 years ago.
This is due to an interaction between a PySpark bug and NumPy 1.9; see https://github.com/thunder-project/thunder/issues/41 for another report.
It looks like this was fixed in Spark 1.2.0, but not in other branches: https://issues.apache.org/jira/browse/SPARK-3995
Since we initialize the seed ourselves, I think we can fix this in spark-perf by adding a modulus where we set the seed.
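For illustration, here is a minimal sketch of the failure mode, assuming NumPy 1.9 (newer releases raise a ValueError for out-of-range seeds instead). On 64-bit CPython, hash() can return a negative integer, which RandomState rejects:

import numpy

# NumPy 1.9's RandomState only accepts seeds in the unsigned 32-bit
# range, so a negative hash value triggers the overflow seen below.
numpy.random.RandomState(-1)
# OverflowError: can't convert negative value to unsigned long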
I'm using NumPy 1.9.1 installed from pip.
I'm using Spark 1.2.0 too.
I get another error after downgrading to NumPy 1.8.2:
15/01/26 11:26:59 INFO scheduler.TaskSetManager: Lost task 200.3 in stage 0.0 (TID 261) on executor numaq1-4: org.apache.spark.api.python.PythonException (Traceback (most recent call last):
File "/opt/spark-1.2.0-bin-hadoop2.4/python/pyspark/worker.py", line 107, in main
process()
File "/opt/spark-1.2.0-bin-hadoop2.4/python/pyspark/worker.py", line 98, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/opt/spark-1.2.0-bin-hadoop2.4/python/pyspark/serializers.py", line 227, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/net/home/ltsai/a/spark-perf.new/pyspark-tests/mllib_data.py", line 21, in gen
rng = numpy.random.RandomState(hash(str(seed ^ index)))
File "mtrand.pyx", line 574, in mtrand.RandomState.__init__ (numpy/random/mtrand/mtrand.c:5495)
File "mtrand.pyx", line 606, in mtrand.RandomState.seed (numpy/random/mtrand/mtrand.c:5712)
OverflowError: can't convert negative value to unsigned long
The workaround is to keep the seed in the unsigned 32-bit range by applying a 0xffffffff mask.
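A sketch of that workaround, following the SPARK-3995 approach; the function name and signature mirror the gen function shown in the mllib_data.py traceback, while the return value is assumed for illustration:

import numpy

def gen(seed, index):
    # hash() can produce negative or 64-bit values; ANDing with
    # 0xffffffff folds the result into the unsigned 32-bit range
    # that numpy.random.RandomState accepts as a seed.
    rng = numpy.random.RandomState(hash(str(seed ^ index)) & 0xffffffff)
    return rng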
All tests are failing because of this random-seed issue.
Number of failed tests: 8, failed tests: python-glm-classification,python-glm-classification,python-glm-regression,python-naive-bayes,python-als,python-kmeans,python-pearson,python-spearman