databricks / koalas

Koalas: pandas API on Apache Spark
Apache License 2.0

AttributeError: type object 'InternalFrame' has no attribute 'restore_index' #2156

Closed (RainFung closed this issue 3 years ago)

RainFung commented 3 years ago
 UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true, but has reached the error below and can not continue. Note that 'spark.sql.execution.arrow.fallback.enabled' does not have an effect on failures in the middle of computation.
  An error occurred while calling o59044.getResult.
: org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
    at org.apache.spark.api.python.PythonServer.getResult(PythonRDD.scala:874)
    at org.apache.spark.api.python.PythonServer.getResult(PythonRDD.scala:870)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 19.0 failed 4 times, most recent failure: Lost task 2.3 in stage 19.0 (TID 14390, 11.0.109.187, executor 149): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/worker.py", line 377, in main
    process()
  File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/worker.py", line 372, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/serializers.py", line 290, in dump_stream
    for series in iterator:
  File "<string>", line 1, in <lambda>
  File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/worker.py", line 101, in <lambda>
    return lambda *a: (verify_result_length(*a), arrow_return_type)
  File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/worker.py", line 92, in verify_result_length
    result = f(*a)
  File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/util.py", line 99, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/databricks/koalas/accessors.py", line 919, in <lambda>
  File "/usr/local/lib/python3.6/site-packages/databricks/koalas/groupby.py", line 1375, in rename_output
AttributeError: type object 'InternalFrame' has no attribute 'restore_index'
HyukjinKwon commented 3 years ago

@RainFung would you mind sharing the code you ran and the Koalas version?

Sbargaoui commented 3 years ago

I found the origin of the issue.

After bumping Koalas to 1.8.0, one of the worker nodes was still running Koalas 1.5.0. The `restore_index` method of `InternalFrame` (https://github.com/databricks/koalas/blob/master/databricks/koalas/internal.py) was not introduced until version 1.8.0, so the older worker raised the `AttributeError` when the driver's 1.8.0 code ran on it.

Making sure that every node uses the same Koalas version should fix it.
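Before rerunning the job, the version skew can be sanity-checked from the driver. A minimal sketch (the helper names and the partition count are my own, not from this thread or the Koalas API): collect the Koalas version reported by each executor and compare it against the driver's.

```python
def find_version_mismatches(driver_version, executor_versions):
    """Return {host: version} for executors whose Koalas version differs
    from the driver's. Hypothetical helper, not part of the Koalas API."""
    return {host: ver for host, ver in executor_versions if ver != driver_version}


def collect_executor_versions(sc, num_partitions=64):
    """Gather (hostname, koalas version) pairs from the workers.

    Requires a live SparkContext. ``databricks.koalas`` is imported inside
    the task function, so the version reported is the one actually
    installed on each executor, not the driver's.
    """
    def report(_):
        import socket
        import databricks.koalas as ks
        yield (socket.gethostname(), ks.__version__)

    # Use enough partitions that every executor runs at least one task.
    return (sc.parallelize(range(num_partitions), num_partitions)
              .mapPartitions(report)
              .collect())


if __name__ == "__main__":
    # Offline illustration of the comparison logic (no cluster needed);
    # the host names and versions below are made up.
    pairs = [("node-a", "1.8.0"), ("node-b", "1.5.0"), ("node-c", "1.8.0")]
    print(find_version_mismatches("1.8.0", pairs))  # {'node-b': '1.5.0'}
```

Any non-empty result from `find_version_mismatches` points at the stale node to upgrade.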