astronomy-commons / axs

Astronomy eXtensions for Spark: Fast, Scalable, Analytics of Billion+ row catalogs
https://axs.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

py4j.Py4JException: Method saveNewTable(...) does not exist #10

Closed acscott closed 5 years ago

acscott commented 5 years ago

Hi, when I call save_axs_table I get the error in the subject line. The details below might help resolve this. Is there something obvious I'm overlooking, or is this a bug?

[datalab@gp05 ~/axs]$ java -version
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
[datalab@gp05 ~/axs]$ echo $JAVA_HOME
/usr/lib/jvm/jre-openjdk
[datalab@gp05 ~/axs]$ echo $SPARK_HOME
/home/datalab/axs
[datalab@gp05 ~/axs]$ echo $PATH
/usr/lib/jvm/jre-openjdk:/home/datalab/axs/bin:/usr/lib64/qt-3.3/bin:/data0/sw/anaconda2/bin:/data0/sw/anaconda3/bin:/usr/local/bin:/bin:/usr/bin
[datalab@gp05 ~/axs]$ pyspark
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
19/06/26 14:16:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/06/26 14:16:06 WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-SNAPSHOT
      /_/

Using Python version 3.6.8 (default, Dec 30 2018 01:22:34)
SparkSession available as 'spark'.
>>> from axs import AxsCatalog, Constants
>>> db = AxsCatalog(spark)
>>> spark.catalog.currentDatabase()
'default'
>>> dat = spark.read.csv(header=True,path='/gaia_source/csv/GaiaSource_1703858022185355904_1704227084430340864.csv')
>>> db.save_axs_table(dat, 'test4', repartition=True, calculate_zone=True, num_buckets=Constants.NUM_BUCKETS, zone_height=500)
19/06/26 14:17:11 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
19/06/26 14:17:18 WARN HiveExternalCatalog: Persisting bucketed data source table `default`.`test4` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/datalab/axs/python/axs/catalog.py", line 181, in save_axs_table
    False, None)
  File "/home/datalab/axs/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/home/datalab/axs/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/home/datalab/axs/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 332, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o38.saveNewTable. Trace:
py4j.Py4JException: Method saveNewTable([class java.lang.String, class java.lang.Integer, class java.lang.Integer, class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.Boolean, null]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
        at py4j.Gateway.invoke(Gateway.java:274)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
zecevicp commented 5 years ago

I was able to reproduce this bug. The problem is the zone_height parameter: the underlying Java method expects a Double, so please specify it as a float, not an int.
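For context on why the type matters: py4j selects the Java method overload from the runtime types of the Python arguments, so passing `500` (an int) makes it look for a `saveNewTable(...)` signature taking `java.lang.Integer`, which does not exist, while `500.0` maps to `java.lang.Double` and matches. A minimal sketch of that mapping (the `java_type` helper is hypothetical, for illustration only; it is not part of py4j's public API):

```python
def java_type(value):
    """Rough sketch of how py4j maps Python primitives to Java boxed types."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "java.lang.Boolean"
    if isinstance(value, int):
        return "java.lang.Integer"
    if isinstance(value, float):
        return "java.lang.Double"
    return "java.lang.String"

print(java_type(500))    # -> java.lang.Integer (no matching saveNewTable overload)
print(java_type(500.0))  # -> java.lang.Double  (matches the expected signature)
```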

acscott commented 5 years ago

Thank you for the quick response. Using 500.0 for zone_height worked!

zecevicp commented 5 years ago

> Thank you for the quick response. Using 500.0 for zone_height worked!

That value seems to be a mistake, however: the default zone height is one arc-minute, or 0.0167 degrees.