astronomy-commons / axs

Astronomy eXtensions for Spark: Fast, Scalable Analytics of Billion+ Row Catalogs
https://axs.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

catalog.drop_table() does not appear to remove table files? #9

Open ebellm opened 5 years ago

ebellm commented 5 years ago

I'm trying to replace an existing AXS table with a new version.

I ran

catalog.drop_table('green19_stellar_params')
catalog.save_axs_table( sdf, 'green19_stellar_params', repartition=True, calculate_zone=True)

and got the following error:

AnalysisException: "Can not create the managed table('`green19_stellar_params`'). The associated location('file:/epyc/projects/lsd2/pzwarehouse/green19_stellar_params') already exists.;"

And indeed the files are there.

Running drop_table again reports: 'Table or view not found: green19_stellar_params;'

(I manually removed the directory and continued.)
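Until drop_table also removes the on-disk files, a workaround is to delete the table directory yourself after dropping the table. A minimal sketch of that (the helper name and the warehouse-path argument are hypothetical, not part of the AXS API):

```python
import shutil
from pathlib import Path

def drop_table_and_files(catalog, table_name, warehouse_dir):
    """Drop an AXS table and also remove its leftover data directory.

    `catalog` is assumed to be an axs.AxsCatalog instance;
    `warehouse_dir` should match your Spark warehouse location
    (e.g. /epyc/projects/lsd2/pzwarehouse in the report above).
    """
    try:
        catalog.drop_table(table_name)
    except Exception:
        # The metastore entry may already be gone (as seen when
        # running drop_table twice); we still want to clean up
        # whatever files remain on disk.
        pass
    table_dir = Path(warehouse_dir) / table_name
    if table_dir.exists():
        shutil.rmtree(table_dir)
```

Running this before save_axs_table should avoid the "associated location already exists" error, assuming the table directory really is `<warehouse_dir>/<table_name>`.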

ctslater commented 3 years ago

Another related case: apparently, even after calling drop_table() and manually deleting the data directory, Spark sometimes still throws an error when trying to save a new table with that name.
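When the metastore entry outlives both drop_table() and the data directory like this, one thing worth trying (a sketch, not verified against AXS) is asking the Spark session directly to drop the table before re-creating it; DROP TABLE IF EXISTS is a no-op when the entry is truly gone, so it should be safe to run unconditionally:

```python
def force_forget_table(spark, table_name):
    """Clear a table entry that survived drop_table().

    `spark` is assumed to be the active SparkSession. DROP TABLE IF
    EXISTS succeeds silently if the metastore no longer knows the
    table, so this can run right before save_axs_table.
    """
    spark.sql(f"DROP TABLE IF EXISTS {table_name}")
```

With a real session this would be a single spark.sql call, e.g. force_forget_table(spark, "skymapper"), issued before retrying the save.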

Clipping some parts from the traceback:

Py4JJavaError: An error occurred while calling o155.saveAsTable.
: org.apache.spark.sql.AnalysisException: Table `skymapper` already exists.;
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:418)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:403)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

...

/epyc/opt/spark-axs/python/axs/catalog.py in save_axs_table(self, df, tblname, repartition, calculate_zone, num_buckets, zone_height, path)
    192             writer.option("path", path)
    193 
--> 194         writer.saveAsTable(tblname)