Open amznero opened 2 years ago
Hi,
I'm trying to use Koalas to load a Hive table on a remote cluster. The docs at https://koalas.readthedocs.io/en/latest/reference/io.html#spark-metastore-table say that I can use the ks.read_table API to read a Spark table, but the call fails when I use it to read my table.

    import pandas as pd
    import numpy as np
    import databricks.koalas as ks
    from pyspark.sql import SparkSession

    koalas_df = ks.read_table("xxx.yyy")

Error log:
AnalysisException: "Table or view not found: `xxx`.`yyy`;;\n'UnresolvedRelation `xxx`.`yyy`\n"
However, I can load it successfully by directly using pyspark+pandas+pyarrow.
Some snippets:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
    spark_df = spark.read.table("xxx")
    pandas_df = spark_df.toPandas()
    ...

I also checked the source code at https://github.com/databricks/koalas/blob/e971d6f37ede45297bbf9d509ae2a7b51717f322/databricks/koalas/namespace.py#L556
It loads the table through default_session (with no extra configuration options), and that session is built without calling enableHiveSupport:
https://github.com/databricks/koalas/blob/e971d6f37ede45297bbf9d509ae2a7b51717f322/databricks/koalas/utils.py#L433-L456
So I'm a little confused about ks.read_table: where does it load tables from? Is it perhaps linked to the local Spark warehouse instead of the Hive metastore?
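
For reference, here is a minimal sketch of a possible workaround, not a confirmed fix. It assumes that Koalas obtains its session through SparkSession.builder.getOrCreate() and therefore reuses a session that already exists, so creating a Hive-enabled session before the first Koalas call might be enough. The helper names hive_enabled_session and read_hive_table are hypothetical and are not part of Koalas or PySpark.

```python
def hive_enabled_session():
    """Create (or reuse) a SparkSession with Hive support requested."""
    # Imported lazily so that defining this helper does not itself need Spark.
    from pyspark.sql import SparkSession

    return SparkSession.builder.enableHiveSupport().getOrCreate()


def read_hive_table(name):
    """Read a metastore table via Koalas after ensuring a Hive-enabled session.

    Hypothetical helper: it relies on the assumption that Koalas picks up the
    already-created session instead of building a new one without Hive support.
    """
    import databricks.koalas as ks

    spark = hive_enabled_session()

    # "hive" means the session talks to the Hive metastore; "in-memory" means
    # it only sees tables registered in the local Spark catalog. If a non-Hive
    # session was created earlier, getOrCreate() returns that one unchanged.
    catalog = spark.conf.get("spark.sql.catalogImplementation", "in-memory")
    if catalog != "hive":
        raise RuntimeError(
            "Active SparkSession uses the %r catalog; stop it and create a "
            "Hive-enabled session first." % catalog
        )

    return ks.read_table(name)
```

With this in place, the failing call would become koalas_df = read_hive_table("xxx.yyy"). The explicit catalog check is there because enableHiveSupport() has no effect on a session that is already running, which would otherwise reproduce the same "Table or view not found" error.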