mohika-knoldus opened this issue 1 year ago
@mohika-knoldus did you resolve this? I'm having the same issue.
No @oliverangelil .
@mohika-knoldus
The solution was to install Apache Hadoop. If you add this configuration to your Spark session, it will download the required packages automatically:
from pyspark.sql import SparkSession

spark = (SparkSession
    .builder
    .config('spark.jars.packages', 'org.apache.hadoop:hadoop-azure:3.3.1,io.delta:delta-core_2.12:2.2.0,io.delta:delta-sharing-spark_2.12:0.6.2')
    .config('spark.sql.extensions', 'io.delta.sql.DeltaSparkSessionExtension')
    .config('spark.sql.catalog.spark_catalog', 'org.apache.spark.sql.delta.catalog.DeltaCatalog')
    .getOrCreate()
)
Alternatively, you can download the JARs manually from the Apache website.
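If you go the manual-download route, one way to hand the JARs to Spark is via the `--jars` flag (the local paths and script name below are hypothetical, and the artifact versions are taken from the `spark.jars.packages` config above):

```shell
# Hypothetical local paths; adjust to wherever you saved the downloaded JARs.
spark-submit \
  --jars /opt/jars/hadoop-azure-3.3.1.jar,/opt/jars/delta-core_2.12-2.2.0.jar,/opt/jars/delta-sharing-spark_2.12-0.6.2.jar \
  my_app.py
```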
Then you can read the table in like this:
delta_sharing.load_as_spark(table_url).show()
or like this:
spark.read.format("deltasharing").load(table_url).limit(100).show()
You can alternatively read the table without Hadoop at all, if you use delta_sharing.load_as_pandas(table_url, limit=10).
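For completeness, `table_url` in the snippets above is the profile file path joined to the fully qualified table name with a `#`. A minimal sketch, with hypothetical paths (the helper function `table_url` is my own, not part of the delta-sharing API):

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build a Delta Sharing table URL: '<profile>#<share>.<schema>.<table>'."""
    return f"{profile_path}#{share}.{schema}.{table}"

# Hypothetical profile path; use your own config.share location.
url = table_url("/path/to/config.share",
                "checkout_data_products", "data_products", "popular_products_data")
print(url)
# With the delta-sharing package installed (pip install delta-sharing), the
# table can then be read as pandas without Spark or Hadoop, assuming a valid
# profile file and network access to the sharing server:
#
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url, limit=10)
```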
So either way there is a dependency on the Python library or on Apache Hadoop in the end?
Thank you for the solution. @oliverangelil
import io.delta.sharing.client
import org.apache.spark.sql.SparkSession

object ReadSharedData extends App {
  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("Read Shared Data")
    .getOrCreate()

  val profilePath = "/home/knoldus/Desktop/Delta Open Sharing/resources/config.share"
  val sharedFiles = client.DeltaSharingRestClient(profilePath).listAllTables()
  sharedFiles.foreach(println) // this works fine and lists all the tables in the share provided by the data provider

  val popular_products_df = spark.read
    .format("deltaSharing")
    .load("/home/knoldus/Desktop/Delta Open Sharing/resources/config.share#checkout_data_products.data_products.popular_products_data")
  popular_products_df.show()
}