derrickoswald / CIMSpark

Spark access to Common Information Model (CIM) files
MIT License

non-default database #18

Open derrickoswald opened 5 years ago

derrickoswald commented 5 years ago

The CIM RDDs are currently added to the default SparkSQL database:

scala> spark.sql ("show databases").show
+------------+
|databaseName|
+------------+
|     default|
+------------+

It would be good if the database could be specified.

mbheinen commented 4 years ago

Are you thinking the desired database would be a runtime argument to the existing docker-compose scripts for sandbox and beach?

derrickoswald commented 4 years ago

It isn't an "instance" level thing, more of a "read" level option IMHO.

The spot to inject it is probably in CIMRelation.

So the developer visible entry point would be something like:

import scala.collection.mutable.HashMap
import org.apache.spark.rdd.RDD
import ch.ninecode.cim._
import ch.ninecode.model._

val opts = new HashMap[String,String]()
opts.put ("ch.ninecode.cim.spark_database", "my_database")
val element = spark.read.format ("ch.ninecode.cim").options (opts).load ("hdfs://sandbox:8020/data/CGMES_v2.4.15_RealGridTestConfiguration_EQ_v2.xml")

which would use my_database instead of default.
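
On the receiving side, a minimal sketch of how such a read-level option might be picked up, assuming it arrives in the parameters map that Spark passes to a data source's createRelation. The helper name databaseFor is hypothetical, and the option key is the one proposed above, not an existing CIMSpark option:

```scala
// Hypothetical helper, not part of CIMSpark: resolve the target database
// from the reader options, falling back to Spark's "default" database.
object DatabaseOption
{
    // option key as proposed in this issue (not yet implemented)
    val KEY = "ch.ninecode.cim.spark_database"

    def databaseFor (parameters: Map[String, String]): String =
        parameters.getOrElse (KEY, "default")
}

// with the option set, the named database is returned
val chosen = DatabaseOption.databaseFor (Map (DatabaseOption.KEY -> "my_database"))

// without it, the lookup falls back to "default"
val fallback = DatabaseOption.databaseFor (Map.empty[String, String])
```

Keeping the fallback at "default" would leave existing callers unaffected.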

The database view is created in CIMSubsetter, but it's unclear how to specify a database other than "default" at that point.
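
One possible direction, sketched under assumptions: in stock Spark SQL, a temporary view registered with createOrReplaceTempView is session-scoped and does not belong to any database, so if CIMSubsetter registers temporary views (an assumption here), a database name cannot simply be prefixed to the view name. Persisting a table with saveAsTable does accept a database-qualified name. The helper saveToDatabase and the sample data below are hypothetical and only illustrate the Spark calls:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// Hypothetical helper, not part of CIMSpark: persist a DataFrame as a
// managed table in the named database, creating the database if needed.
def saveToDatabase (spark: SparkSession, df: DataFrame, database: String, table: String): Unit =
{
    spark.sql (s"CREATE DATABASE IF NOT EXISTS $database")
    // Overwrite so that repeated reads of the same file replace the table
    df.write.mode (SaveMode.Overwrite).saveAsTable (s"$database.$table")
}

// local session and made-up rows, for illustration only
val spark = SparkSession.builder ().master ("local[*]").appName ("database-sketch").getOrCreate ()
import spark.implicits._

val lines = Seq (("line1", 1200.0), ("line2", 850.0)).toDF ("id", "length")
saveToDatabase (spark, lines, "my_database", "ACLineSegment")

spark.sql ("show databases").show
spark.sql ("show tables in my_database").show
```

The trade-off is that saveAsTable materializes the data in the warehouse directory rather than exposing a lazy view over the RDD, so this may not be what is wanted for large CIM files.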