Open Highbrainer opened 7 years ago
Just a clarification: I see that the current code (in the master branch) supports the new "query" parameter as an alternative to "dbtable". This answers the first part of my question. The remaining part is "How can I define/customize a partitioning strategy?".
Also: in which Maven repo can I find a version more recent than 0.1.1?
Thank you for trying it out. As you have noticed, query support is not released yet. You can find a snapshot jar at the following location: https://oss.sonatype.org/content/repositories/snapshots/com/ibm/SparkTC/ https://oss.sonatype.org/content/repositories/snapshots/com/ibm/SparkTC/spark-netezza_2.11/1.0.0-SNAPSHOT/spark-netezza_2.11-1.0.0-SNAPSHOT.jar
You can specify an integer column to define the partitioning strategy, e.g.:
val opts = defaultOpts + ("query" -> s"select * from $tabName") + ("partitioncol" -> "ID") + ("numPartitions" -> Integer.toString(4)) + ("lowerbound" -> "1") + ("upperbound" -> "100")
val testDf = sqlContext.read.format("com.ibm.spark.netezza").options(opts).load()
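To make the options above more concrete, here is a minimal sketch of how a lower bound, an upper bound and a partition count could be turned into per-partition range predicates. This is an illustration only: `PartitionSketch`, `ranges` and `predicates` are hypothetical names, and the connector's actual splitting logic may differ (for example in how it treats values outside the bounds).

```scala
// Hypothetical helper, NOT part of spark-netezza: it only illustrates
// how lowerbound/upperbound/numPartitions might map to range predicates.
object PartitionSketch {

  /** Splits the inclusive interval [lower, upper] into at most
    * numPartitions contiguous, non-overlapping inclusive ranges. */
  def ranges(lower: Long, upper: Long, numPartitions: Int): Seq[(Long, Long)] = {
    require(numPartitions > 0, "numPartitions must be positive")
    require(lower <= upper, "lowerbound must not exceed upperbound")
    val total = upper - lower + 1
    // Never create more partitions than there are distinct values.
    val parts = math.min(numPartitions.toLong, total).toInt
    val base  = total / parts
    val extra = total % parts // the first `extra` ranges hold one more value
    (0 until parts).map { i =>
      val start = lower + i * base + math.min(i.toLong, extra)
      val size  = base + (if (i < extra) 1 else 0)
      (start, start + size - 1)
    }
  }

  /** Renders each range as a SQL predicate on the partition column. */
  def predicates(col: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] =
    ranges(lower, upper, numPartitions).map { case (lo, hi) =>
      s"$col >= $lo AND $col <= $hi"
    }
}
```

With the values from the example (column "ID", bounds 1 and 100, 4 partitions), `PartitionSketch.predicates("ID", 1, 100, 4)` yields four predicates covering 1 to 25, 26 to 50, 51 to 75 and 76 to 100.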
I should also probably throw a better message for views.
Hi, I have tested version 0.1.1 from maven central.
I discovered that this data source does not support views, because Netezza itself does not support the concept of data slices for views.
java.lang.RuntimeException: Error creating external table pipe:org.netezza.error.NzSQLException: ERROR: Column reference "DATASLICEID" not supported for views
Maybe we could handle views differently and allow only one worker for views: in my experience, it is faster to get a whole view at once with an external table than to retrieve it by "classical" means...
And in the future we could even imagine that the user could provide a "dispatch strategy" to define a custom query builder on a view-per-view basis, and thus make parallel retrieval of views possible...
What do you think?
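To illustrate the "dispatch strategy" idea, here is a rough sketch of what such a user-provided hook might look like. Everything in it is hypothetical: `ViewDispatchStrategy`, `SingleWorker` and `ModuloOnColumn` do not exist in spark-netezza, and the modulo variant assumes the chosen column is a non-null, non-negative integer.

```scala
// Hypothetical API: nothing here exists in spark-netezza today.
trait ViewDispatchStrategy {
  /** Returns one query per worker. Taken together, the queries
    * should return every row of the view exactly once. */
  def queries(view: String, numPartitions: Int): Seq[String]
}

/** Default for views: ignore numPartitions and read everything with one worker. */
object SingleWorker extends ViewDispatchStrategy {
  def queries(view: String, numPartitions: Int): Seq[String] =
    Seq(s"select * from $view")
}

/** Splits a view on an integer column with a modulo predicate.
  * Assumption: the column is non-null and non-negative; otherwise
  * some rows would match none of the generated queries. */
final case class ModuloOnColumn(column: String) extends ViewDispatchStrategy {
  def queries(view: String, numPartitions: Int): Seq[String] = {
    require(numPartitions > 0, "numPartitions must be positive")
    (0 until numPartitions).map { i =>
      s"select * from $view where mod($column, $numPartitions) = $i"
    }
  }
}
```

A user could then pick `SingleWorker` for most views and supply something like `ModuloOnColumn("ID")` only for views where a suitable column is known.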