ActianCorp / spark-vector

Repository for the Spark-Vector connector
Apache License 2.0

QA - SPARK CONNECTOR - parallel unload: wrong values unloaded for ansidate datatype #27

Closed Pyrobal closed 8 years ago

Pyrobal commented 8 years ago

It seems that older dates are unloaded with wrong values, while recent dates are unloaded correctly.

This is 4.2.3 with patch and a manually provided jar from Cristi.

$ hadoop fs -cat /Actian/VectorK2/tab_ANSIDATE.csv/part*
1,0001-01-03
2,1000-10-04
3,null
4,2000-02-29
5,9999-12-31
..

+-------------+----------+
|col1         |col2      |
+-------------+----------+
|            1|0001-01-01|
|            2|1000-10-10|
|            3|          |
|            4|2000-02-29|
|            5|9999-12-31|
+-------------+----------+
(15 rows)

TESTCASE:

Take the ansidate.txt file attached to the issue:

CREATE TABLE tab_ANSIDATE (col1 int, col2 ANSIDATE);\g
COPY TABLE tab_ANSIDATE (col1 = c0tab, col2 = c0nl WITH NULL('NULL')) FROM 'ansidate.txt'\g

export SEPPARAMDB=testdb
export SPARK_MASTER=${SPARK_MASTER:-yarn}
export TMP_II_INSTALLATION=${TMP_II_INSTALLATION:-$(iigetenv II_INSTALLATION)}
export TMP_HOSTNAME=${TMP_HOSTNAME:-$HOSTNAME}
export HDFS_TMP=${HDFS_TMP:-$(iigetenv II_HDFSDATA)}
export SPARK_LOADER_JAR=/home/actian/kelch01/Spark/unloader/spark_vector_loader-assembly-0.1.jar

val savepath = sys.env("HDFS_TMP")
val installation_ID = sys.env("TMP_II_INSTALLATION")
val hostname = sys.env("TMP_HOSTNAME")
val databasename = sys.env("SEPPARAMDB")

sqlContext.sql(s"""CREATE TEMPORARY TABLE tab_ANSIDATE
USING com.actian.spark_vector.sql.DefaultSource
OPTIONS (
  host "$hostname",
  instance "$installation_ID",
  database "$databasename",
  table "tab_ANSIDATE",
  user "actian",
  password "actian")""")

sqlContext.sql("select * from tab_ANSIDATE").write.format("com.databricks.spark.csv").save(s"$savepath/tab_ANSIDATE.csv")

cbarca commented 8 years ago

There is a slight inconsistency between how Vector and Spark/Java define the beginning of the date range (the earliest valid date).

The earliest valid date in Spark/Java is 1700-01-01T00:00:00Z GMT, or just after midnight on January 1, 1700. The latest valid date is 4000-12-31T00:00:00Z GMT, or just after midnight on December 31, 4000.
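For what it's worth, the shifted values in the report (0001-01-01 unloaded as 0001-01-03, and 1000-10-10 as 1000-10-04) are what you would get if a proleptic Gregorian date were rendered through Java's legacy hybrid calendar, which switches to the Julian calendar before 1582-10-15. That is only one possible reading of the symptom and is not confirmed anywhere in this thread. The sketch below reproduces the shift with plain JDK classes; `CalendarShift` and `hybridRendering` are hypothetical names used for illustration and are not part of the connector.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class for illustration; not part of spark-vector.
class CalendarShift {

    // Treats the input as a proleptic Gregorian date at midnight UTC, then
    // renders the same instant with java.util.GregorianCalendar, which by
    // default uses the Julian calendar for instants before 1582-10-15.
    static String hybridRendering(LocalDate prolepticDate) {
        long epochMillis = prolepticDate
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
        GregorianCalendar cal =
                new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
        cal.setTimeInMillis(epochMillis);
        return String.format(Locale.ROOT, "%04d-%02d-%02d",
                cal.get(Calendar.YEAR),
                cal.get(Calendar.MONTH) + 1, // Calendar months are zero-based
                cal.get(Calendar.DAY_OF_MONTH));
    }

    public static void main(String[] args) {
        // The non-null values shown in the Vector table output above.
        int[][] tableValues = {
            {1, 1, 1}, {1000, 10, 10}, {2000, 2, 29}, {9999, 12, 31}
        };
        for (int[] v : tableValues) {
            LocalDate d = LocalDate.of(v[0], v[1], v[2]);
            System.out.println(d + " -> " + hybridRendering(d));
        }
    }
}
```

With the JDK's default cutover, the first two dates come out as 0001-01-03 and 1000-10-04, while 2000-02-29 and 9999-12-31 are unchanged, which matches the CSV output in the report.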