There is a slight inconsistency between how Vector defines the beginning of the valid date range (the earliest date) and how Spark/Java defines it.
The earliest valid date in Spark/Java is 1700-01-01T00:00:00Z GMT, i.e. midnight GMT on January 1, 1700. The latest valid date is 4000-12-31T00:00:00Z GMT, i.e. midnight GMT on December 31, 4000.
It seems that older dates are unloaded incorrectly, while newer ones come out correctly.
This is 4.2.3 with the patch and a manually provided jar from Cristi.
Unloaded CSV on HDFS:

$ hadoop fs -cat /Actian/VectorK2/tab_ANSIDATE.csv/part*
1,0001-01-03
2,1000-10-04
3,null
4,2000-02-29
5,9999-12-31
..
The table as stored in Vector:

+-------------+----------+
|col1         |col2      |
+-------------+----------+
|            1|0001-01-01|
|            2|1000-10-10|
|            3|          |
|            4|2000-02-29|
|            5|9999-12-31|
+-------------+----------+
(15 rows)
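The shifts above (0001-01-01 -> 0001-01-03, 1000-10-10 -> 1000-10-04) match the difference between the proleptic Gregorian calendar and the hybrid Julian/Gregorian calendar that java.sql.Date uses to render dates before the 1582 cutover. A minimal sketch of that hypothesis, my own illustration rather than the connector's actual code path (assumes a Java 8 JVM for java.time):

import java.sql.Date
import java.time.LocalDate
import java.util.TimeZone

// Render in GMT so the local time zone cannot shift the day.
TimeZone.setDefault(TimeZone.getTimeZone("GMT"))

// LocalDate is proleptic Gregorian; toEpochDay gives days since 1970-01-01.
def asSqlDate(s: String): Date = new Date(LocalDate.parse(s).toEpochDay * 86400000L)

// java.sql.Date.toString falls back to the Julian calendar before 1582-10-15,
// so the same day counts come back shifted -- the same values seen in the CSV above.
println(asSqlDate("0001-01-01")) // 0001-01-03
println(asSqlDate("1000-10-10")) // 1000-10-04

// Dates after the cutover round-trip unchanged.
println(asSqlDate("2000-02-29")) // 2000-02-29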
TESTCASE:
Take ansidate.txt attached to the issue:
CREATE TABLE tab_ANSIDATE (col1 int, col2 ANSIDATE);\g
COPY TABLE tab_ANSIDATE (col1 = c0tab, col2 = c0nl WITH NULL('NULL')) FROM 'ansidate.txt'\g
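The attachment itself is not reproduced here, but given the COPY format spec (c0tab for the tab-delimited int, c0nl for the newline-terminated date, WITH NULL('NULL') for missing values) the file is presumably laid out like this, with <TAB> standing for a literal tab character (first rows inferred from the table output above; the rest of the 15 rows are not shown):

1<TAB>0001-01-01
2<TAB>1000-10-10
3<TAB>NULL
4<TAB>2000-02-29
5<TAB>9999-12-31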
export SEPPARAMDB=testdb
export SPARK_MASTER=${SPARK_MASTER:-yarn}
export TMP_II_INSTALLATION=${TMP_II_INSTALLATION:-$(iigetenv II_INSTALLATION)}
export TMP_HOSTNAME=${TMP_HOSTNAME:-$HOSTNAME}
export HDFS_TMP=${HDFS_TMP:-$(iigetenv II_HDFSDATA)}
export SPARK_LOADER_JAR=/home/actian/kelch01/Spark/unloader/spark_vector_loader-assembly-0.1.jar
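The report does not show how the shell is started; presumably the Scala snippets below run in a spark-shell with the connector assembly on the classpath, along these lines (assumed invocation, not taken from the report):

spark-shell --master $SPARK_MASTER --jars $SPARK_LOADER_JAR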
val savepath = sys.env("HDFS_TMP")
val installation_ID = sys.env("TMP_II_INSTALLATION")
val hostname = sys.env("TMP_HOSTNAME")
val databasename = sys.env("SEPPARAMDB")
sqlContext.sql(s"""CREATE TEMPORARY TABLE tab_ANSIDATE USING com.actian.spark_vector.sql.DefaultSource OPTIONS ( host "$hostname", instance "$installation_ID", database "$databasename", table "tab_ANSIDATE", user "actian", password "actian")""")
sqlContext.sql("select * from tab_ANSIDATE").write.format("com.databricks.spark.csv").save(s"$savepath/tab_ANSIDATE.csv")