viirya closed this pull request 7 years ago
Merging #253 into master will increase coverage by 0.05%. The diff coverage is 100%.
@@            Coverage Diff             @@
##           master     #253      +/-   ##
==========================================
+ Coverage   90.71%   90.77%   +0.05%
==========================================
  Files           5        5
  Lines         334      336       +2
  Branches       50       50
==========================================
+ Hits          303      305       +2
  Misses         31       31
To support the use case @saniyatech mentioned, timestamp should be straightforward because it is already stored as a long value. We can add the logical type `timestamp-millis` (millisecond precision) to the Avro long schema.
Date may be a real problem. I don't know why spark-avro stores dates as long values of milliseconds too. Avro's `date` logical type annotates an Avro int, and I don't think that logical type is compatible with an Avro long schema.
Changing a date field from long to int risks breaking backward compatibility.
Btw, it seems that 1.7.6, the Avro version currently used, doesn't support logical types yet. For the reasons above, I think we can only correctly deserialize date/timestamp fields when a Catalyst schema is provided.
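For reference, this is roughly what the two logical-type annotations look like in an Avro schema (per the Avro 1.8+ specification); the record and field names here are hypothetical, but they illustrate the int/long mismatch described above: `timestamp-millis` annotates a long, while `date` annotates an int (days since the Unix epoch), so a date stored as a long of milliseconds cannot simply carry the `date` logical type.

```json
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "ts", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    {"name": "d",  "type": {"type": "int",  "logicalType": "date"}}
  ]
}
```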
Thanks, merging to master.
Thanks @gengliangwang @saniyatech for review.
@gengliangwang @saniyatech any word on when this will get cut to a release? It's been nearly a year since this was fixed, but it's still broken in 4.0.
@rdefreitas Probably there will be no new release, as this repo has been migrated into Spark 2.4 as a built-in data source module.
@gengliangwang any idea when that release is scheduled or where I can find that?
It will be within this October, I think.
You can try it by building the latest Apache Spark (master branch or branch-2.4).
Related issue: #229
Related Spark JIRA ticket: https://issues.apache.org/jira/browse/SPARK-22460
Reading Avro files that were written from a DataFrame with a timestamp field is somewhat inconvenient.
It would be easier to read such fields if we could explicitly require this data source to interpret a field as a timestamp type.
This change also adds support for the date type.
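To make the unit mismatch concrete, here is a small standalone sketch (the raw values are hypothetical, not taken from this PR): Avro's `date` logical type encodes an int of days since the Unix epoch, while spark-avro (before logical-type support) wrote `DateType` as a long of epoch milliseconds. Both encodings can be decoded back to the same calendar day, but they are not interchangeable on the wire.

```python
from datetime import date, datetime, timedelta, timezone

# Avro `date` logical type: an int counting days since 1970-01-01.
avro_date_int = 17897  # hypothetical value
d_from_int = date(1970, 1, 1) + timedelta(days=avro_date_int)

# spark-avro's historical encoding: a long of epoch milliseconds
# for the midnight (UTC) of the same calendar day.
spark_date_millis = avro_date_int * 86_400_000
d_from_long = datetime.fromtimestamp(
    spark_date_millis / 1000, tz=timezone.utc
).date()

# Both encodings recover the same calendar day,
# but int-days and long-millis schemas are incompatible.
assert d_from_int == d_from_long
print(d_from_int)
```

The same arithmetic explains why timestamps were easier to handle: a `timestamp-millis` logical type could be layered directly onto the existing long encoding, whereas dates would have required changing the underlying Avro type from long to int.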