databricks / spark-avro

Avro Data Source for Apache Spark
http://databricks.com/
Apache License 2.0
539 stars 310 forks source link

Fix bug on generation of avro files with nested records with the same name #249

Closed gsolasab closed 7 years ago

gsolasab commented 7 years ago

spark-avro was failing on the generation of avro files when nested records had the same name, as can be seen on the following issue #54. For solving that problem has been added a different namespace on each record. This namespace is created with the concatenation of all the parents of each record separated by dot. Added also a new test in AvroSuite for checking the new code. This is related with pull request #73 but have been added a modification on AvroOutputWriter to solve correctly the issue.

codecov-io commented 7 years ago

Codecov Report

Merging #249 into master will increase coverage by 0.25%. The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #249      +/-   ##
==========================================
+ Coverage   90.46%   90.71%   +0.25%     
==========================================
  Files           5        5              
  Lines         325      334       +9     
  Branches       51       50       -1     
==========================================
+ Hits          294      303       +9     
  Misses         31       31
lyubinetz commented 7 years ago

Ping @JoshRosen - this seems like it actually does what https://github.com/databricks/spark-avro/pull/73 had to accomplish. A couple of other people asked for a solution to this, so since this is a relatively common issue (and kind of a bug), this should probably be merged.

P.S. This is @vlyubin from different account :)

gengliangwang commented 7 years ago

@gsolasab Thanks! @JoshRosen is currently on vacation. Could you also add test cases for Array/Map type?

bloomonkey commented 7 years ago

We're getting bitten by this too. If we (@autotraderuk) were able to contribute the missing tests, how soon could the fix be released?

gsolasab commented 7 years ago

Tomorrow I will do the tests.

gengliangwang commented 7 years ago

@gsolasab Cool, please add test cases on file reading as well. We plan to release recently, and I think we should merge this PR.

gengliangwang commented 7 years ago

Overall LGTM except one comment.

gengliangwang commented 7 years ago

Thanks, merge to master.