databricks / spark-deep-learning

Deep Learning Pipelines for Apache Spark
https://databricks.github.io/spark-deep-learning
Apache License 2.0

ignore nullable in DeepImageFeaturizer.validateSchema #143

Closed mengxr closed 6 years ago

mengxr commented 6 years ago

In Spark SQL, nullability is a hint used during optimization and codegen to skip null checks. It is not intended as an enforcement mechanism, nor does it imply that null values actually exist. The flag might also get dropped as data moves through the pipeline.

This PR switches to DataType.equalsIgnoreNullability for the check. Without the change, the test would fail with:

[info] - DeepImageFeaturizer accepts nullable *** FAILED ***
[info]   java.lang.ClassCastException: org.apache.spark.sql.types.StructField cannot be cast to org.apache.spark.sql.types.StructType
[info]   at com.databricks.sparkdl.DeepImageFeaturizerSuite$$anonfun$8.apply(DeepImageFeaturizerSuite.scala:134)
[info]   at com.databricks.sparkdl.DeepImageFeaturizerSuite$$anonfun$8.apply(DeepImageFeaturizerSuite.scala:132)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
[info]   at org.scalatest.TestSuite$class.withFixture(TestSuite.scala:196)
[info]   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1560)
[info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
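
For readers unfamiliar with the idea, here is a minimal sketch of a nullability-insensitive type comparison, written against the public `org.apache.spark.sql.types` classes only. The helper `sameTypeIgnoringNullability`, the object name, and the two-field schema are hypothetical and exist purely for illustration; this is not the code in this PR, and it assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.types._

object NullabilitySketch {
  /** Hypothetical helper: true when two types match once every nullable flag is ignored. */
  def sameTypeIgnoringNullability(a: DataType, b: DataType): Boolean = (a, b) match {
    // Arrays: compare element types, ignore containsNull.
    case (ArrayType(ea, _), ArrayType(eb, _)) =>
      sameTypeIgnoringNullability(ea, eb)
    // Maps: compare key and value types, ignore valueContainsNull.
    case (MapType(ka, va, _), MapType(kb, vb, _)) =>
      sameTypeIgnoringNullability(ka, kb) && sameTypeIgnoringNullability(va, vb)
    // Structs: same field count, names, and (recursively) field types; ignore nullable.
    case (StructType(fa), StructType(fb)) =>
      fa.length == fb.length && fa.zip(fb).forall { case (x, y) =>
        x.name == y.name && sameTypeIgnoringNullability(x.dataType, y.dataType)
      }
    // Atomic types: plain equality is enough.
    case _ => a == b
  }

  // Illustrative schema only; the field names are made up for this sketch.
  val strict: StructType = StructType(Seq(
    StructField("path", StringType, nullable = false),
    StructField("bytes", BinaryType, nullable = false)))

  // Same schema with every top-level field marked nullable.
  val relaxed: StructType = StructType(strict.fields.map(_.copy(nullable = true)))

  def main(args: Array[String]): Unit = {
    assert(strict != relaxed)                             // strict equality sees the flag
    assert(sameTypeIgnoringNullability(strict, relaxed))  // relaxed comparison does not
  }
}
```

A schema check based on plain `==` would reject `relaxed` here even though the column types are identical, which is the kind of false negative a nullability-insensitive comparison avoids.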

cc: @jkbradley

codecov-io commented 6 years ago

Codecov Report

Merging #143 into master will increase coverage by <.01%. The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #143      +/-   ##
==========================================
+ Coverage   85.37%   85.37%   +<.01%     
==========================================
  Files          33       34       +1     
  Lines        1921     1922       +1     
  Branches       44       41       -3     
==========================================
+ Hits         1640     1641       +1     
  Misses        281      281
Impacted Files                                          Coverage Δ
...a/com/databricks/sparkdl/DeepImageFeaturizer.scala   93.65% <100%> (ø)
...cala/org/apache/spark/sql/types/DataTypeShim.scala   100% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update a44fcbb...8a832f0.