apache / drill

Apache Drill is a distributed MPP query layer for self describing data
https://drill.apache.org/
Apache License 2.0

DRILL-8417: Allow Excel Reader to Ignore Formula Errors #2783

Closed: cgivre closed this 1 year ago

cgivre commented 1 year ago

Description

If Drill encounters an Excel formula that evaluates to an error, such as a division by zero (#DIV/0!), Drill is unable to proceed and throws a number format exception. This PR adds a config parameter called ignoreErrors. When it is set to true, Drill does not fail on such cells: it returns null for the affected cell and logs a warning. When it is set to false, the original behavior is retained.
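
To make the intended behavior concrete, here is a minimal, self-contained sketch. It is not the code in this PR: the class name IgnoreErrorsSketch and the method readNumericCell are hypothetical, and a cell is modeled as a plain string rather than a real Excel cell object. It only illustrates the two modes described above, assuming the error surfaces as a NumberFormatException while the cell is parsed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IgnoreErrorsSketch {

  /**
   * Hypothetical stand-in for reading a numeric cell. The cell's value is
   * modeled as a String; an Excel error such as "#DIV/0!" is not a number,
   * so parsing it throws NumberFormatException.
   */
  static Double readNumericCell(String cellValue, boolean ignoreErrors) {
    try {
      return Double.parseDouble(cellValue);
    } catch (NumberFormatException e) {
      if (ignoreErrors) {
        // Mirror the described behavior: warn and return null for the cell.
        System.err.println("WARN: ignoring formula error in cell: " + cellValue);
        return null;
      }
      // ignoreErrors = false: keep the original behavior and fail.
      throw e;
    }
  }

  public static void main(String[] args) {
    List<String> column = Arrays.asList("1.5", "#DIV/0!", "4");
    List<Double> out = new ArrayList<>();
    for (String cell : column) {
      out.add(readNumericCell(cell, true));
    }
    System.out.println(out); // prints [1.5, null, 4.0]
  }
}
```

With the flag set to false, the same call on "#DIV/0!" rethrows the NumberFormatException, which corresponds to the behavior before this change.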

Documentation

Updated README

Testing

Added two unit tests.

cgivre commented 1 year ago

@jnturton I updated the PR so that ignoreErrors defaults to false, and updated the README as well.

jnturton commented 1 year ago

Reviewer's note: all format-excel tests do pass; the CI test failures here are the result of as-yet-unfixed breakage brought in by Calcite 1.35-SNAPSHOT.

cgivre commented 1 year ago

Once https://github.com/apache/drill/pull/2794 is merged, I'll rebase and merge this, pending @jnturton's approval.