apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.17k stars 1.21k forks

[spark-connector] Add option to fail read when there are invalid segments #13080

Open cbalci opened 2 weeks ago

cbalci commented 2 weeks ago

Adds a Spark reader option (`failOnInvalidSegments`) that fails the read operation if the server response indicates that some segments were pruned as invalid. Also renames `TypeConverter` to `DataExtractor` to better capture what it does.
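For readers skimming the thread, the intended behavior can be illustrated with a minimal, standalone sketch. This is not the code from this PR: `ServerResponse`, `InvalidSegmentsException`, and `extractRows` are hypothetical names chosen for illustration, and the sketch assumes the server response exposes a count of segments pruned as invalid.

```scala
// Hypothetical, simplified model of a server response. The connector's real
// response type is not shown in this thread; this case class is an assumption.
final case class ServerResponse(rows: Seq[Seq[Any]], invalidSegmentCount: Int)

// Hypothetical exception type used to signal a failed read.
final class InvalidSegmentsException(message: String) extends RuntimeException(message)

object FailOnInvalidSegmentsSketch {
  /** Returns the rows unchanged, or throws when the option is enabled
    * and the response reports segments pruned as invalid. */
  def extractRows(response: ServerResponse, failOnInvalidSegments: Boolean): Seq[Seq[Any]] = {
    if (failOnInvalidSegments && response.invalidSegmentCount > 0) {
      throw new InvalidSegmentsException(
        s"${response.invalidSegmentCount} segment(s) were pruned as invalid")
    }
    // Option disabled, or no invalid segments: pass the rows through.
    response.rows
  }
}
```

With the option off, a response that reports invalid segments still yields its rows. With the option on, the same response fails the read instead of silently returning partial data.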

Testing: Included basic unit tests. Also manually ran the included integration test runner to validate the behavior with some invalid segments.

feature

codecov-commenter commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 28.57143%, with 20 lines in your changes missing coverage. Please review.

Project coverage is 62.17%. Comparing base (59551e4) to head (f9451d2). Report is 408 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| ...not/connector/spark/datasource/DataExtractor.scala | 0.00% | 7 Missing :warning: |
| .../connector/spark/v3/datasource/DataExtractor.scala | 0.00% | 7 Missing :warning: |
| ...k/common/reader/PinotAbstractPartitionReader.scala | 0.00% | 2 Missing :warning: |
| ...ector/spark/datasource/PinotDataSourceReader.scala | 0.00% | 1 Missing :warning: |
| ...nnector/spark/datasource/PinotInputPartition.scala | 0.00% | 1 Missing :warning: |
| ...onnector/spark/v3/datasource/PinotDataSource.scala | 0.00% | 1 Missing :warning: |
| ...inot/connector/spark/v3/datasource/PinotScan.scala | 0.00% | 1 Missing :warning: |
Additional details and impacted files

```diff
@@             Coverage Diff              @@
##             master   #13080      +/-   ##
============================================
+ Coverage     61.75%   62.17%   +0.42%
+ Complexity      207      198       -9
============================================
  Files          2436     2507      +71
  Lines        133233   137800    +4567
  Branches      20636    21332     +696
============================================
+ Hits          82274    85679    +3405
- Misses        44911    45731     +820
- Partials       6048     6390     +342
```

| [Flag](https://app.codecov.io/gh/apache/pinot/pull/13080/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
|---|---|---|
| [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/13080/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <ø> (-0.01%)` | :arrow_down: |
| [integration](https://app.codecov.io/gh/apache/pinot/pull/13080/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <ø> (-0.01%)` | :arrow_down: |
| [integration1](https://app.codecov.io/gh/apache/pinot/pull/13080/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <ø> (-0.01%)` | :arrow_down: |
| [integration2](https://app.codecov.io/gh/apache/pinot/pull/13080/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <ø> (ø)` | |
| [java-11](https://app.codecov.io/gh/apache/pinot/pull/13080/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `62.12% <28.57%> (+0.41%)` | :arrow_up: |
| [java-21](https://app.codecov.io/gh/apache/pinot/pull/13080/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `62.06% <28.57%> (+0.43%)` | :arrow_up: |
| [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/13080/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `62.14% <28.57%> (+0.40%)` | :arrow_up: |
| [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/13080/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `62.04% <28.57%> (+34.31%)` | :arrow_up: |
| [temurin](https://app.codecov.io/gh/apache/pinot/pull/13080/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `62.17% <28.57%> (+0.42%)` | :arrow_up: |
| [unittests](https://app.codecov.io/gh/apache/pinot/pull/13080/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `62.17% <28.57%> (+0.42%)` | :arrow_up: |
| [unittests1](https://app.codecov.io/gh/apache/pinot/pull/13080/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.85% <ø> (-0.04%)` | :arrow_down: |
| [unittests2](https://app.codecov.io/gh/apache/pinot/pull/13080/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.75% <28.57%> (+0.02%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.


Jackie-Jiang commented 1 week ago

cc @snleee @swaminathanmanish @KKcorps