apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.27k stars 1.23k forks source link

Dependency isolation in Integration Tests #8234

Open KKcorps opened 2 years ago

KKcorps commented 2 years ago

Currently, all of the Pinot integration tests are present in a single module pinot-integration-tests. This works fine for most of our modules.

But as we are starting to add more and more integrations such as Pulsar, Kinesis, Pub-Sub etc. the current approach leads to a lot of dependency version conflicts. Most of these conflicts occur in libraries such as netty, servlet-api, guava, jackson etc.

In the individual plugins, these dependencies are generally shaded and hence cause no issues when running in production. Resolving each of these dependencies and finding a correct minor version which is compatible with both existing and new plugin takes some effort.

We may need to come up with a new approach.

Some solutions -

You can see an example of dependency exclusion because of conflicts in the following draft PR - https://github.com/apache/pinot/pull/8235/files#diff-f191832a3b523ff5a1f818baf6a358e449550c34b75171758749e173dc0b37b2

https://github.com/apache/pinot/pull/8235/files#diff-292e0e1b3c16430f3d1763f6f2aaf13cf30832d51eb7f8ab86b92708135f2013

xiangfu0 commented 2 years ago

Thanks @KKcorps for bringing this up.

I also have the same feeling that the integration test module has too many dependency conflicts.

I'm +1 on having each plugin module import pinot-integration-test-base as test dependency and have the integration test within its own module.

KKcorps commented 2 years ago

@mayankshriv mentioned a good point. What if an integration test wants to use dependency from other modules? e.g. KafkaIntegrationTest using AvroDecoder or S3 Storage?

Jackie-Jiang commented 2 years ago

@KKcorps IMO we should make kafka work with avro and S3 because it is a valid setup, but not necessary make kafka, kinesis and pub-sub work with each other as they are serving the same purpose. Essentially we should pick one streaming module, one format module and one deep store module for each integration test module, and solve the dependency conflict between them

KKcorps commented 2 years ago

@Jackie-Jiang Yes solving dependency conflict part among these 3 dependent plugins is fine. But currently, with a common integration-test module, we end up solving conflicts between Kinesis and Kafka as well.

Now, with the approach Xiang mentioned, taking example of your case, we will need to import pinot-avro and pinot-s3 as test dependencies in pinot-kafka module. And solving conficts between these would just ruin the pinot-kafka pom.

Jackie-Jiang commented 2 years ago

Actually in order to make a proper distribution with all these modules supported, we need to solve all the conflicts any way. @xiangfu0 How do we make the distribution currently?

hpvd commented 2 months ago

added this to [parent issue] Apache Pulsar related open topics #13010