apache / incubator-heron

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
https://heron.apache.org/
Apache License 2.0

Heron Pom files should have proper dependencies listed. #3774

Closed: joshfischer1108 closed this issue 2 years ago.

joshfischer1108 commented 2 years ago

One of the issues is that we shade dependencies into the jar. Instead, we should publish a jar that contains only our code and list its dependencies in the pom.xml.
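To see concretely what "shading dependencies into the jar" means for a given artifact, you can list the packages in the jar that do not belong to Heron. Here is a minimal Java sketch. It assumes that Heron's own classes live under `org/apache/heron/` (this prefix is an assumption; older builds used `com.twitter.heron`) and that every other `.class` entry was bundled from a dependency. Classes that a shading rule relocated *under* the Heron prefix will not be reported. `BundledDeps` is a hypothetical helper, not part of Heron.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class BundledDeps {
    // Assumption: Heron's own classes are under this path prefix inside the jar.
    public static final String OWN_PREFIX = "org/apache/heron/";

    /** Returns the dotted package name of every .class entry that is not under ownPrefix. */
    public static SortedSet<String> foreignPackages(Iterable<String> entryNames, String ownPrefix) {
        SortedSet<String> packages = new TreeSet<>();
        for (String name : entryNames) {
            // Skip resources, directory entries, our own classes and META-INF content.
            if (!name.endsWith(".class") || name.startsWith(ownPrefix) || name.startsWith("META-INF/")) {
                continue;
            }
            int slash = name.lastIndexOf('/');
            packages.add(slash < 0 ? "(default package)" : name.substring(0, slash).replace('/', '.'));
        }
        return packages;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BundledDeps path/to/api.jar");
            System.exit(2);
        }
        // Collect every entry name from the jar.
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(args[0])) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        SortedSet<String> foreign = foreignPackages(names, OWN_PREFIX);
        if (foreign.isEmpty()) {
            System.out.println("No classes outside " + OWN_PREFIX);
        } else {
            foreign.forEach(System.out::println);
        }
    }
}
```

Each package this prints is a candidate for a `<dependency>` entry in the generated pom.xml, rather than something bundled inside the jar.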

The scripts that generate the pom.xml files: https://github.com/apache/incubator-heron/tree/master/scripts/ci
The build targets for the Heron API jar(s): https://github.com/apache/incubator-heron/blob/abb2767e3df3ca6eba009f46efe1f1e83695617a/heron/api/src/java/BUILD#L23-L87

The targets defined in that BUILD file:

  - "api-java-low-level": the normal "Topology API" jar.
  - "api-java": the functional API (i.e. the Streamlet jar).
  - "api-java-low-level-functional": a combination of the topology and Streamlet code.
  - "api-unshaded": a Java binary (why a binary?) containing both APIs, but with a neverlink dependency on kryo added. I think this might be for Storm compatibility?
  - "api-shaded": remaps the protobuf classes based on the rule in https://github.com/apache/incubator-heron/blob/master/heron/api/src/java/shade.conf, but that rule doesn't shade any of the other dependencies. If we continue shading, we should fix this.
  - "heron-api": creates a copy of heron-api.jar for some reason. No idea why it is done this way instead of just renaming the output of the previous build task.
  - "classification": no idea what this is. Is it part of the Heron API? If so, why is it not included in the resulting "heron-api" jar?
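The concern about "api-shaded" can be illustrated with a simplified model of relocation: a rule rewrites class names that match a package prefix and leaves everything else untouched. This is a sketch, not jarjar's actual rule engine (real jarjar-style rules use wildcard patterns). The target prefix `org.apache.heron.shaded.` and the class `com.example.somedep.Helper` are hypothetical; the real rule is the one in shade.conf.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Relocation {
    /** Hypothetical rule set where only protobuf is relocated; the target prefix is an assumption. */
    public static Map<String, String> protobufOnlyRules() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("com.google.protobuf.", "org.apache.heron.shaded.com.google.protobuf.");
        return rules;
    }

    /** Replaces the first matching source prefix; returns className unchanged if no rule matches. */
    public static String relocate(String className, Map<String, String> rules) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (className.startsWith(rule.getKey())) {
                return rule.getValue() + className.substring(rule.getKey().length());
            }
        }
        return className; // no rule matched: the class keeps its original name
    }

    public static void main(String[] args) {
        Map<String, String> rules = protobufOnlyRules();
        // com.example.somedep.Helper stands in for any other bundled dependency (hypothetical).
        String[] bundled = {"com.google.protobuf.Message", "com.example.somedep.Helper"};
        for (String className : bundled) {
            System.out.println(className + " -> " + relocate(className, rules));
        }
    }
}
```

Running it prints `com.google.protobuf.Message -> org.apache.heron.shaded.com.google.protobuf.Message` and `com.example.somedep.Helper -> com.example.somedep.Helper`. A bundled but unrelocated class keeps its original name, so it can clash with a different version of the same library on a user's classpath. That is why the usual choice is either to relocate every bundled dependency or to stop bundling and declare the dependencies in the pom.xml.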

Questions I have:

  1. Do we want to keep shading dependencies, or should we include the dependencies in the pom.xml?
  2. What is the classification jar?
  3. We are publishing a Heron API jar (the final artifact), but our examples reference the intermediate Streamlet or low-level jars. Should the examples reference the final jar instead?