apache / incubator-heron

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
https://heron.apache.org/
Apache License 2.0
3.65k stars, 597 forks

Need support for external dependencies #3274

Open sautran opened 5 years ago

sautran commented 5 years ago

Right now in Heron, we bundle all dependencies into one huge uber jar. Can Heron support a feature similar to Storm's classpath handling (https://storm.apache.org/releases/2.0.0-SNAPSHOT/Classpath-handling.html), so that third-party libraries can be offloaded to a known location and deploying a topology does not take so long?

nwangtw commented 5 years ago

Interesting. I suppose it is possible, but I am not a Java expert. @skanjila @jerrypeng @nlu90 @huijunw, do you have any thoughts?

huijunwu commented 4 years ago

heron submit ... --config-property heron.classpath.instance=<path-to-class> ..

This option is passed to heron-executor, which launches heron-instance with the specified class path.
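As a rough illustration of the option above, here is a minimal Python sketch that assembles a class path from a list of pre-installed jars and builds the argument list for the submit command. Only `--config-property` and `heron.classpath.instance` come from this thread. The helper names, the `:` separator, the placement of the option before the positional arguments, and all paths and topology names are assumptions made for the example; whether the property accepts a multi-entry class path should be checked against your Heron version.

```python
def build_instance_classpath(paths, separator=":"):
    """Keep only .jar entries, sort them for stable output, and join them.

    The ':' separator assumes a Linux container; this is an assumption,
    not something stated in the thread.
    """
    jars = sorted(p for p in paths if p.endswith(".jar"))
    return separator.join(jars)


def build_submit_args(cluster, topology_jar, main_class, topology_name, classpath):
    """Return an argv-style list for 'heron submit' with the classpath property.

    The positional layout (cluster, jar, class, name) is assumed for illustration.
    """
    return [
        "heron", "submit",
        "--config-property", "heron.classpath.instance=" + classpath,
        cluster, topology_jar, main_class, topology_name,
    ]


if __name__ == "__main__":
    # Hypothetical pre-installed library location and topology names.
    libs = ["/opt/topology-libs/guava.jar", "/opt/topology-libs/README.txt"]
    cp = build_instance_classpath(libs)
    print(" ".join(build_submit_args("local", "my-topology.jar",
                                     "com.example.MyTopology", "MyTopology", cp)))
```

Building the list in code rather than by hand keeps the class path consistent across submissions; the list can be passed to `subprocess.run` unchanged.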

A general container needs two archives: heron-core.tar.gz and topology.tar.gz. heron-instance is only a small part of heron-core.tar.gz, so I do not think a pre-installed heron-instance class path would save much time. For topology.tar.gz, a pre-installed class path may save a lot of time, depending on the size of your job's libraries.
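To get a feel for how much of a topology jar could be offloaded, one option is to total the compressed bytes per package prefix and see what share belongs to third-party code. The Python sketch below does that with the standard `zipfile` module. It is an estimate under stated assumptions: the function names and the `com.example` prefix are hypothetical, compressed entry size is only a proxy for upload size, and nothing here measures actual deploy time.

```python
import zipfile
from collections import Counter


def compressed_bytes_by_prefix(jar_path, depth=2):
    """Sum compressed entry sizes in a jar, grouped by the first `depth`
    directory components (e.g. 'org.apache', 'com.google')."""
    totals = Counter()
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            if info.is_dir():
                continue
            parts = info.filename.split("/")[:-1]  # drop the file name
            prefix = ".".join(parts[:depth]) or "(root)"
            totals[prefix] += info.compress_size
    return totals


def offloadable_fraction(totals, own_prefixes):
    """Fraction of bytes that do NOT belong to the job's own package prefixes,
    i.e. the share that pre-installed libraries could cover."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    own = sum(size for prefix, size in totals.items() if prefix in own_prefixes)
    return (total - own) / total


if __name__ == "__main__":
    import sys
    # Usage: python jar_share.py my-topology.jar   (file name is hypothetical)
    sizes = compressed_bytes_by_prefix(sys.argv[1])
    for prefix, size in sizes.most_common(10):
        print(f"{size:>12}  {prefix}")
    print("offloadable:", offloadable_fraction(sizes, {"com.example"}))
```

If the offloadable share is small, pre-installing libraries is unlikely to shorten deployment much; if it dominates the jar, the potential saving is correspondingly larger.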

At present there is only one class path for heron-instance; in other words, heron-instance's own class path and the user job's class path are the same.

#1735 (two class paths) is a candidate solution.

nicknezis commented 4 years ago

This is interesting. I was thinking of putting the extra jars into a Docker layer that extends the default Heron-provided images, but better support for this in Heron itself would be nice.