GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
http://cloud.google.com/dataflow

Dataflow SDK and Google Cloud Java SDK conflict #536

Open vaibhavsw opened 7 years ago

vaibhavsw commented 7 years ago

The Google Cloud Java SDK, which bundles all the client libraries from Cloud Storage to Cloud PubSub, conflicts with the Dataflow SDK. Running with both on the classpath shows:

Exception in thread "main" java.lang.NoClassDefFoundError: com/google/protobuf/GeneratedMessageV3
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at com.google.pubsub.v1.PublisherGrpc.<clinit>(PublisherGrpc.java:41)
    at com.google.cloud.pubsub.spi.v1.PublisherSettings$Builder.<init>(PublisherSettings.java:480)
    at com.google.cloud.pubsub.spi.v1.PublisherSettings$Builder.createDefault(PublisherSettings.java:519)
    at com.google.cloud.pubsub.spi.v1.PublisherSettings$Builder.access$000(PublisherSettings.java:412)
    at com.google.cloud.pubsub.spi.v1.PublisherSettings.defaultBuilder(PublisherSettings.java:213)
    at com.google.cloud.pubsub.spi.DefaultPubSubRpc.<init>(DefaultPubSubRpc.java:160)
    at com.google.cloud.pubsub.PubSubOptions$DefaultPubSubRpcFactory.create(PubSubOptions.java:69)
    at com.google.cloud.pubsub.PubSubOptions$DefaultPubSubRpcFactory.create(PubSubOptions.java:63)
    at com.google.cloud.ServiceOptions.getRpc(ServiceOptions.java:482)
    at com.google.cloud.pubsub.PubSubImpl.<init>(PubSubImpl.java:115)
    at com.google.cloud.pubsub.PubSubOptions$DefaultPubSubFactory.create(PubSubOptions.java:44)
    at com.google.cloud.pubsub.PubSubOptions$DefaultPubSubFactory.create(PubSubOptions.java:39)
    at com.google.cloud.ServiceOptions.getService(ServiceOptions.java:469)
    at dataflow.demo.Main.getMessages(Main.java:14)
    at dataflow.demo.Main.main(Main.java:10)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.GeneratedMessageV3
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 32

The error goes away if I remove the Dataflow SDK. I need to use both the Dataflow SDK and the Google Cloud SDK (for PubSub) in the same project.

dhalperi commented 7 years ago

Hi @vaibhavswarnkar,

It sounds like you are using dependencies that are not compatible with the versions that Google Cloud Dataflow depends on. Please see https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline

Specifically, you may only use libraries that depend on com.google.protobuf:protobuf-java:3.0.0-beta-1. To use a library that needs a different protobuf version, you can use Maven shading to repackage the conflicting library and all its transitive dependencies.

Dan

metabrain commented 7 years ago

Hi,

I am having the same issue and have narrowed it down to a version incompatibility between

        <dependency>
            <groupId>com.google.cloud</groupId>
            <artifactId>google-cloud</artifactId>
            <version>0.8.0</version>
        </dependency>

and

        <dependency>
            <groupId>com.google.cloud.dataflow</groupId>
            <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
            <version>1.9.0</version>
        </dependency>

I can never get both to work in the same project at the same time, even while trying several combinations of older versions for each artifact. Either one fails (Dataflow) or the other (I'm only using pubsub from Cloud) at runtime.

Also, is there any reason why Cloud is using com.google.protobuf:protobuf-java:3.0.0-beta-1? Maven Central has several later betas of 3.0.0, plus the proper 3.0.0 release. In the meantime, both 3.0.2 and 3.1.0 have been released as well. See https://mvnrepository.com/artifact/com.google.protobuf/protobuf-java for reference.

Cheers, Dan

dhalperi commented 7 years ago

Hi Dan,

To use google-cloud with Google Cloud Dataflow SDK for Java 1.9.0 or earlier, you'll have to use something like maven-shade-plugin to repackage the google-cloud artifact together with its transitive dependencies (protobuf, etc.) under relocated package names.
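As a rough sketch of what that shading could look like (not a tested configuration): the fragment below would go in the pom.xml of a separate, hypothetical wrapper module that depends on google-cloud but not on the Dataflow SDK. The plugin version and the `repackaged.` prefix are assumptions chosen for illustration, and other shared packages (for example gRPC) may need relocation entries of their own.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <!-- Assumed version; use whichever release your build already pins. -->
      <version>2.4.3</version>
      <executions>
        <execution>
          <!-- Build the shaded jar as part of "mvn package". -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <!-- Move the newer protobuf classes (and every reference to
                   them inside the bundled google-cloud classes) to a private
                   package, so they cannot collide with the 3.0.0-beta-1
                   classes that Dataflow 1.9.0 expects. -->
              <relocation>
                <pattern>com.google.protobuf</pattern>
                <!-- Hypothetical prefix; any unused package name works. -->
                <shadedPattern>repackaged.com.google.protobuf</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

The Dataflow project would then depend on the shaded jar produced by that module instead of depending on google-cloud directly, so only one unrelocated copy of protobuf remains on its classpath.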

The reason Google Cloud Dataflow cannot upgrade to a newer version of com.google.protobuf:protobuf-java than 3.0.0-beta-1 is that doing so would be a breaking change for our users. We can only make such upgrades at major versions.

In Dataflow 2.0.0-beta1, we have moved to version 3.0.0 of protobuf. Additionally, protobuf-java starting at 3.0.0 is supposed to have better backwards compatibility than 3.0.0-beta-1, so future upgrades may be possible if they are not breaking changes.

Dan

metabrain commented 7 years ago

Thanks Dan,

Yeah, I had some issues running the beta version on my machine as well, so in the end I replaced the google-cloud artifact with the Spotify implementation of GCE PubSub over REST, which meant I could run Dataflow 1.9.0. All works fine now. I will go back to using the google-cloud artifact when Dataflow 2.0.0 leaves beta and there are no issues between it and google-cloud regarding protobuf versions.

Thanks once again for your help!

Cheers, Dan