GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
http://cloud.google.com/dataflow

Eclipse dataflow plugin errors when running the WordCount example pipeline #561

Closed rafaelsf80 closed 7 years ago

rafaelsf80 commented 7 years ago

Hi, I'm trying to run the WordCount example pipeline with the Dataflow plugin (1.2.0) for Eclipse Neon.2 (4.6.2), and I get this error when I try to run a new configuration:

An internal error occurred during: "Update Hierarchy". Tried to create a TypeHierarchyPipelineOptionsHierarchy for a Java Project my-artifact-rafa where no PipelineOptions type exists.

Updating the google-cloud-dataflow-java-sdk-all artifact from 1.9.0 to 2.0.0-beta1 or 2.0.0-beta2 in pom.xml produces this error instead when running the pipeline:

Exception in thread "main" java.lang.Error: Unresolved compilation problems: 
    PipelineOptionsFactory cannot be resolved
    Pipeline cannot be resolved to a type
    Pipeline cannot be resolved
    TextIO cannot be resolved
    ParDo cannot be resolved
    TextIO cannot be resolved

    at com.google.cloud.dataflow.examples.WordCount.main(WordCount.java:191)

Any guidance would be appreciated.

Thanks, Rafa

davorbonaci commented 7 years ago

@tgroh, can you please take a look?

tgroh commented 7 years ago

This usually occurs when Maven's view of the dependencies is out of sync with the project's declared dependencies. As a result, the PipelineOptions type is not present on the classpath of the Eclipse project; the Maven dependencies have probably not been resolved properly.

From the project context menu, select Maven -> Update Project, check the "Force Update of Snapshots/Releases" checkbox, and run the update. This should ensure the Dataflow JAR is on the classpath, which will enable the Dataflow Eclipse Plugin to resolve the base PipelineOptions class and the remainder of the PipelineOptions class hierarchy.

Updating to a 2.0.0-beta-x version will not work, as the namespaces for all of the underlying PTransforms and Pipeline methods have changed from com.google.cloud.dataflow to org.apache.beam, so the imports in the 1.x example code no longer resolve.
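
If it helps to confirm which SDK actually ended up on the classpath, here is a minimal sketch of a check you could drop into the project and run. The class name ClasspathCheck and its helper methods are made up for illustration; the two fully qualified type names are, to my knowledge, where the base PipelineOptions interface lives in the 1.x SDK and in the Beam-based 2.x SDK respectively, so treat them as assumptions and adjust if your version differs.

```java
// Hypothetical helper (not part of the Dataflow SDK or the Eclipse plugin):
// reports which PipelineOptions type, if any, is visible on the classpath.
public class ClasspathCheck {
    // Assumed location of PipelineOptions in the 1.x Dataflow SDK.
    static final String DATAFLOW_1X =
        "com.google.cloud.dataflow.sdk.options.PipelineOptions";
    // Assumed location of PipelineOptions in the Beam-based 2.x SDK.
    static final String BEAM_2X =
        "org.apache.beam.sdk.options.PipelineOptions";

    // Returns true if the named type can be loaded, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Summarizes the classpath state in one line.
    static String describe() {
        boolean v1 = isOnClasspath(DATAFLOW_1X);
        boolean v2 = isOnClasspath(BEAM_2X);
        if (v1 && v2) {
            return "Both 1.x and 2.x PipelineOptions found: mixed dependencies";
        } else if (v1) {
            return "1.x SDK found: com.google.cloud.dataflow imports should resolve";
        } else if (v2) {
            return "2.x SDK found: code must import from org.apache.beam";
        }
        return "No PipelineOptions type found: Maven dependencies are not resolved";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If this prints the last message when run from Eclipse, the plugin error above is expected, since the plugin has no PipelineOptions type to build its hierarchy from.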

rafaelsf80 commented 7 years ago

Hi, thanks for the answer. The issue was the one you described: the Maven view was out of sync. However, the forced update did not solve the issue by itself. Since I only use Maven for this particular project (for nothing else), I removed the contents of $HOME/.m2 and ran mvn clean install, and everything works properly now. Thanks for the suggestion, Rafa

davorbonaci commented 7 years ago

Thanks for confirming, @rafaelsf80!