GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
http://cloud.google.com/dataflow

Unable to write custom attributes with PubsubIO #643

Closed: darshanmehta10 closed this issue 6 years ago.

darshanmehta10 commented 6 years ago

I am using the following code to publish messages to Pub/Sub from Dataflow:

@Override
public PDone expand(PCollection<String> collection) {
    log.info("Will write to Pub/Sub topic: {}", topic);
    return PubsubIO.writeStrings()
            .to(topic)
            .expand(collection);
}

I want to attach custom attributes to each message as key-value pairs, such as "type" : "some_type". However, PubsubIO only lets me set the ID and timestamp attribute names (withIdAttribute and withTimestampAttribute), not custom ones.

Is there any way to set these attributes/values?

lukecwik commented 6 years ago

You first have to apply a ParDo that converts your strings into PubsubMessages carrying the attributes you need, and then apply PubsubIO.writeMessages(). For example:

return collection.apply(ParDo.of(myDoFnThatConvertsStringsToPubsubMessagesWithAttributes))
  .apply(PubsubIO.writeMessages().to(topic));
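A minimal sketch of such a DoFn, assuming the Apache Beam 2.x Java SDK (package `org.apache.beam.sdk.io.gcp.pubsub`) and the `"type" : "some_type"` pair from the question. The class name `StringToPubsubMessageFn` and its `attributes()` helper are hypothetical, not part of the SDK; the one SDK call relied on is the `PubsubMessage(byte[] payload, Map<String, String> attributes)` constructor.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.transforms.DoFn;

/** Hypothetical DoFn: wraps each String payload in a PubsubMessage that carries custom attributes. */
public class StringToPubsubMessageFn extends DoFn<String, PubsubMessage> {
  // Value for the "type" attribute; a String field keeps the DoFn serializable.
  private final String type;

  public StringToPubsubMessageFn(String type) {
    this.type = type;
  }

  // Builds the attribute map attached to every message (hypothetical helper).
  Map<String, String> attributes() {
    Map<String, String> attributes = new HashMap<>();
    attributes.put("type", type);
    return attributes;
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    // Payload bytes plus the attribute map; PubsubIO.writeMessages() publishes both.
    byte[] payload = c.element().getBytes(StandardCharsets.UTF_8);
    c.output(new PubsubMessage(payload, attributes()));
  }
}
```

With it, the snippet above becomes `collection.apply(ParDo.of(new StringToPubsubMessageFn("some_type"))).apply(PubsubIO.writeMessages().to(topic))`. If the SDK version in use cannot infer a coder for `PubsubMessage`, adding `.setCoder(PubsubMessageWithAttributesCoder.of())` after the ParDo is one way to make sure the attributes are encoded along with the payload.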

As a side note, please never call expand() directly; it is the wrong API for applying transforms to PCollections. You should always rely on PCollection.apply(PTransform).
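The side note concerns calling another transform's `expand()` by hand; overriding `expand()` is still how a composite `PTransform` is defined. A minimal sketch of the question's transform rewritten that way, assuming Apache Beam 2.x. The class name `WriteWithAttributes` and the fixed `"type" : "some_type"` attribute are illustrative, and `MapElements` stands in for a hand-written DoFn only to keep the block self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageWithAttributesCoder;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PDone;
import org.apache.beam.sdk.values.TypeDescriptor;

/** Hypothetical composite: publishes Strings to Pub/Sub with a fixed "type" attribute. */
public class WriteWithAttributes extends PTransform<PCollection<String>, PDone> {
  private final String topic; // e.g. "projects/<project>/topics/<topic>"

  public WriteWithAttributes(String topic) {
    this.topic = topic;
  }

  @Override
  public PDone expand(PCollection<String> input) {
    // Each step is attached with input.apply(...); no expand() is invoked by hand.
    return input
        .apply("ToPubsubMessage",
            MapElements.into(TypeDescriptor.of(PubsubMessage.class))
                .via((String s) -> new PubsubMessage(
                    s.getBytes(StandardCharsets.UTF_8),
                    Collections.singletonMap("type", "some_type"))))
        // Explicit coder so the attribute map is encoded together with the payload.
        .setCoder(PubsubMessageWithAttributesCoder.of())
        .apply("WriteToPubsub", PubsubIO.writeMessages().to(topic));
  }
}
```

A pipeline would then use it as `collection.apply(new WriteWithAttributes(topic))`, letting the SDK call `expand()` as part of `apply`.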