csantanapr commented 8 years ago

cc @sjfink @rabbah

Outline the steps to create a feed and then create triggers from that feed.

The feed will be a running service that the user is responsible for. We will use a nodejs server app as an example of how to achieve this, but any programming language can be used.

sjfink commented 8 years ago

some initial notes coming -- very rough descriptions. I will break this over several comments

1. Feed Architecture Choices

There are at least 3 strategies for creating a feed: "Hooks", "Polling" and "Connections".

Hooks

"Hooks" means we set up a feed using a webhook facility from another service. In this strategy, we configure a webhook on a service to POST directly to a whisk URL to fire a trigger. This is by far the easiest and most attractive option for implementing low-frequency feeds.

The github feed is implemented using webhooks.

Polling

"Polling" means that we arrange for a whisk action to poll an endpoint periodically to fetch new data. This is easy to build, but is low performance, and is limited by the polling interval.

I have a prototype of a MessageHub feed using polling.

Connections

"Connections" means that we stand up a separate service somewhere that maintains a persistent connection to a feed source. The connection-based implementation might interact with a service endpoint via long polling, or by setting up a push notification.

Our cloudant changes feed is connection based. We are working on a high-performance MessageHub connection-based feed using Kafka consumers.

sjfink commented 8 years ago

2. Difference between Feed and Trigger

Some definitions:

Whisk processes events which flow into the system.
A trigger is simply a name for a class of events. Each event belongs to exactly one trigger. A good analogy is a "topic" in the pub-sub world. A Rule T -> A means "whenever an event from trigger T arrives, invoke action A with the trigger payload."
A feed is a stream of events which all belong to some trigger T. A feed is controlled by a feed action which handles creating, deleting, pausing, and resuming the stream of events which comprise a feed. The feed action typically interacts with external services which produce the events, via a REST API that manages notifications.
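
To make the three definitions concrete, here is a toy in-memory model. It is only a sketch for illustration, not how whisk is implemented: actions are plain functions, and the names addRule, fireTrigger and runFeed are made up for this example.

```javascript
// Toy model: trigger name -> list of actions bound to it by rules.
const rules = new Map();

// Register a rule "triggerName -> action".
function addRule(triggerName, action) {
  if (!rules.has(triggerName)) rules.set(triggerName, []);
  rules.get(triggerName).push(action);
}

// Fire a trigger: every action bound to it receives the event payload.
function fireTrigger(triggerName, payload) {
  const actions = rules.get(triggerName) || [];
  return actions.map((action) => action(payload));
}

// A feed is a stream of events that all belong to one trigger.
function runFeed(triggerName, events) {
  return events.flatMap((event) => fireTrigger(triggerName, event));
}
```

With a rule T -> A registered via addRule, every event pushed through runFeed('T', ...) ends up invoking A with that event as its payload.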

sjfink commented 8 years ago

3. Implementing Feed Actions

The feed action is a normal OpenWhisk action, but it should accept the following parameters:

lifecycleEvent: one of 'CREATE', 'DELETE', 'PAUSE', or 'UNPAUSE'
triggerName: the fully-qualified name of the trigger which contains events produced from this feed.
authKey: the Basic auth credentials of the OpenWhisk user who owns the trigger just mentioned

The feed action can also accept any other parameters it needs to manage the feed. For example the cloudant changes feed action expects to receive parameters including 'dbname', 'username', etc.

When the user creates a trigger with the --feed parameter, the system automatically invokes the feed action with the appropriate parameters.

For example, assume the user has created a mycloudant binding for cloudant with their username and password as bound parameters. Then when the user issues:

wsk trigger create T --feed mycloudant/changes -p dbName myTable

under the covers the system will do something equivalent to:

wsk action invoke mycloudant/changes -p lifecycleEvent CREATE -p triggerName T -p authKey <userAuthKey> -p password <password value from mycloudant binding> -p username <username value from mycloudant binding> -p dbName myTable
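
The parameter assembly can be pictured roughly like this. This is a sketch only: buildFeedParams is a hypothetical helper, and the merge order (command-line parameters override bound ones, lifecycle parameters are applied last) is my assumption about the intended behaviour.

```javascript
// Sketch: assemble the parameters passed to the feed action.
// bindingParams: parameters bound to the package (e.g. username, password).
// userParams: parameters given with -p on "wsk trigger create".
function buildFeedParams(lifecycleEvent, triggerName, authKey, bindingParams, userParams) {
  return {
    ...bindingParams, // values from the mycloudant binding
    ...userParams,    // e.g. { dbName: 'myTable' }; assumed to override bound values
    lifecycleEvent,   // 'CREATE', 'DELETE', 'PAUSE' or 'UNPAUSE'
    triggerName,      // trigger that will receive the events
    authKey,          // credentials of the trigger owner
  };
}
```

For the cloudant example this yields six parameters: username, password, dbName, lifecycleEvent, triggerName and authKey.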

The feed action takes these parameters, and is expected to take whatever action is necessary to set up a stream of events from cloudant, with the appropriate configuration, directed to the trigger T. For cloudant, the action happens to talk directly to a cloudanttrigger service we've implemented with a connection-based architecture. We'll discuss the other architectures below.

A similar feed action protocol occurs for wsk trigger delete. We have not yet implemented pause and unpause, but they will be similar.
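
A skeleton for such a feed action in Node.js might look like the following. Treat it as a rough sketch: createFeed and deleteFeed are hypothetical placeholders for whatever calls your event source needs, and returning an object with an error field is just one way to report a failed invocation.

```javascript
// Hypothetical handlers; a real feed would call the event source's API here.
function createFeed(params) {
  return { status: 'created', trigger: params.triggerName };
}

function deleteFeed(params) {
  return { status: 'deleted', trigger: params.triggerName };
}

// Entry point of the action: dispatch on the lifecycleEvent parameter.
function main(params) {
  const required = ['lifecycleEvent', 'triggerName', 'authKey'];
  const missing = required.filter((name) => !params[name]);
  if (missing.length > 0) {
    return { error: `missing parameters: ${missing.join(', ')}` };
  }

  switch (params.lifecycleEvent) {
    case 'CREATE':
      return createFeed(params);
    case 'DELETE':
      return deleteFeed(params);
    case 'PAUSE':
    case 'UNPAUSE':
      // Not implemented yet, as noted above.
      return { error: `${params.lifecycleEvent} is not implemented yet` };
    default:
      return { error: `unknown lifecycleEvent: ${params.lifecycleEvent}` };
  }
}
```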

mbehrendt commented 8 years ago

i think it'll be very important to include a description of how event data is going to flow into whisk via triggers. as far as i see, we often need a long-running process (lrp) between whisk and the event source, where this lrp receives the event data (as a result of the feed-based trigger creation), and then translates the data into wsk trigger fire calls.

sjfink commented 8 years ago

(Moving to briefer text now to get the initial outline in before discussion starts)

4. Implementing Feeds with hooks.

Setting up a feed via a hook is by far the easiest way to start, and should be the recommended way to encourage users to create feeds (according to SJF. System MessageHub devotees often argue differently -- but SJF does not believe Kafka consumers can ever be as easy as webhooks, and SJF believes we should stress simple UX above all else).

With this method there is no need to stand up any persistent service outside of whisk. All feed management happens naturally through stateless whisk actions.

When invoked with CREATE the feed action simply installs a webhook for some other service, asking the remote service to POST notifications to the appropriate fireTrigger URL in whisk.

The webhook should be directed to send notifications to a URL such as: POST /namespaces/{namespace}/triggers/{triggerName}
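
As a rough sketch, the CREATE handler could compute the webhook target like this. Assumptions on my side: apiHost is a placeholder for your whisk host, the /api/v1 prefix is what the REST API uses in front of the path above, an unqualified trigger name falls back to the "_" default namespace, and the remote service is able to send a Basic auth header built from authKey.

```javascript
// Sketch: build the request a webhook must send to fire the trigger.
// triggerName may be fully qualified ("/namespace/name") or just "name".
function buildFireTriggerRequest(apiHost, triggerName, authKey) {
  const parts = triggerName.split('/').filter((part) => part.length > 0);
  const name = parts[parts.length - 1];
  const namespace = parts.length > 1 ? parts[0] : '_'; // "_" = default namespace (assumption)

  return {
    method: 'POST',
    url: `https://${apiHost}/api/v1/namespaces/${encodeURIComponent(namespace)}` +
      `/triggers/${encodeURIComponent(name)}`,
    headers: {
      'Content-Type': 'application/json',
      // authKey is expected in "user:password" form.
      Authorization: `Basic ${Buffer.from(authKey).toString('base64')}`,
    },
  };
}
```

The feed action would then hand the url and headers to the remote service's webhook registration API; that call is specific to each service, so it is left out here.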

sjfink commented 8 years ago

5. Implementing Feeds with polling.

It is possible to set up an action to poll a feed source entirely within whisk, without the need to stand up any persistent connections or external service.

For feeds where a webhook is not available, but which do not need high volume or very quick response times, polling is an attractive option and should be recommended.

To set up a polling-based feed, the feed action takes the following steps when called for CREATE:

The feed action sets up a periodic trigger (T) with the desired frequency, using the whisk.system/alarms feed. (meta!)
The feed developer creates a 'pollMyService' action which simply polls the remote service and returns any new events.
The feed action sets up a rule T -> pollMyService.

That's it. We've implemented a polling-based trigger entirely using whisk actions, without any need for a separate service.
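
The core of a pollMyService action could be sketched as below. This assumes (my assumption, not something the remote service guarantees) that every event carries a monotonically increasing numeric id, so "new" means "id greater than the last one we saw". selectNewEvents is a hypothetical helper name.

```javascript
// Sketch: pick the events that have not been delivered yet.
// fetchedEvents: whatever the remote service returned on this poll.
// lastSeenId: id of the newest event delivered by an earlier poll.
function selectNewEvents(fetchedEvents, lastSeenId) {
  const fresh = fetchedEvents
    .filter((event) => event.id > lastSeenId)
    .sort((a, b) => a.id - b.id); // deliver oldest first

  const newLastSeenId = fresh.length > 0 ? fresh[fresh.length - 1].id : lastSeenId;
  return { events: fresh, lastSeenId: newLastSeenId };
}
```

Because whisk actions are stateless, lastSeenId has to be kept somewhere outside the action between polls (for example in a database), and the polling interval is bounded by what the periodic trigger allows.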

mbehrendt commented 8 years ago

"With this method there is no need to stand up any persistent service outside of whisk."

i agree we should offer the ability to integrate as easily as possible with webhooks.

however, today that requires webhooks to send their payload with an application/json msg type.

several webhooks also use other msg types, so we need to document that constraint and open an issue to allow more msg types. also, we need to document the webhook based approach, as you did above (need to add auth information)

mbehrendt commented 8 years ago

re the polling approach -- i think we'll have to work through in which context the action doing the polling is being executed. if it's done in the context of the user, he'll get charged for the resource consumption, which will come across as being weird, since the user shouldn't have to know that the impl of a feed is using actions and shouldn't have to pay for it.

sjfink commented 8 years ago

6. Implementing Feeds via Connections

The previous 2 methods are easy .. but if you want a high-performance feed, there is no substitute for persistent connections and long-polling or similar techniques.

Since OpenWhisk actions are stateless, right now there is no way to keep a persistent connection open to a third party. So instead, we are forced to stand up a separate service (outside of whisk) that runs all the time. We call these provider services. A provider service can maintain connections to third party event sources that support long polling or other connection-based notifications.

The provider service should provide a REST API that allows the whisk feed action to control the feed. The provider service acts as a proxy between the event provider and whisk -- when it receives events from the third party, it sends them on to whisk by firing a trigger.

The cloudant built-in feed is the canonical example -- it stands up a cloudanttrigger service which mediates between cloudant notifications over a persistent connection, and whisk triggers.

The alarm feed is implemented with a similar pattern.

We will soon provide a MessageHub feed with a similar pattern.

For the moment, these provider services must POST events to whisk in order to fire triggers. Eventually, we will hook up MessageHub events directly into whisk to avoid the POST overhead.

The connection-based architecture is the highest performance option -- but it's far more difficult to operate and maintain than the polling and hook architectures. The provider service must be production-quality: it must be highly-available and fault-tolerant. Our current providers do not meet this requirement, and will need work to reach production quality standards.
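
To give a feel for what a provider service keeps track of, here is a minimal in-memory sketch. It deliberately leaves out the HTTP server, persistence and the actual connection to the event source, which is where all the production-quality work lives. FeedProvider, its method names and the config.source field are hypothetical; fireTrigger is injected so the sketch does not depend on any particular HTTP client.

```javascript
// Sketch of the bookkeeping inside a provider service.
class FeedProvider {
  // fireTrigger(triggerName, authKey, payload) is expected to POST to whisk.
  constructor(fireTrigger) {
    this.fireTrigger = fireTrigger;
    this.registrations = new Map(); // triggerName -> { authKey, config }
  }

  // Called (via the provider's REST API) by the feed action on CREATE.
  register(triggerName, authKey, config) {
    this.registrations.set(triggerName, { authKey, config });
  }

  // Called by the feed action on DELETE. Returns true if something was removed.
  unregister(triggerName) {
    return this.registrations.delete(triggerName);
  }

  // Called whenever the persistent connection delivers an event from `source`.
  // Fires every trigger registered for that source and returns how many fired.
  onEvent(source, payload) {
    let fired = 0;
    for (const [triggerName, { authKey, config }] of this.registrations) {
      if (config.source === source) {
        this.fireTrigger(triggerName, authKey, payload);
        fired += 1;
      }
    }
    return fired;
  }
}
```

The provider acts as the proxy described above: register and unregister are driven by the feed action, while onEvent turns each incoming notification into a trigger fire using the stored authKey.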

sjfink commented 8 years ago

7. Finished

Ok I'm done with the initial draft. @rabbah feel free to edit my comments directly if you like to clarify or improve.

@mbehrendt is correct that we will need to support other payload formats (not just application/json) to expand the set of webhooks we can play with.

mbehrendt commented 8 years ago

thanks @sjfink -- excellent summary of all key information, thanks for putting this together so quickly.

csantanapr commented 8 years ago

Thanks @sjfink you are a fast typer 😄 @mbehrendt @sjfink I was not sure if there are new issues to be created from the above comments about payload formats? Should I create a new issue, so you guys can expand on what work is needed?

mbehrendt commented 8 years ago

@csantanapr i think it would be good to have a thread about payload format / scheme, would be great if you could open that up.

csantanapr commented 8 years ago

@mbehrendt @sjfink issue #567 created for payload formats

jthomas commented 8 years ago

Since I've just been through the process of creating my own feed provider, here's my feedback...

Good section about the architectures, I hadn't thought about using the alarm package to handle polling. I spent most of my time digging through the existing catalogue feeds to understand how they work. Would be useful to include code samples with each of the architecture types to help users see how this works in reality. Either that or link directly to one of the catalogue samples that implements that pattern.
This section should include the commands needed to register a feed, I had to find this from the setup scripts.

wsk action create -a feed true feed_name feed_action.js

csantanapr commented 8 years ago

@jthomas good suggestion

"Would be useful to include code samples with each of the architecture types to help users see how this works in reality. Either that or link directly to one of the catalogue samples that implements that pattern."

We have issues open to add to the catalog the cloudant/couchdb and alarm packages and we could use those as examples in the docs.

skaegi commented 7 years ago

That last bit from @jthomas about annotating an action really needs to be better documented and perhaps given first class syntax.

-a feed true -- adds a feed annotation to the action and is required to register it as a feed provider

aarora91 commented 7 years ago

I want to explore the third approach for implementing feeds: via connections. I have a Slack outgoing webhook which should hit an OW action's REST endpoint. Slack doesn't allow me to customize its request headers, so I cannot pass in my OpenWhisk creds via Basic Auth, and requests to OW get rejected. I might be misunderstanding this, but I think a "connection" intermediary will help.

Your explanation is nice but would be great if you can provide some code samples or snippets at the very least as I don't know where to start.

Thanks in advance!

jthomas commented 7 years ago

@aarora91 In theory you could use the API Gateway feature in OpenWhisk to configure a public endpoint which resolves this issue. Unfortunately at the moment the content format isn't supported by this service, see https://github.com/openwhisk/openwhisk/issues/1655
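
Until that is supported, one possible stopgap is a small relay service of your own between Slack and OpenWhisk: it accepts the form-encoded POST, converts it to JSON and adds the Basic auth header before forwarding. The conversion step could be sketched like this; relayFormWebhook is a hypothetical helper, and the HTTP server around it plus the forwarding call are left out.

```javascript
// Sketch: turn a form-encoded webhook body into the headers and JSON body
// needed for an authenticated POST to OpenWhisk.
// formBody: raw request body, e.g. "text=hello+world&user_name=alice".
// authKey: OpenWhisk credentials in "user:password" form.
function relayFormWebhook(formBody, authKey) {
  // URLSearchParams handles percent-decoding and "+" as space.
  const payload = Object.fromEntries(new URLSearchParams(formBody));

  return {
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Basic ${Buffer.from(authKey).toString('base64')}`,
    },
    body: JSON.stringify(payload),
  };
}
```

Note that repeated form keys collapse to the last value with this approach, which is fine for simple payloads but worth checking against what your webhook actually sends.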

aarora91 commented 7 years ago

Thanks @jthomas for the quick response. I see you ran into the same problem when using Slack's outgoing webhooks. Hope we get this feature in OpenWhisk. Meanwhile, I did some digging and it looks like API Connect can be used for this: https://www.youtube.com/watch?v=WP6D47KxSrs The Bluemix GUI has changed since the video was made, so I had some trouble navigating around the page.

markusthoemmes commented 7 years ago

I think this is done, looking at: https://github.com/openwhisk/openwhisk/blob/master/docs/feeds.md

apache / openwhisk

Document how an user can create their own feed to create triggers from external service. #559
