Closed: nelsonong closed this issue 7 years ago
+1
+1
+1, but operators tend to be described in terms of the operation being performed against the remote system rather than 'Sink' (or 'Source'), e.g. ElasticsearchIndex. It depends on exactly the function being performed by the operator.
Also, why would the operator specify the names of the input attributes, rather than just the attribute?
Thanks @ddebrunner. I will create the repository, and then we can work through the design issues you have raised, in the open, before we create a release.
repository created
Introduction
Elasticsearch is a distributed RESTful search engine built for the cloud. Its GitHub repo can be found here.
Built on top of Lucene, Elasticsearch provides (near) real-time search and several APIs (e.g. an HTTP RESTful API and a native Java API) for storing and accessing documents.
Graphing tools such as Grafana and Kibana can be used in conjunction with Elasticsearch for analysis and visualization.
Proposal
I would like to propose that a new repository and toolkit be created to enable application developers to store their stream data in a reliable, asynchronous database whose contents can be easily graphed and monitored with popular tools such as Grafana and Kibana.
I propose that the repository be called streamsx.elasticsearch and the toolkit be called com.ibm.streamsx.elasticsearch.
Initial contribution
The toolkit will initially contain an ElasticsearchSink operator, which ingests incoming tuples and writes their attribute names and values (as pairs) to an Elasticsearch database.
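To make the name/value mapping concrete, here is a minimal sketch of how a tuple could become an Elasticsearch document. It is not the toolkit's implementation: the function names, the index name, and the sample attributes are all hypothetical. The payload follows the newline-delimited layout of the Elasticsearch bulk API (an action line followed by a document line); older Elasticsearch versions may also expect a type in the action line.

```python
import json


def tuple_to_document(attributes):
    """Turn a tuple's attribute names and values into one JSON-ready dict.

    `attributes` is a plain dict standing in for a Streams tuple (assumption).
    """
    return {name: value for name, value in attributes.items()}


def build_bulk_body(index_name, tuples):
    """Build a newline-delimited bulk payload: action line, then document line."""
    if not tuples:
        return ""
    lines = []
    for attributes in tuples:
        # Action line: index the following document into `index_name`.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Document line: the attribute name/value pairs.
        lines.append(json.dumps(tuple_to_document(attributes)))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical index name and attributes, for illustration only.
body = build_bulk_body("sensor-readings", [{"sensorId": "s1", "temperature": 21.5}])
```

Sending `body` to the cluster is left out on purpose, since the connection details depend on the parameters the operator ends up exposing.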
The parameters for the ElasticsearchSink operator are as follows:
In addition to these parameters, there are several custom metrics already implemented, including: isConnected, totalFailedRequests, numInserts, reconnectionCount, and (avg|max|min|sum)InsertSizeBytes.
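As an illustration of what the insert-size metrics could track, here is a small sketch. It assumes that (avg|max|min|sum)InsertSizeBytes expands to four separate metrics updated once per successful insert; the class and method names are hypothetical and do not reflect how the operator is actually implemented.

```python
class InsertMetrics:
    """Hypothetical tracker mirroring the metric names listed above."""

    def __init__(self):
        self.num_inserts = 0
        self.total_failed_requests = 0
        self.sum_insert_size_bytes = 0
        self.max_insert_size_bytes = 0
        self.min_insert_size_bytes = None  # unknown until the first insert

    def record_insert(self, size_bytes):
        """Update the counters after one successful insert of `size_bytes`."""
        self.num_inserts += 1
        self.sum_insert_size_bytes += size_bytes
        self.max_insert_size_bytes = max(self.max_insert_size_bytes, size_bytes)
        if self.min_insert_size_bytes is None:
            self.min_insert_size_bytes = size_bytes
        else:
            self.min_insert_size_bytes = min(self.min_insert_size_bytes, size_bytes)

    def record_failure(self):
        """Count one failed request."""
        self.total_failed_requests += 1

    @property
    def avg_insert_size_bytes(self):
        """Mean insert size, or 0.0 before any insert has been recorded."""
        if self.num_inserts == 0:
            return 0.0
        return self.sum_insert_size_bytes / self.num_inserts
```

Deriving the average from the running sum and count keeps the per-insert work constant and avoids storing individual sizes.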