deniscoady / flume.ws

Source for Apache Flume which connects to a remote websocket server over plain or secure connections.
MIT License

Add Timestamp of event occurred in jsondata on every message #3

Closed ketangoyani closed 5 years ago

ketangoyani commented 5 years ago

I want the timestamp of each event included in the JSON data I receive. Every message I receive looks like {"type":"l2update","product_id":"BTC-USD","changes":[["sell","3867.79000000","1.61624"]]}, and I would like to add the event time to the JSON object, like {"type":"l2update","product_id":"BTC-USD","eventtime":"2019-02-19T10:42:47.355Z","changes":[["sell","3867.79000000","1.61624"]]}

deniscoady commented 5 years ago

Hi @ketangoyani

Have you taken a look at Flume interceptors? There is one built into the Flume distribution that does something similar to what you want.

Timestamp Interceptor

This interceptor inserts into the event headers, the time in millis at which it processes the event. This interceptor inserts a header with key timestamp (or as specified by the header property) whose value is the relevant timestamp. This interceptor can preserve an existing timestamp if it is already present in the configuration.

You can use the Flume WebSocket source, the timestamp interceptor, and maybe a custom interceptor of your own to transform the data to your liking.
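As a rough illustration of what such a custom step could do, here is a minimal sketch of the body transformation only. It assumes the timestamp interceptor has already put an epoch-millis `timestamp` header on the event. `EventTimeStamper` and `addEventTime` are hypothetical names, not part of flume.ws or Flume, and the event is modelled as a plain header map and byte array so the snippet runs without Flume on the classpath. In a real interceptor the same logic would presumably sit inside `intercept(Event)`, reading `event.getHeaders()` and `event.getBody()` and writing the result back with `event.setBody(...)`.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of flume.ws or Flume.
class EventTimeStamper {

    // Copies the epoch-millis "timestamp" header into the JSON body as an
    // ISO-8601 "eventtime" field. Events without a parseable timestamp header,
    // and bodies that are not a JSON object, are returned unchanged.
    static byte[] addEventTime(Map<String, String> headers, byte[] body) {
        String millis = headers.get("timestamp");
        if (millis == null) {
            return body;
        }
        String field;
        try {
            field = "\"eventtime\":\"" + Instant.ofEpochMilli(Long.parseLong(millis)) + "\"";
        } catch (NumberFormatException e) {
            return body;
        }
        String json = new String(body, StandardCharsets.UTF_8).trim();
        if (!json.startsWith("{") || !json.endsWith("}")) {
            return body;
        }
        // Insert the new field right after the opening brace; key order is
        // not significant in a JSON object. An empty object needs no comma.
        String rest = json.substring(1).trim();
        String out = rest.equals("}") ? "{" + field + "}" : "{" + field + "," + rest;
        return out.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("timestamp", "1550572967355");
        byte[] body = "{\"type\":\"l2update\",\"product_id\":\"BTC-USD\"}"
                .getBytes(StandardCharsets.UTF_8);
        // prints {"eventtime":"2019-02-19T10:42:47.355Z","type":"l2update","product_id":"BTC-USD"}
        System.out.println(new String(addEventTime(headers, body), StandardCharsets.UTF_8));
    }
}
```

The string splice avoids a JSON library dependency for the sake of a short example; a production interceptor would more likely parse and re-serialize the body with a proper JSON parser.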

I'm hesitant to include this functionality in this project for three reasons:

  1. Duplicative Effort – This would duplicate existing, battle-tested built-in functionality.
  2. Scope Creep – This library is focused on acting as a connector to get data between WebSockets and Flume, so it is probably not the appropriate place to introduce ETL capabilities.
  3. Clocks in a distributed system are problematic – Flume can be operated as part of a distributed ingest system. In a distributed architecture, the idea of a timestamp and of when a message was received becomes hazy due to the lack of causal consistency and hardware limitations. This is why consensus algorithms like Raft or Paxos are used to gain high confidence in (but not a guarantee of) the ordering of events.