Use kafka topics instead of postgres for storing events.

khalilgharbaoui commented 3 years ago

Hi @frathon,

I wanted to suggest Kafka because it's made for the kind of events we have to deal with in trading. I think it might be a better fit than Postgres for storing them.

You probably already know about it, but for anyone reading this who doesn't, this is what Kafka is:

https://www.youtube.com/watch?v=FKgi3n-FyNU

PS: This docker-compose file worked out great for me: https://github.com/wurstmeister/kafka-docker/blob/master/docker-compose-single-broker.yml

Cinderella-Man commented 3 years ago

Hi @khalilgharbaoui,

Sorry for the late response, I needed to sleep on this one for a bit :wink:

Thank you for opening an issue, but as far as I understand, Kafka would not be a great fit for this project (I could be wrong :man_shrugging:).

My argument is that Kafka would make the whole application more complex than simple PubSub, without substantial advantages.

For example:

- The main advantage of using Kafka is that you can replay topics (events). That sounds potentially useful for backtesting in our case, but in normal trading circumstances missing some events is not a big deal - the last price is the only one that matters.
- The whole system would become eventually consistent, making it difficult for people to understand why data is inconsistent (different DB representations of the data can take longer to update).
- You need to be careful how you design that sort of system, as you can't just treat Kafka as a DB: https://fivetran.com/blog/kafka-is-not-a-database
- Kafka would steer the whole course away from Elixir, as the whole system would need to be designed differently.

I would be more than happy to hear counterarguments :+1: In the end, I'm not extremely knowledgeable about Kafka.

khalilgharbaoui commented 3 years ago

@Cinderella-Man Well I'm glad you want to hear some counterarguments at least 😁 and I must argue that it's not only the last price that matters...

For example, if you want a good RSI indicator value you need at least:

- the last 14 prices
- a calculation run on those to get a solid indication
- with which you can decide whether you are in a buy or sell zone.
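
To make the three steps above concrete, here is a minimal sketch in Python (chosen only for brevity; nothing in this thread prescribes it). It uses plain averages of gains and losses rather than Wilder's smoothing, and the function names, the 30/70 thresholds, and the value returned for a flat window are all assumptions made for illustration. Note that 14 prices yield only 13 price changes.

```python
from typing import Sequence


def simple_rsi(prices: Sequence[float]) -> float:
    """RSI over the given window, using plain (unsmoothed) averages."""
    if len(prices) < 2:
        raise ValueError("need at least two prices to measure a change")
    # Price change between each pair of consecutive prices.
    changes = [later - earlier for earlier, later in zip(prices, prices[1:])]
    avg_gain = sum(c for c in changes if c > 0) / len(changes)
    avg_loss = sum(-c for c in changes if c < 0) / len(changes)
    if avg_gain == 0 and avg_loss == 0:
        return 50.0  # assumption: treat a completely flat window as neutral
    if avg_loss == 0:
        return 100.0  # only gains in the window
    relative_strength = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + relative_strength)


def zone(rsi: float, oversold: float = 30.0, overbought: float = 70.0) -> str:
    """Map an RSI value to a zone; the thresholds are illustrative defaults."""
    if rsi <= oversold:
        return "buy"
    if rsi >= overbought:
        return "sell"
    return "neutral"
```

Calling `zone(simple_rsi(last_prices))` on the buffered window gives the buy/sell decision described in the last step.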

Also, calculating the main trend direction, MACD, or a moving average needs more data than just the last price as well.

And that's where Kafka topics come in: they serve as a short buffer that outputs these precalculated values continuously in real time.

One could play back from a specific offset (the last 14 events), run calculations on those to get an indicator value, and push that over PubSub. So the trader won't receive just the price; it will receive an indicator value and a bid/ask price. This way all the calculations can happen somewhere else and the naive trader can stay naive.
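
The buffering idea can also be sketched without Kafka. The example below is a rough illustration under stated assumptions: `IndicatorBuffer` is a hypothetical name, the window keeps the last 14 prices in memory, and the indicator defaults to a plain mean as a stand-in for whatever calculation is actually wanted. Once the window is full, every new price produces a message that carries both the price and the indicator value.

```python
from collections import deque
from statistics import fmean
from typing import Callable, Deque, Optional, Sequence


class IndicatorBuffer:
    """Keeps the last `size` prices and pairs each new price with an indicator."""

    def __init__(
        self,
        size: int = 14,
        indicator: Callable[[Sequence[float]], float] = fmean,
    ) -> None:
        self._size = size
        # A bounded deque drops the oldest price automatically.
        self._prices: Deque[float] = deque(maxlen=size)
        self._indicator = indicator

    def on_price(self, price: float) -> Optional[dict]:
        """Return a message for the trader, or None while the window is filling."""
        self._prices.append(price)
        if len(self._prices) < self._size:
            return None
        return {"price": price, "indicator": self._indicator(list(self._prices))}
```

The returned dictionary is what would be pushed over PubSub, so the trader itself never has to hold price history.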

See it like this.... for example:

The current price or current bid and ask prices are the ability to fight a good kung fu fight on the deck of a ship full of karate fighters.

The values buffered through a Kafka topic will allow the ship to maneuver itself away from the rocks and through rough waters while the fighter has his hands full.

Of course some values do need a real database... and I also think there might be another way to do this without Kafka. Maybe a kind of cached or buffered PubSub of sorts. But you would need two things in parallel: precalculated indicator values and the plain price.

Hmm, 🤔 or even better: only send the price to the naive trader if the precalculated indicator says it's OK, so it won't act unless it needs to.

Then it's truly the art of winning without fighting... and sailing without steering. (may have answered my own question here 😅)

But still, some kind of buffering needs to happen one way or the other if a precalculated indicator comes into play, right? At least that is how I have built it in my system, and it's maneuvering itself safely so far.

Let me know if you think of another kind of solution for this, in Elixir or in something else. I'm curious to hear it.

Cheers 😁

drobban commented 3 years ago

Perhaps just me being conservative, but using Kafka in this use case of calculating indicators sounds to me to be as overkill as using a sledgehammer to drive in a screw - a simple screwdriver would have solved it as well, cheaper, cleaner - maybe a bit slower.

khalilgharbaoui commented 3 years ago

> Perhaps just me being conservative, but using Kafka in this use case of calculating indicators sounds to me to be as overkill as using a sledgehammer to drive in a screw - a simple screwdriver would have solved it as well, cheaper, cleaner - maybe a bit slower.

@drobban I agree on the cheaper and cleaner part, and you might be right about the screwdriver. But in this case the "maybe a bit slower" part really matters: if you are doing real-time trades for micro profits, by the time you send out the limit buy/sell order at a certain price the situation on the market may have already changed. There is a high level of volatility; I've seen this in practice.

The calculations are only useful at that moment. You kind of need them to be live.

Let me say it another way... if you are playing a multiplayer whack-a-mole game with a bunch of other bots as competition you'll be more effective with a hammer... 😅

Even though I do strive for a screwdriver solution, maybe with a simple buffer of sorts?

Cinderella-Man commented 3 years ago

Hi guys :wave:

@khalilgharbaoui thank you for your arguments. I again have counterarguments, this time more Elixir-focused.

@drobban Thanks for joining in. In general I agree, although @khalilgharbaoui deserves some feedback, as it looks like he put a fair amount of effort into the arguments for his Kafka case.

My counterarguments to @khalilgharbaoui's message: in general, you are right that more than the last message matters once we venture into more complex indicators, etc.

But...

Let me give you an example and hopefully everything will become clearer. Let's say that we need to calculate something simple like OHLC (open-high-low-close price levels in the last x minutes - a note for people not into trading) and we only have access to trade event (current/last price) data.

In the Elixir world you could start a process that just listens to the "trade events" stream and keeps adding prices (just numbers) to its in-memory state. When this "ohlc aggregator" process receives a trade event with a timestamp outside of the current time interval, it simply produces OHLC data (grab the first price [open], find the highest, find the lowest, and grab the last one [close]) and pushes that to another stream.

A concrete example:

You start a 1-minute (that's the "time interval" I referred to earlier) "ohlc aggregator" at 14:00:00. It listens to the stream and collects all the prices up to the first event at 14:01:xy. Let's say there are 200 of them inside that minute - that's 200 numbers kept in memory instead of using Kafka and replaying the last minute's events.

But now you could wonder about bigger intervals: a 2-minute ohlc aggregator will hold 400 numbers, a 5-minute one 1000, and it keeps growing - a 24h ohlc aggregator would hold 288k numbers at that rate, and some people want bigger OHLCs like weekly or even monthly ones. This clearly looks like it doesn't scale well.

It wouldn't, but the magic of Elixir is that you don't need to deal with the same stream all the time - you can produce data ad hoc, like 1-minute aggregated OHLC data, and pass it to the "2-minute ohlc aggregator", so the most it would hold in memory is 8 numbers.
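
The aggregator described above can be sketched in a few lines. This is a language-neutral illustration written in Python, not the course's Elixir code: `Ohlc` and `OhlcAggregator` are hypothetical names, timestamps are assumed to be whole seconds, and trade events are assumed to arrive in time order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Ohlc:
    start: int      # start of the interval, in seconds
    duration: int   # length of the interval, in seconds
    open: float
    high: float
    low: float
    close: float


class OhlcAggregator:
    """Collects prices for one interval and emits an Ohlc when it ends."""

    def __init__(self, duration: int = 60) -> None:
        self.duration = duration
        self._start: Optional[int] = None
        self._prices: List[float] = []

    def on_trade(self, timestamp: int, price: float) -> Optional[Ohlc]:
        # Align the timestamp to the start of its interval.
        bucket = timestamp - timestamp % self.duration
        finished = None
        if self._start is not None and bucket != self._start:
            # First event outside the current interval: close the interval.
            p = self._prices
            finished = Ohlc(self._start, self.duration, p[0], max(p), min(p), p[-1])
            self._prices = []  # drop the old interval's prices from memory
        self._start = bucket
        self._prices.append(price)
        return finished
```

`on_trade` returns `None` for every event inside the current interval and a finished `Ohlc` for the first event outside it, which is the value that would be pushed to the next stream.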

Getting back to the concrete example:

- Start a 1-minute OHLC aggregator and a 2-minute OHLC aggregator at 14:00:00.
- The 1-minute OHLC aggregator subscribes to trade events (as explained above) and at 14:01:xy it produces OHLC data into an "ohlc-1-minute" topic in PubSub.
- The 2-minute OHLC aggregator at the same time subscribes to the "ohlc-1-minute" topic, gets one ohlc struct at 14:01 and a second one at 14:02, combines both, and pushes the result to an "ohlc-2-minute" topic in PubSub.

Note: both remove all their data from memory at the end of the interval.
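
Combining two 1-minute candles into a 2-minute one needs only the 8 numbers mentioned earlier. Below is a minimal Python sketch of that step; `Ohlc`, `merge`, and `OhlcCombiner` are hypothetical names, and the candles are assumed to arrive in order and aligned to the larger interval.

```python
from typing import List, NamedTuple, Optional


class Ohlc(NamedTuple):
    open: float
    high: float
    low: float
    close: float


def merge(candles: List[Ohlc]) -> Ohlc:
    """Merge consecutive candles into one covering the whole span."""
    return Ohlc(
        open=candles[0].open,                # open of the first candle
        high=max(c.high for c in candles),   # highest high
        low=min(c.low for c in candles),     # lowest low
        close=candles[-1].close,             # close of the last candle
    )


class OhlcCombiner:
    """Builds one larger candle out of every `count` smaller ones."""

    def __init__(self, count: int = 2) -> None:
        self.count = count
        self._pending: List[Ohlc] = []

    def on_ohlc(self, candle: Ohlc) -> Optional[Ohlc]:
        self._pending.append(candle)
        if len(self._pending) < self.count:
            return None
        merged = merge(self._pending)
        self._pending = []  # nothing is kept after the interval ends
        return merged
```

The same combiner with `count=5` fed from 1-minute candles, or with `count=12` fed from 5-minute candles, gives larger intervals without ever holding raw trade prices.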

How all of this works with strategies (aka the big picture): Binance.Streamer broadcasts the message to the trade_events:x PubSub topic -> the 1-minute ohlc aggregator grabs those and produces ohlc data into the ohlc-1-minute topic -> naive traders listen to the ohlc-1-minute topic.

If you want your strategy to get "all the ohlcs", you just need a new process that listens to all the ohlc topics for the different time intervals, or you can even publish all of them to the same topic; just make sure it's easy to figure out which time interval a given piece of ohlc data is for. Either way, that process will have a list of all the latest ohlcs - one per interval - and it will broadcast that aggregation to some other topic (which the Naive.Trader can subscribe to).
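
The chain of topics can be illustrated with a tiny in-memory publish/subscribe sketch. This is Python for illustration only, and it is synchronous and single-threaded, unlike real Elixir processes: `PubSub` and `connect` are hypothetical helpers, and a stage is assumed to be any callable that returns a message to forward or `None` to stay silent.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Optional

Handler = Callable[[Any], None]
Stage = Callable[[Any], Optional[Any]]


class PubSub:
    """Minimal in-memory topic registry."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def broadcast(self, topic: str, message: Any) -> None:
        # Copy the list so handlers may subscribe while we iterate.
        for handler in list(self._subscribers[topic]):
            handler(message)


def connect(pubsub: PubSub, source: str, target: str, stage: Stage) -> None:
    """Feed `source` messages into `stage` and publish its output to `target`."""

    def handler(message: Any) -> None:
        output = stage(message)
        if output is not None:
            pubsub.broadcast(target, output)

    pubsub.subscribe(source, handler)
```

With this, an aggregator stage would be wired as `connect(bus, "trade_events:x", "ohlc-1-minute", stage)`, and a trader would simply subscribe to `"ohlc-1-minute"`.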
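
The "all the ohlcs" process boils down to a map from interval to the latest candle. Here is a small Python sketch of that idea; `LatestOhlcs` is a hypothetical name, and the interval labels such as `"1m"` are an assumed convention for telling the candles apart.

```python
from typing import Any, Dict


class LatestOhlcs:
    """Remembers the most recent candle for each interval."""

    def __init__(self) -> None:
        self._latest: Dict[str, Any] = {}

    def on_ohlc(self, interval: str, candle: Any) -> Dict[str, Any]:
        """Store the candle and return a snapshot of all latest candles."""
        self._latest[interval] = candle  # replaces the previous candle, if any
        # Return a copy so subscribers cannot mutate the internal state.
        return dict(self._latest)
```

The snapshot returned by `on_ohlc` is what would be broadcast to the topic the trader subscribes to.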

I'm just trying to show you that the possibilities with simple topics are limitless, because Elixir allows you to create extremely lightweight and simple processes that can aggregate stuff in "real time" for you instead of replaying from Kafka (I assume that, because topics in Kafka are not memory-based, Kafka would be the slower approach as well).

I hope that helps :wink:

khalilgharbaoui commented 3 years ago

@Cinderella-Man Thank you so much for your last reply. I will read up on it after work. Really appreciate it 🙏

Cinderella-Man commented 3 years ago

Closing this one - I think Kafka won't be a good fit because of the Elixir-focused nature of the course. Thank you for opening the issue and the conversation :+1:

Cinderella-Man / hedgehog

Use kafka topics instead of postgres for storing events. #3