causify-ai / kaizenflow

KaizenFlow is a framework for Bayesian reasoning and AI/ML stream computing
GNU General Public License v3.0

Two-sided arbitrage data frame #33

Open gpsaggese opened 1 year ago

gpsaggese commented 1 year ago

Create a single dataframe with all the data from multiple exchanges

timestamp UTC, asset, quoted_price, quoted_currency, volume, type, abs_value, exchange, country

where:

- We need to do the conversion and need 1-minute exchange rates to get the official price.
- The idea is to transform the data from "relative" (e.g., BTC/ETH or BTC/EUR) into absolute, i.e., the price of BTC in USD.
- Let's start with only two exchanges (e.g., Binance, OKX) and a few coins (BTC, ETH).
- Look for situations where, in the same minute, the difference in the absolute value of the same asset across exchanges is more than X% (e.g., 2%). A minimal sketch of this check follows below.
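As a rough illustration of that check, here is a minimal pandas sketch. The column names follow the schema above; the `fx` table (1-minute quote-currency-to-USD rates) and the function name `find_divergences` are assumptions, not existing KaizenFlow code.

```python
import pandas as pd


def find_divergences(
    df: pd.DataFrame, fx: pd.DataFrame, threshold: float = 0.02
) -> pd.DataFrame:
    """
    Flag minutes where the same asset trades at prices differing by more than
    `threshold` across exchanges, after converting every quote to USD.

    df columns: timestamp, asset, quoted_price, quoted_currency, volume, exchange
    fx columns: timestamp, quoted_currency, usd_rate (1-minute quote-currency -> USD)
    """
    # Convert each quoted price into an absolute USD value.
    merged = df.merge(fx, on=["timestamp", "quoted_currency"], how="left")
    merged["abs_value"] = merged["quoted_price"] * merged["usd_rate"]
    # For each (minute, asset), compare the cheapest and most expensive exchange.
    grouped = merged.groupby(["timestamp", "asset"])["abs_value"]
    rel_spread = (grouped.max() - grouped.min()) / grouped.min()
    return rel_spread[rel_spread > threshold].reset_index(name="rel_spread")
```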

Chandramani05 commented 1 year ago

Ok I will start working on it...

Chandramani05 commented 1 year ago

I was able to create a df with the following columns: timestamp, quoted_price, base_asset, and quoted_asset (in the data it is either USDT or BUSD). Please look at the image. (I split the currency pair column.)

I don't know how to calculate quoted_reference and abs_value. Do I need an additional source of data? Can you help me with the formula?
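For reference, a tiny hedged sketch of the split step; the column name and the "_" separator are assumptions about how the dataset stores the pair, so adjust to the actual format:

```python
import pandas as pd

# Toy data; the real column name and separator depend on how the dataset
# stores the pair (here assumed to be "BTC_USDT"-style strings).
df = pd.DataFrame({"currency_pair": ["BTC_USDT", "ETH_BUSD"], "close": [21650.0, 1520.4]})
df[["base_asset", "quoted_asset"]] = df["currency_pair"].str.split("_", expand=True)
print(df)
```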

[Screenshot, 2023-02-13: preview of the dataframe]
gpsaggese commented 1 year ago

Very good.

1) The idea is that quoted_reference is the value of the base asset quoted in exactly the same reference currency (e.g., USD, the dollar). So in your example we need the value of BUSD with respect to USDT (which is almost 1, but not exactly, since it's a stablecoin), and we multiply / divide the quoted price by the value of the quoted asset. E.g., if something is quoted in EUR, we need to convert EUR into USD. So we might need #32. (A worked example appears after this list.)

2) I would start by collecting all the quote assets and seeing what they are.

3) Also @jsmerix is adding more exchanges and also futures vs spot.

4) We want to have a monster spreadsheet with all the quoted coins with respect to the exact same asset (i.e., USD). Then we can look for arbitrage opportunities.

5) Then we will generalize to second-level quotes.

6) Then we can generalize looking for triangular or any "cyclic" arbitrage opportunities.
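As a worked (made-up) example of point 1, converting an ETH price quoted in BUSD into a USD-referenced abs_value; the numbers are illustrative, not real market data:

```python
# Illustrative numbers only: convert an ETH price quoted in BUSD into a
# USD-referenced abs_value by multiplying by the BUSD -> USD rate for that minute.
quoted_price_eth_busd = 1520.40  # ETH/BUSD close for the minute
busd_to_usd = 0.9996             # BUSD valued in USD: close to, but not exactly, 1
abs_value_usd = quoted_price_eth_busd * busd_to_usd
print(round(abs_value_usd, 2))   # ~1519.79
```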

Makes sense as high-level plan?

Also you can work on this with @thejameszhang since spot-futures arbitrage is a special case of this work. You guys can organize as you want in the team.

jsmerix commented 1 year ago

Also @jsmerix is adding more exchanges and also futures vs spot.

Updated the database of datasets, refer to the following gdrive folder to access them.

Chandramani05 commented 1 year ago

Progress Report:

1. I have analyzed the newly updated data. We now have three exchange IDs: Binance (futures and spot), Binanceus, and OKX. I have created a separate CSV for each exchange ID (locally only for now), so there are 4 CSV files in total.
2. I tried to concatenate all the data frames into one, but due to the sheer data volume the Jupyter kernel crashes every time.
3. I started computing characteristics of the exchanges, such as average volume. Here are some plots and data frames for OKX. I have created these for all 4.

[Screenshot, 2023-02-19: OKX grouped averages]

Here I have grouped the data and calculated the average close and volume. I have also created a lag_df that shifts the close price of an exchange into 'Close(t-2)', 'Close(t-1)', 'Close(t)', 'Close(t+1)' columns (a minimal sketch follows below). I thought it might be useful if we want to predict the future using the past.

4. Some visualization plots for OKX and Binanceus: [attached plots: OKX rolling plot, Binanceus plot]
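A minimal sketch of the lag_df construction mentioned above (the function name is illustrative):

```python
import pandas as pd


def build_lag_df(close: pd.Series) -> pd.DataFrame:
    # Shift the close price of one exchange to build the lagged / leading columns.
    return pd.DataFrame(
        {
            "Close(t-2)": close.shift(2),
            "Close(t-1)": close.shift(1),
            "Close(t)": close,
            "Close(t+1)": close.shift(-1),
        }
    )
```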

Issues:

1. I am figuring out what our next step might be. Should we analyze the data more?
2. I still can't figure out how to calculate the 'quoted_reference'.

Let me know how I can improve further.

Thanks!!

gpsaggese commented 1 year ago

Progress Report :

Very good job @Chandramani05. You are doing well.

  1. I have analyzed the newly updated data. We now have three exchange IDs: Binance (futures and spot), Binanceus, and OKX. I have created a separate CSV for each exchange ID (locally only for now), so there are 4 CSV files in total.

Best approach is to save the data in the project dir https://drive.google.com/drive/u/0/folders/1eKj6u_cbQM0ZLZ4wRJ6xPM1oqIKwusUo and do PRs for your notebook so I can give you feedback based on the code / data too.

Also create a gdoc in your dir and document your work as you as you would explain to someone. The issue is a good place to summarize the results, but you need a document with higher resolution notes.

  2. I tried to concatenate all the data frames into one, but due to the sheer data volume the Jupyter kernel crashes every time.

No surprise: big data problems. You can either save the data in a better format (e.g., Parquet) or split it year-by-year (or month-by-month), depending on what you can keep in memory.

So you can have a notebook to preprocess the data and then save it to disk. Then you have another notebook to load the pre-processed data and compute the model, saving some results. Then another stage and so on.
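A minimal sketch of that staging, assuming the per-exchange CSVs sit in a local csv/ directory and have these column names (all paths and names are illustrative):

```python
import glob
import os

import pandas as pd

# Stage 1 (preprocessing notebook): convert each per-exchange CSV to Parquet
# so that later notebooks only load what fits in memory.
os.makedirs("parquet", exist_ok=True)
for path in glob.glob("csv/*.csv"):
    df = pd.read_csv(path, parse_dates=["timestamp"])
    name = os.path.splitext(os.path.basename(path))[0] + ".parquet"
    df.to_parquet(os.path.join("parquet", name))

# Stage 2 (analysis notebook): load only the columns needed for the spread study.
cols = ["timestamp", "currency_pair", "close", "volume", "exchange_id"]
frames = [pd.read_parquet(p, columns=cols) for p in glob.glob("parquet/*.parquet")]
data = pd.concat(frames, ignore_index=True)
```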

  3. I started computing characteristics of the exchanges, such as average volume. Here are some plots and data frames for OKX. I have created these for all 4.

Here I have grouped the data and calculated the average close and volume. I have also created a lag_df that shifts the close price of an exchange into 'Close(t-2)', 'Close(t-1)', 'Close(t)', 'Close(t+1)' columns. I thought it might be useful if we want to predict the future using the past.

  4. Some visualization plots for OKX and Binanceus:

Let's start collecting the info in the gdoc.

Issues :

  1. I am figuring out what our next step might be. Should we analyze the data more?

The next step is: group the data by coin (e.g., BTC/USD) and by timestamp across exchanges and look at the dispersion. You can compute the difference between max and min, multiply it by the minimum volume, and sum over coin and time. This gives you an estimate of the money that could be made in the best case.
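A hedged sketch of that estimate, assuming the data has already been combined into a single dataframe with abs_value and volume columns per (timestamp, asset, exchange); the function name is illustrative:

```python
import pandas as pd


def best_case_pnl(df: pd.DataFrame) -> pd.Series:
    """
    For each (timestamp, asset), take the max-min price spread across exchanges,
    multiply by the minimum volume, then sum per asset over time: a rough upper
    bound on the money a two-sided arbitrage could have captured.
    """
    per_minute = df.groupby(["timestamp", "asset"]).agg(
        max_price=("abs_value", "max"),
        min_price=("abs_value", "min"),
        min_volume=("volume", "min"),
    )
    per_minute["pnl"] = (
        per_minute["max_price"] - per_minute["min_price"]
    ) * per_minute["min_volume"]
    # Sum over time to get one number per coin.
    return per_minute.groupby(level="asset")["pnl"].sum()
```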

  2. I still can't figure out how to calculate the 'quoted_reference'.

Don't worry. One step at a time. Let's do it for BTC, ETH, etc., where the reference value is the value of the coin vs USD itself.

Let me know how I can improve further.

Thanks!!

DanilYachmenev commented 1 year ago

probably obsolete, moving to P1 for now