ysyyork closed 1 year ago
Sorry for the inconvenience, we will try to update the datasheets ASAP.
For a brief answer, there are two kinds of features: those generated from the snapshot of the LOB and those generated from the aggregated trade information.
Features from `bid1_price` to `imblance_volume_oe` are the first kind; the rest are the second kind.
At timestamp t, we take a snapshot of the LOB and aggregate the trades that happened from timestamp t - 1 second to timestamp t to make `open`, `high`, `low`, and `close`.
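The one-second trade aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual preprocessing script: the `Trade` record layout and the `aggregate_ohlc` helper are hypothetical names, and treating the window as half-open (t - 1 excluded, t included) is an assumption, since the thread does not say how boundaries are handled.

```python
from collections import namedtuple

# Hypothetical trade record; the real raw-trade schema is not given in the thread.
Trade = namedtuple("Trade", ["timestamp", "price", "size"])


def aggregate_ohlc(trades, t, window=1.0):
    """Aggregate trades with t - window < timestamp <= t into OHLC prices.

    Returns None when no trade falls inside the window.
    The boundary convention is an assumption, not confirmed by the authors.
    """
    in_window = sorted(
        (tr for tr in trades if t - window < tr.timestamp <= t),
        key=lambda tr: tr.timestamp,
    )
    if not in_window:
        return None
    prices = [tr.price for tr in in_window]
    return {
        "open": prices[0],    # first trade price in the window
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],  # last trade price in the window
    }


# Made-up trades, timestamps in seconds.
trades = [
    Trade(9.2, 100.0, 1),
    Trade(9.5, 101.5, 2),
    Trade(9.9, 99.5, 1),
    Trade(10.4, 102.0, 3),  # falls outside the window ending at t = 10.0
]
bar = aggregate_ohlc(trades, t=10.0)
```

The resulting `open`, `high`, `low`, and `close` values would then feed the OHLC-based indicators, while the LOB snapshot taken at the same timestamp t supplies the first kind of features.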
More specifically,
`buy_volume_oe` is the sum of `bidn_size` for n from 1 to 5.
`sell_volume_oe` is the sum of `askn_size` for n from 1 to 5.
`bidm_size_n` is `bidm_size / buy_volume_oe` for m from 1 to 5.
`askm_size_n` is `askm_size / sell_volume_oe` for m from 1 to 5.
`buy_spread_oe` is `bid1_price - bid5_price`.
`sell_spread_oe` is `ask5_price - ask1_price`.
`imblance_volume_oe` is `(buy_volume_oe - sell_volume_oe) / (buy_volume_oe + sell_volume_oe)`.
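For anyone who wants to reproduce these columns on their own data, the definitions above translate into roughly the following sketch. The `lob_features` function and the dict-based row are hypothetical and only for illustration; the column names follow the definitions in this thread (including the `imblance` spelling), and the code assumes both sides of the book are non-empty so the divisions are defined.

```python
def lob_features(row, depth=5):
    """Compute the derived LOB snapshot features from one row of raw levels.

    `row` maps names such as "bid1_price" and "ask3_size" to numbers.
    Assumes buy and sell volumes are both positive (no empty book side).
    """
    levels = range(1, depth + 1)
    bid_sizes = [row[f"bid{n}_size"] for n in levels]
    ask_sizes = [row[f"ask{n}_size"] for n in levels]

    buy_volume = sum(bid_sizes)
    sell_volume = sum(ask_sizes)

    features = {
        "buy_volume_oe": buy_volume,
        "sell_volume_oe": sell_volume,
        # Price distance between the best and the deepest level on each side.
        "buy_spread_oe": row["bid1_price"] - row[f"bid{depth}_price"],
        "sell_spread_oe": row[f"ask{depth}_price"] - row["ask1_price"],
        "imblance_volume_oe": (buy_volume - sell_volume)
        / (buy_volume + sell_volume),
    }
    # Per-level sizes normalized by the total volume on the same side.
    for m in levels:
        features[f"bid{m}_size_n"] = bid_sizes[m - 1] / buy_volume
        features[f"ask{m}_size_n"] = ask_sizes[m - 1] / sell_volume
    return features


# Made-up snapshot: bids at 100..96, asks at 101..105.
row = {}
for n, (bid_size, ask_size) in enumerate(
    zip([3, 6, 6, 6, 9], [2, 2, 2, 2, 2]), start=1
):
    row[f"bid{n}_price"] = 101.0 - n
    row[f"ask{n}_price"] = 100.0 + n
    row[f"bid{n}_size"] = bid_size
    row[f"ask{n}_size"] = ask_size

feats = lob_features(row)
```

With this made-up snapshot, `buy_volume_oe` is 30, `sell_volume_oe` is 10, both spreads are 4.0, and `imblance_volume_oe` is 0.5. Any normalization applied to the released data beyond these definitions is not covered here.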
The definitions of the remaining features (the ones generated from the OHLC) can be found in the Alpha158 feature set provided by Qlib; the code is here.
Also, please star and fork our repo; your support is very important to us.
Thanks a lot for this detailed explanation! This is helpful. I'll definitely star and fork!
Also, I think it would be super helpful if you could share the data transformation script in the codebase. This would enable everyone to transform their own data based on your example.
Sure, we will update our technical indicators ASAP.
@ysyyork Hi, I wonder if you are still following up on this issue. This repo no longer seems to be maintained. I was trying to feed my own data set into the model. I made my data structure the same as the provided data, except for normalization (because I can't know their normalization parameters). The model is not behaving correctly: at the training stage, I got around ±30% total return and a Sharpe ratio of about -2k.
You may look into this repo: https://github.com/qinmoelei/EarnHFT. It contains the data sheet in the data preprocessing code, along with proper parameter settings for training RL agents on crypto.
I'm learning about the LOB data, but I can only find explanations on Kaggle, which are very different from the actual data I saw in the `high_frequency_trading` folder, for example `buy_volume_oe` and `sell_volume_oe`. Is there a formal doc explaining all the columns in that sheet? Thanks