Is there any datasheet for the dataset in `high_frequency_trading`

ysyyork commented 1 year ago

I'm studying the LOB data, but the only explanations I can find are on Kaggle, and they differ substantially from the actual data I see in the high_frequency_trading folder. For example, I can't find explanations for buy_volume_oe or sell_volume_oe. Is there formal documentation explaining all the columns in that sheet? Thanks

  | Unnamed: 0 | bid1_price | bid1_size | bid2_price | bid2_size | bid3_price | bid3_size | bid4_price | bid4_size | bid5_price | bid5_size | ask1_price | ask1_size | ask2_price | ask2_size | ask3_price | ask3_size | ask4_price | ask4_size | ask5_price | ask5_size | buy_volume_oe | sell_volume_oe | bid1_size_n | bid2_size_n | bid3_size_n | bid4_size_n | bid5_size_n | ask1_size_n | ask2_size_n | ask3_size_n | ask4_size_n | ask5_size_n | buy_spread_oe | sell_spread_oe | imblance_volume_oe | open | high | close | low | wap | trade_diff | trade_speard | kmid | klen | kmid2 | kup | kup2 | klow | klow2 | ksft | ksft2 | roc_10 | roc_30 | roc_60 | ma_10 | ma_30 | ma_60 | std_10 | std_30 | std_60 | beta_10 | beta_30 | beta_60 | max_10 | max_30 | max_60 | min_10 | min_30 | min_60 | qtlu_10 | qtlu_30 | qtlu_60 | qtld_10 | qtld_30 | qtld_60 | rsv_10 | rsv_30 | rsv_60 | imax_10 | imax_30 | imax_60 | imin_10 | imin_30 | imin_60 | imxd_10 | imxd_30 | imxd_60 | cntp_10 | cntp_30 | cntp_60 | cntn_10 | cntn_30 | cntn_60 | cntd_10 | cntd_30 | cntd_60
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --

qinmoelei commented 1 year ago

Sorry for the inconvenience, we will try to update the datasheets ASAP. As a brief answer, there are 2 kinds of features: features generated from the snapshot of the LOB, and features generated from the aggregated trade information. The features from bid1_price to imblance_volume_oe are the first kind; the rest are the second kind. At timestamp t, we take a snapshot of the LOB and aggregate the trades that happened from timestamp t - 1 second to timestamp t to produce open, high, low, and close. More specifically:

- buy_volume_oe is the sum of bidn_size for n from 1 to 5.
- sell_volume_oe is the sum of askn_size for n from 1 to 5.
- bidm_size_n is bidm_size / buy_volume_oe for m from 1 to 5.
- askm_size_n is askm_size / sell_volume_oe for m from 1 to 5.
- buy_spread_oe is bid1_price - bid5_price.
- sell_spread_oe is ask5_price - ask1_price.
- imblance_volume_oe is (buy_volume_oe - sell_volume_oe) / (buy_volume_oe + sell_volume_oe).
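For readers who want to rebuild these columns on their own data, here is a minimal pandas sketch of the definitions in this comment. It is not the maintainers' script: the function names `add_snapshot_features` and `trades_to_ohlc`, the toy values, and the layout of the trade frame (a DatetimeIndex plus a `price` column) are assumptions made for illustration. Only the column names and formulas come from the comment itself.

```python
import pandas as pd

LEVELS = range(1, 6)  # five price levels on each side of the book


def add_snapshot_features(lob: pd.DataFrame) -> pd.DataFrame:
    """Add the *_oe and *_size_n columns to a frame of LOB snapshots."""
    out = lob.copy()
    bid_sizes = [f"bid{n}_size" for n in LEVELS]
    ask_sizes = [f"ask{n}_size" for n in LEVELS]

    # Total resting size over the five levels on each side.
    out["buy_volume_oe"] = out[bid_sizes].sum(axis=1)
    out["sell_volume_oe"] = out[ask_sizes].sum(axis=1)

    # Each level's size as a fraction of its side's total.
    for n in LEVELS:
        out[f"bid{n}_size_n"] = out[f"bid{n}_size"] / out["buy_volume_oe"]
        out[f"ask{n}_size_n"] = out[f"ask{n}_size"] / out["sell_volume_oe"]

    # Price distance between the first and fifth level on each side.
    out["buy_spread_oe"] = out["bid1_price"] - out["bid5_price"]
    out["sell_spread_oe"] = out["ask5_price"] - out["ask1_price"]

    # Column name keeps the dataset's own spelling ("imblance").
    total = out["buy_volume_oe"] + out["sell_volume_oe"]
    out["imblance_volume_oe"] = (out["buy_volume_oe"] - out["sell_volume_oe"]) / total
    return out


def trades_to_ohlc(trades: pd.DataFrame) -> pd.DataFrame:
    """Aggregate trades in (t - 1s, t] into one OHLC bar labelled t.

    `trades` is a hypothetical frame with a DatetimeIndex and a 'price' column.
    """
    one_second = pd.Timedelta(seconds=1)
    return trades["price"].resample(one_second, closed="right", label="right").ohlc()


# Toy one-row snapshot: bid prices 100..96, ask prices 101..105.
levels = {}
for n in LEVELS:
    levels[f"bid{n}_price"] = [101.0 - n]
    levels[f"bid{n}_size"] = [float(n)]
    levels[f"ask{n}_price"] = [100.0 + n]
    levels[f"ask{n}_size"] = [5.0]
snapshot = pd.DataFrame(levels)
features = add_snapshot_features(snapshot)

# Toy trades spanning two one-second windows.
trades = pd.DataFrame(
    {"price": [10.0, 12.0, 11.0, 9.0]},
    index=pd.to_datetime(
        [
            "2024-01-01 00:00:00.2",
            "2024-01-01 00:00:00.7",
            "2024-01-01 00:00:01.0",
            "2024-01-01 00:00:01.5",
        ]
    ),
)
bars = trades_to_ohlc(trades)
```

Whether the original pipeline treats the window boundary as (t - 1s, t] or [t - 1s, t) is not stated in the thread; the sketch picks the former, so adjust `closed` if your data suggests otherwise.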

The definitions of the remaining features (the ones generated from the OHLC) can be found in the alpha158 feature set provided by qlib; the code is here.
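As a rough illustration of what those OHLC-derived columns look like, the sketch below computes a handful of them (kmid, klen, kmid2, kup, klow, ksft, and the rolling roc/ma/std families) in pandas. The formulas follow my reading of qlib's Alpha158 expressions; whether this dataset was generated with exactly these expressions and window conventions is an assumption, so check the linked code before relying on it. The function name `alpha_style_features` and the toy bars are hypothetical.

```python
import pandas as pd


def alpha_style_features(ohlc: pd.DataFrame, windows=(10, 30, 60)) -> pd.DataFrame:
    """Compute a few Alpha158-style columns from open/high/low/close bars."""
    o, h, l, c = ohlc["open"], ohlc["high"], ohlc["low"], ohlc["close"]
    eps = 1e-12  # guards against a zero high-low range
    out = pd.DataFrame(index=ohlc.index)

    # Candlestick-shape features, scaled by the open price.
    body_top = ohlc[["open", "close"]].max(axis=1)
    body_bottom = ohlc[["open", "close"]].min(axis=1)
    out["kmid"] = (c - o) / o
    out["klen"] = (h - l) / o
    out["kmid2"] = (c - o) / (h - l + eps)
    out["kup"] = (h - body_top) / o
    out["klow"] = (body_bottom - l) / o
    out["ksft"] = (2 * c - h - l) / o

    # Rolling features, scaled by the current close.
    for d in windows:
        out[f"roc_{d}"] = c.shift(d) / c  # close d bars ago over current close
        out[f"ma_{d}"] = c.rolling(d, min_periods=1).mean() / c
        out[f"std_{d}"] = c.rolling(d, min_periods=1).std() / c
    return out


# Toy bars; a window of 2 keeps the example small.
toy_bars = pd.DataFrame(
    {
        "open": [10.0, 11.0, 12.0],
        "high": [12.0, 12.0, 13.0],
        "low": [9.0, 10.0, 11.0],
        "close": [11.0, 12.0, 12.0],
    }
)
toy_features = alpha_style_features(toy_bars, windows=(2,))
```

Note that the windows count bars, not seconds; with the one-second bars described earlier in the thread, the default windows of 10, 30, and 60 would correspond to 10, 30, and 60 seconds.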

Also, please star and fork our repo, your support is very important to us.

ysyyork commented 1 year ago

Thanks a lot for this detailed explanation! This is helpful. I'll definitely star and fork!

Also, I think it would be super helpful if you could share the data transformation script in the codebase. That would let everyone prepare their own data based on your example.

qinmoelei commented 1 year ago

Sure, we will update our technical indicators ASAP.

cpzz50 commented 4 months ago

@ysyyork Hi, I wonder if you are still following this issue. This repo no longer seems to be maintained. I was trying to feed my own dataset into the model. I gave my data the same structure as the provided data, except for normalization (because I can't know their normalization parameters). The model is not behaving correctly: at the training stage, I got around ±30% total return and a -2k Sharpe ratio.

qinmoelei commented 3 months ago

You may look into this repo: https://github.com/qinmoelei/EarnHFT. It contains the datasheet in its data preprocessing code, along with proper parameter settings for training RL agents on crypto.

TradeMaster-NTU / TradeMaster

Is there any datasheet for the dataset in `high_frequency_trading` #145