SJTU-Quant / MASTER

This is the official code and supplementary materials for our AAAI-2024 paper: MASTER: Market-Guided Stock Transformer for Stock Price Forecasting. MASTER is a stock transformer for stock price forecasting, which models the momentary and cross-time stock correlations and guides feature selection with market information.

Dataset code #3

Closed: w1nn1ethepooh closed this issue 1 month ago

w1nn1ethepooh commented 2 months ago

Thank you for your great work!

I would like to ask whether there is any source code for generating the datasets dl_train, dl_valid, and dl_test.

Have a nice day!

LITONG99 commented 1 month ago

Thank you for your attention to our work. We do not plan to publish our data infrastructure, for the reasons already explained. If you need to process raw data, we highly recommend reusing the Qlib implementations. Here is the configuration:

infer_processors:
  - class: RobustZScoreNorm
    kwargs:
      fields_group: feature
      clip_outlier: true
  - class: Fillna
    kwargs:
      fields_group: feature
learn_processors:
  - class: DropnaLabel
  - class: DropExtremeLabel
    kwargs:
      percentile: 0.975
  - class: CSZscoreNorm
    kwargs:
      fields_group: label
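For readers who want to see what the two infer_processors do to the feature columns, here is a minimal NumPy sketch. It is based on a reading of Qlib's RobustZScoreNorm and Fillna, not on the code used to build dl_train: per-feature statistics are fitted on a training window pooled over all stocks and dates (median as the center, MAD times 1.4826 as the scale), standardized values are clipped to [-3, 3] when clip_outlier is true, and remaining NaNs are filled with a constant (0 here). The function names and the EPS value are illustrative assumptions.

```python
import numpy as np

EPS = 1e-12  # assumed small constant so a zero MAD does not cause division by zero


def fit_robust_zscore(train_x):
    """Fit per-feature robust statistics on the training window.

    The median replaces the mean and MAD * 1.4826 replaces the standard
    deviation (1.4826 makes the MAD consistent with the std of a normal
    distribution). NaNs are ignored while fitting.
    """
    center = np.nanmedian(train_x, axis=0)
    mad = np.nanmedian(np.abs(train_x - center), axis=0)
    scale = (mad + EPS) * 1.4826
    return center, scale


def robust_zscore(x, center, scale, clip_outlier=True):
    """Standardize x with pre-fitted statistics; optionally clip to [-3, 3]."""
    z = (x - center) / scale
    if clip_outlier:
        z = np.clip(z, -3.0, 3.0)  # NaNs pass through np.clip unchanged
    return z


def fillna(x, fill_value=0.0):
    """Replace any remaining NaNs (e.g. missing raw features) with a constant."""
    return np.where(np.isnan(x), fill_value, x)


# Usage: 5 rows (stock-days) x 2 features, with one outlier and one missing value.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, np.nan], [4.0, 40.0], [100.0, 50.0]])
center, scale = fit_robust_zscore(train)
features = fillna(robust_zscore(train, center, scale))
```

The median/MAD pair is used instead of mean/std because a single extreme value (the 100.0 above) barely moves the fitted statistics, and clipping then caps its standardized value at 3.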

Please note that, except for DropExtremeLabel, the above configuration is used by many models in qlib/examples/benchmarks, and we did use the Qlib implementations to produce the published dl_train, dl_valid, and dl_test. DropExtremeLabel is implemented in our commercial codebase; it should be easy to implement in Qlib as well, since it follows a simple rule: drop the 2.5% highest and the 2.5% lowest labels.
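Since DropExtremeLabel is not part of Qlib, here is one hypothetical way to express the rule stated above in pandas. It is not the authors' implementation. It assumes a Qlib-style frame indexed by (datetime, instrument), applies the rule separately to each trading day's cross-section (an assumption; the comment only states that 2.5% of the highest and lowest labels are dropped), and reads percentile=0.975 as dropping floor(0.025 * n) rows at each end of the day's n labels. The function name and the default label column are illustrative; with Qlib's grouped columns the label column would typically be ("label", "LABEL0").

```python
import math

import pandas as pd


def drop_extreme_label(df, label_col="LABEL0", percentile=0.975):
    """Hypothetical sketch of the DropExtremeLabel rule (not the authors' code).

    For each trading day, drop the rows whose label is among the
    floor(n * (1 - percentile)) highest or lowest labels of that day,
    where n is the number of non-NaN labels. Expects a MultiIndex with a
    level named "datetime".
    """
    tail = 1.0 - percentile  # 0.025 when percentile = 0.975

    def keep_mask(labels):
        n = int(labels.notna().sum())
        k = math.floor(n * tail + 1e-9)  # rows dropped at each end; epsilon absorbs float error
        order = labels.rank(method="first")  # ranks 1..n; NaN labels get a NaN rank
        # NaN ranks compare False, so NaN labels are dropped as well.
        return (order > k) & (order <= n - k)

    keep = df[label_col].groupby(level="datetime").transform(keep_mask)
    return df[keep.astype(bool)]


# Usage: two days with 40 and 80 stocks whose labels are 0, 1, 2, ...
rows = [("2020-01-02", f"S{i:03d}") for i in range(40)] + [("2020-01-03", f"S{i:03d}") for i in range(80)]
index = pd.MultiIndex.from_tuples(rows, names=["datetime", "instrument"])
labels = pd.DataFrame({"LABEL0": [float(v) for v in range(40)] + [float(v) for v in range(80)]}, index=index)
kept = drop_extreme_label(labels)  # keeps 38 + 76 = 114 rows
```

In the configuration above this step sits between DropnaLabel and CSZscoreNorm, so the cross-sectional z-score of the label is computed only after the extremes are gone. Plugging it into Qlib would additionally require wrapping it in a processor class (for example a subclass of Qlib's Processor base class); that wrapper is not shown.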

ElonJustin7 commented 1 month ago

Hi, thank you for your outstanding work!

I'd like to ask about the "Mask" in the market index information (such as 000300) in your dataset. What does it refer to? Thank you!

caozhiy commented 1 month ago

I think it is a Qlib data operator 'qlib.data.ops.Mask'. You can refer to https://qlib.readthedocs.io/en/latest/reference/api.html#module-qlib.data.ops for more details.
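As described in the Qlib API reference, the Mask operator takes a feature expression and an instrument code, and loads the expression for that instrument instead of the one being queried. That would presumably explain the "Mask" label on the market-index columns: the index's value on a given date is attached to every stock on that date. The pandas sketch below emulates this behavior without Qlib; the function name masked_feature, the data layout, and the instrument codes are hypothetical. In a Qlib expression string the same idea would be written something like Mask($close/Ref($close,1)-1, "SH000300").

```python
import pandas as pd


def masked_feature(data, expr, mask_instrument):
    """Emulate Mask(expr, mask_instrument) for every (datetime, instrument) row.

    data: DataFrame indexed by (datetime, instrument) holding raw fields for
          both the stocks and the masked instrument (e.g. a market index).
    expr: function mapping one instrument's date-indexed DataFrame to a Series.
    """
    # Evaluate the expression on the masked instrument's own history ...
    history = data.xs(mask_instrument, level="instrument").sort_index()
    values = expr(history)  # Series indexed by datetime
    # ... then attach that value to every row with the same date, whatever its instrument.
    dates = data.index.get_level_values("datetime")
    return pd.Series(values.reindex(dates).to_numpy(), index=data.index)


def daily_return(df):
    """Same idea as the expression $close/Ref($close, 1) - 1."""
    return df["close"] / df["close"].shift(1) - 1


# Usage with placeholder codes: one index and one stock over three trading days.
records = [
    ("2020-01-02", "SH000300", 100.0), ("2020-01-02", "SH600000", 10.0),
    ("2020-01-03", "SH000300", 110.0), ("2020-01-03", "SH600000", 11.0),
    ("2020-01-06", "SH000300", 99.0), ("2020-01-06", "SH600000", 12.0),
]
data = pd.DataFrame(records, columns=["datetime", "instrument", "close"]).set_index(["datetime", "instrument"])
market_return = masked_feature(data, daily_return, "SH000300")
```

Every stock row ends up carrying the index's daily return rather than its own, which is the kind of market information MASTER uses to guide feature selection.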

Hi, thank you for your outstanding work! mask I'd like to ask about the "Mask" in the information regarding market indices (such as 000300) in your dataset. What does it refer to? Thank you!