Closed w1nn1ethepooh closed 1 month ago
Thank you for your attention to our work. We do not plan to publish the data infrastructure, for the reasons already explained. If you need to process raw data, we highly recommend reusing the Qlib implementations. Here is the configuration:
infer_processors:
  - class: RobustZScoreNorm
    kwargs:
      fields_group: feature
      clip_outlier: true
  - class: Fillna
    kwargs:
      fields_group: feature
learn_processors:
  - class: DropnaLabel
  - class: DropExtremeLabel
    kwargs:
      percentile: 0.975
  - class: CSZscoreNorm
    kwargs:
      fields_group: label
Please note that, except for DropExtremeLabel, the above configuration is used for many models in qlib/examples/benchmarks, and we did use the Qlib implementations to produce the published dl_train, dl_valid, and dl_test. DropExtremeLabel is implemented in our commercial codebase, but it should be easy to implement in Qlib as well, since it follows a simple rule: drop the highest and lowest 2.5% of labels.
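For anyone who wants to try the rule described above, here is a minimal sketch in plain pandas. It is not the authors' commercial implementation. It assumes the labels are ranked within each trading day (the reply does not say whether the cut is per day or over the whole set), that the frame is indexed by (datetime, instrument) as in Qlib, and that the label sits in a flat column named "label". The function name drop_extreme_label and all of its parameters are hypothetical.

```python
import pandas as pd


def drop_extreme_label(df, label_col="label", percentile=0.975, date_level="datetime"):
    """Drop rows whose label falls in the top or bottom (1 - percentile) tail.

    Assumption: the tails are computed per date, not over the whole dataset.
    """
    if not 0.5 < percentile < 1.0:
        raise ValueError("percentile must lie strictly between 0.5 and 1.0")

    grouped = df[label_col].groupby(level=date_level)
    # Rank within each date (1 = smallest label); ties are broken by row order.
    rank = grouped.rank(method="first")
    # Number of non-missing labels on each date.
    count = grouped.transform("count")
    # Centered rank in (0, 1), so both tails are cut symmetrically.
    centered = (rank - 0.5) / count

    # Rows with a missing label compare as False here and are dropped too.
    keep = (centered > 1.0 - percentile) & (centered < percentile)
    return df[keep]


if __name__ == "__main__":
    # Toy data: 2 dates x 40 hypothetical instruments.
    index = pd.MultiIndex.from_product(
        [pd.to_datetime(["2020-01-02", "2020-01-03"]), [f"S{i:03d}" for i in range(40)]],
        names=["datetime", "instrument"],
    )
    toy = pd.DataFrame({"label": list(range(40)) + list(range(100, 140))}, index=index)
    print(len(toy), "->", len(drop_extreme_label(toy)))
```

To use this inside a Qlib handler, the same logic would need to be wrapped in a Processor subclass and adapted to Qlib's two-level column layout; that part is left out here because it depends on the Qlib version.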
Hi, thank you for your outstanding work!
I'd like to ask about the "Mask" in the information regarding market indices (such as 000300) in your dataset. What does it refer to? Thank you!
I think it is the Qlib data operator 'qlib.data.ops.Mask'. You can refer to https://qlib.readthedocs.io/en/latest/reference/api.html#module-qlib.data.ops for more details.
Thank you for your great work!
I would like to ask whether the source code for generating the datasets dl_train, dl_valid, and dl_test is available.
Have a nice day!