Jeonghwan-Cheon / lob-deep-learning

Implementations of deep learning models for the limit order book: DeepLOB (Zhang et al., 2018), TransLOB (Wallbridge, 2020), DeepFolio (Sangadiev et al., 2020), etc.

LOBster

LOBster is a project entitled "Limit order book (LOB) driven simultaneous time-series estimation in real market microstructure": an end-to-end machine learning pipeline that predicts the future mid-price from limit order book data. The project provides the source code of the full pipeline, covering data processing, model training, and inference. It contains an implementation of DeepLOB (Zhang et al., 2018) and our modified model.
We also provide handling code for FI-2010 (Ntakaris et al., 2017), a publicly available benchmark dataset for mid-price forecasting from limit order book data. In addition, we provide pre-processing tools for custom raw LOB datasets collected in a real market microstructure. The pre-processing tools contain several useful functions, such as down-sampling, normalization, and labeling.
Lastly, the project provides modules that test the classification performance of a trained model. In particular, it contains a simple market simulator that tests whether the model's inference works in a real market microstructure, measuring trading performance (i.e., cumulative profit) based on inference over the test set.
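As a concrete illustration of the normalization and labeling steps, here is a minimal sketch in the style popularized by DeepLOB (Zhang et al., 2018): z-score the raw features and assign three-class labels from the smoothed future mid-price movement. The function names and the exact smoothing scheme are illustrative assumptions, not this repository's API.

    import numpy as np

    def zscore_normalize(features: np.ndarray) -> np.ndarray:
        # Column-wise z-score normalization of raw LOB features, shaped
        # (time, n_features). In practice the statistics should come from
        # the training days only.
        mean = features.mean(axis=0)
        std = features.std(axis=0) + 1e-8  # avoid division by zero
        return (features - mean) / std

    def label_mid_price(mid: np.ndarray, k: int, alpha: float) -> np.ndarray:
        # Compare the mean of the next k mid-prices with the current
        # mid-price and threshold the relative change at +/- alpha.
        # Returns labels for mid[:-k]: 0 = down, 1 = stationary, 2 = up.
        future_mean = np.array([mid[t + 1 : t + k + 1].mean()
                                for t in range(len(mid) - k)])
        change = (future_mean - mid[:-k]) / mid[:-k]
        labels = np.ones(len(change), dtype=np.int64)  # stationary by default
        labels[change > alpha] = 2   # up
        labels[change < -alpha] = 0  # down
        return labels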

Problem and challenge

Importance of estimating the order flow

Difficulty

Limit order book (LOB)

Dataset

FI-2010

Models

Model architecture

DeepLOB (Zhang et al., 2018)
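For orientation, below is a minimal PyTorch sketch of the DeepLOB-style architecture from Zhang et al. (2018): three convolutional blocks that compress the LOB features, an inception module over the time axis, and an LSTM head. The shapes follow the paper's 10-level (40-feature) input; the class name DeepLOBSketch is illustrative, and this is not the repository's exact implementation.

    import torch
    import torch.nn as nn

    class DeepLOBSketch(nn.Module):
        # Input: (batch, 1, T=100, 40) -- 100 time steps of a 10-level LOB
        # (price and volume on both sides = 40 features per step).
        def __init__(self, num_classes: int = 3):
            super().__init__()

            def conv_block(in_ch, out_ch, w_kernel, w_stride):
                return nn.Sequential(
                    nn.Conv2d(in_ch, out_ch, (1, w_kernel), stride=(1, w_stride)),
                    nn.LeakyReLU(0.01),
                    nn.Conv2d(out_ch, out_ch, (4, 1), padding="same"),
                    nn.LeakyReLU(0.01),
                )

            # Three blocks compress the 40 LOB features to a single column.
            self.conv = nn.Sequential(
                conv_block(1, 32, 2, 2),    # 40 -> 20
                conv_block(32, 32, 2, 2),   # 20 -> 10
                conv_block(32, 32, 10, 1),  # 10 -> 1
            )
            # Inception-style module over the time axis.
            self.branch3 = nn.Sequential(
                nn.Conv2d(32, 64, (1, 1)), nn.LeakyReLU(0.01),
                nn.Conv2d(64, 64, (3, 1), padding="same"), nn.LeakyReLU(0.01),
            )
            self.branch5 = nn.Sequential(
                nn.Conv2d(32, 64, (1, 1)), nn.LeakyReLU(0.01),
                nn.Conv2d(64, 64, (5, 1), padding="same"), nn.LeakyReLU(0.01),
            )
            self.branch_pool = nn.Sequential(
                nn.MaxPool2d((3, 1), stride=1, padding=(1, 0)),
                nn.Conv2d(32, 64, (1, 1)), nn.LeakyReLU(0.01),
            )
            self.lstm = nn.LSTM(input_size=192, hidden_size=64, batch_first=True)
            self.fc = nn.Linear(64, num_classes)

        def forward(self, x):
            x = self.conv(x)                                  # (B, 32, T, 1)
            x = torch.cat([self.branch3(x), self.branch5(x),
                           self.branch_pool(x)], dim=1)       # (B, 192, T, 1)
            x = x.squeeze(3).permute(0, 2, 1)                 # (B, T, 192)
            out, _ = self.lstm(x)
            return self.fc(out[:, -1, :])  # logits for down / stationary / up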

LOBster (Our model)

Hierarchical feature integration

Guideline

Setup

  1. Download the FI-2010 dataset and unzip it in the project folder.
  2. Install the NVIDIA toolkit for GPU support (CUDA 11.0, cuDNN 8)
  3. Check that CUDA is available (a Python-side check is also sketched after this list)
    nvidia-smi
  4. Install the dependencies
    pip install -r requirements.txt
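In addition to nvidia-smi, you can confirm that the deep-learning framework itself sees the GPU. This assumes a PyTorch-based setup (which the CUDA/cuDNN requirement suggests); adjust if your environment differs.

    import torch

    # Prints the installed PyTorch version and whether CUDA is usable.
    print(torch.__version__)
    print(torch.cuda.is_available())  # expect True on a working setup
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))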

Running experiments

  1. Hyperparameter setting
    Open optimizers/hyperparams.yaml to modify the hyperparameter settings. You can set the batch size, learning rate, epsilon, maximum number of epochs, and the number of dataloader workers. If you leave the file unchanged, the experiments run with our fine-tuned hyperparameters.
    [model name]:
     batch_size: 128
     learning_rate: 0.0001
     epsilon: 1e-08
     epoch: 30
     num_workers: 4
  2. Experiment setting
    Open main.py to set the experiment parameters. Our base experiment setting is already implemented in main.py, so you don't have to modify it.

    # experiment parameter setting
    dataset_type = 'fi2010'
    normalization = 'Zscore'
    model_type = 'lobster'
    lighten = True
    
    T = 100
    k = 4
    stock = [0, 1, 2, 3, 4]
    train_test_ratio = 0.7
    • dataset_type: Dataset for the experiment. You can select 'fi2010' or 'krx' (only 'fi2010' is available in the public demo version).
    • normalization: Normalization method. 'Zscore', 'MinMax', and 'DecPre' are available.
    • lighten: Determines whether the experiment uses the full 10-level LOB data or the reduced 5-level LOB data. If lighten is True, the experiment uses only the reduced 5-level data. This parameter affects not only the input dataset but also the architecture of the model.
    • model_type: Model used in the experiment. 'deeplob' and 'lobster' are available.
    • T: Length of the time window in a single input. T = 100 is used in the paper and in our experiments.
    • k: Prediction horizon. For fi2010, values 0, 1, 2, 3, and 4 are available, corresponding to horizons of 10, 20, 30, 50, and 100 ticks. For krx, any prediction horizon is available.
    • stock: Stock dataset used in the experiment. For FI-2010, [0, 1, 2, 3, 4] are available, indexing the five individual stocks. For KRX, ['KS200', 'KQ150'] are available. You can use multiple stocks in a single experiment.
    • train_test_ratio: Ratio used to split the training and test sets. For example, if train_test_ratio is 0.7, the first 70% of trading days form the training set and the last 30% form the test set.
  3. Run main.py
    python main.py
  4. Check the experiment results
    When you run main.py, it automatically generates a unique ID for each experiment and prints it. The ID encodes information about the experiment, such as the model type and the experiment datetime (e.g. lobster-lighten_2022-12-03_10:34:05). The trained model and all corresponding results are saved in loggers/results/[model id].
  5. Evaluate the model
    The implemented code automatically produces a visualization of the training process, classification reports (confusion matrix, accuracy, precision, recall, F1-score), and a market simulation result; a minimal sketch of the market-simulation idea follows this list. Note that market simulation is not available for the FI-2010 dataset, since it provides only normalized price data. If you want to re-generate the above evaluation later, you can run the code with the corresponding model ID.
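As promised above, here is a rough sketch of the market-simulation idea: map each predicted class to a position and accumulate the resulting mid-price changes, ignoring spreads, fees, and market impact. The function simulate_cumulative_profit is a hypothetical illustration, not the repository's API; it also shows why unnormalized prices are required.

    import numpy as np

    def simulate_cumulative_profit(pred: np.ndarray, mid: np.ndarray) -> np.ndarray:
        # pred[t] in {0: down, 1: stationary, 2: up}; mid holds the raw
        # (unnormalized) mid-prices aligned with the predictions.
        position = np.where(pred == 2, 1, np.where(pred == 0, -1, 0))
        pnl = position[:-1] * np.diff(mid[: len(pred)])  # next-step price change
        return np.cumsum(pnl)  # cumulative profit over the test set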

References

Our reference papers are listed in References.md.