DCSSMS

The official DCSSMS repository (Apache License 2.0).

Deep Contrastive Self-Supervised Method Selection (DCSSMS) Framework for Missing Data Handling in Imbalanced Classification Analyses

This is the official repository for the Deep Contrastive Self-Supervised missing data handling Method Selection (DCSSMS) framework. The structure of the directories is described in the sections below.

DCSSMS Embedding Training

To train the DCSSMS embedding network, follow the steps below:

  1. Clone the DCSSMS GitHub repository.
  2. In the CLI, change the current directory to "DCSSMS/".
  3. Use the following Python script to start training (a concrete example invocation is sketched after the option list):
    python ./Framework/MainBYOL.py --gpu_id [0] --data_dir "./DATA/Self-supervised_Training/" --init_lr [0.030280] --max_lr [1.287572] --batch_size [512] --num_layers [3] --out_sizes [256 512 1024] --output_dir ["specify your own directory"] --use_momentum [True/False] [> "specify your own log file path" 2>&1 &]
    • "--gpu_id", specify the gpu id.
    • "--data_dir", specify the directory of the embedding training dataset.
    • "--init_lr", specify the initial learning rate for the OneCycle learning rate scheduler.
    • "--max-lr", specify the maximum learning rate for the OneCycle learning rate scheduler.
    • "--batch_size", specify the batch size for the DCSSMS embedding training.
    • "--num_layers", specify the number of layers for the Over-complete encoder network.
    • "--out_sizes", specify the sizes of the hidden layers for the Over-complete encoder network, e.g., 256 512 1024 for 3 layers, 256 512 1024 2048 for 4 layers Over-complete encoder network, etc.
    • "--output_dir", specify the output directory for the best learned embedding model.
    • "--use_momentum", specify whether to use the Stop-gradient mechanism for the "Target" network.
    • "[> "specify your own log file path" 2>&1 &]", specify whether to run the script in background, redirect stdout, stderr to log file, e.g., "> ./training512_10_3_true.log 2>&1 &".

DCSSMS Linear Evaluation Training

To fine-tune the DCSSMS embedding network according to the linear evaluation protocol, follow the steps below:

  1. Clone the DCSSMS GitHub repository (if you have already cloned it, go directly to the next step).
  2. In the CLI, change the current directory to "DCSSMS/" (if you have already done so, go directly to the next step).
  3. Download the pretrained DCSSMS embedding model and put it into the "./Embedding/" folder.
  4. Use the following Python script to start fine-tuning (a concrete example invocation is sketched after the option list):
    python ./Framework/LinearEvaluation.py --gpu_id [0] --data_dir "./DATA/Linear_Evaluation/" --embedding_dir ./Embedding/best_model_8192_10_3_True.pth --init_lr [0.03] --weight_decay [1e-4] --batch_size [128] --num_layers [3] --out_sizes [256 512 1024] --output_dir ["specify your own directory"] [> "specify your own log file path" 2>&1 &]
    • "--gpu_id", specify the gpu id.
    • "--data_dir", specify the directory of the fine-tuning dataset.
    • "--embedding_dir", specify the directory to store the best pre-trained embedding model.
    • "--init_lr", specify the constant learning rate for the fine-tuning.
    • "--weight_decay", specify the constant weight decay value to regulate the fine-tuning network.
    • "--batch_size", specify the batch size for the DCSSMS fine-tuning.
    • "--num_layers", specify the number of layers for the Over-complete encoder network (here using this option to keep consistent with the pre-trained embedding model).
    • "--out_sizes", specify the sizes of the hidden layers for the Over-complete encoder network, e.g., 256 512 1024 for 3 layers, 256 512 1024 2048 for 4 layers Over-complete encoder network, etc. (here using this option to keep consistent with the pre-trained embedding model)
    • "--output_dir", specify the output directory for the best fine-tuned model.
    • "[> "specify your own log file path" 2>&1 &]", specify whether to run the script in background, redirect stdout, stderr to log file, e.g., "> ./training512_10_3_true.log 2>&1 &".

Detailed Information About The 442 Investigated MDH Methods & The 20 Re-balancing Algorithms

MDH Method Selection Website For Each Of The 56 Real-world Imbalanced Classification Datasets

We are developing a website to facilitate the usage of our DCSSMS method selection framework, in which, we utilize our pre-trained embedding network to generate method selection embeddings. We adopt cosine similarity to compare the embedding of query method selection example with the embeddings of the learned method selection instances to recommend the reasonable MDH methods for the querying MDH scenarios (i.e., MDH under specified imbalanced classification analysis). Coming Soon ...
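As an illustration only, the sketch below shows the kind of cosine-similarity look-up described above, assuming the query and the stored method selection instances have already been embedded by the pre-trained DCSSMS encoder; the function and variable names are hypothetical and not part of this repository.

    # Minimal sketch of the cosine-similarity recommendation step (assumed
    # implementation; embeddings are taken as already produced by the
    # pre-trained DCSSMS encoder, and all names here are hypothetical).
    import numpy as np

    def recommend_mdh_methods(query_embedding, instance_embeddings, instance_methods, top_k=5):
        """Rank stored method selection instances by cosine similarity to the query."""
        q = query_embedding / np.linalg.norm(query_embedding)
        m = instance_embeddings / np.linalg.norm(instance_embeddings, axis=1, keepdims=True)
        sims = m @ q                              # cosine similarity to every stored instance
        top = np.argsort(-sims)[:top_k]           # indices of the top-k most similar instances
        return [(instance_methods[i], float(sims[i])) for i in top]

    # Hypothetical usage with pre-computed embeddings:
    # query = embed(query_example)                    # (d,) embedding of the query scenario
    # bank = np.stack([embed(x) for x in instances])  # (n, d) embeddings of learned instances
    # print(recommend_mdh_methods(query, bank, method_labels, top_k=3))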

Requirements