bioinfomaticsCSU / deepsignal

Detecting methylation using signal-level features from Nanopore sequencing reads
GNU General Public License v3.0

Training model with more than 2 classes #64

Closed pterzian closed 3 years ago

pterzian commented 3 years ago

Hi @PengNi ,

I am trying to build a model with 4 classes (4 types of features, with 0, 1, 2, or 3 as methy_label).

## parameters:
train_file:
        4_classes_train_20M_cleaned_shuff.tsv
valid_file:
        4_classes_valid_12k_shuff.tsv
is_binary:
        no
model_dir:
        models_4classes/
log_dir:
        logs_4classes/
is_cnn:
        yes
is_base:
        yes
is_rnn:
        yes
kmer_len:
        17
cent_signals_len:
        360
batch_size:
        512
learning_rate:
        0.001
decay_rate:
        0.1
class_num:
        4
keep_prob:
        0.5
max_epoch_num:
        10
min_epoch_num:
        5
display_step:
        100
pos_weight:
        1.0

Unfortunately, I get this error message:

Traceback (most recent call last):
  File "/home/pterzian/venv-deepsignal/bin/deepsignal", line 8, in <module>
    sys.exit(main())
  File "/home/pterzian/venv-deepsignal/lib/python3.6/site-packages/deepsignal/deepsignal.py", line 423, in main
    args.func(args)
  File "/home/pterzian/venv-deepsignal/lib/python3.6/site-packages/deepsignal/deepsignal.py", line 120, in main_train
    min_epoch_num, display_step, pos_weight, is_binary, is_rnn, is_base, is_cnn)
  File "/home/pterzian/venv-deepsignal/lib/python3.6/site-packages/deepsignal/train_model.py", line 166, in train
    y_true=b_label, y_pred=train_prediction)
  File "/home/pterzian/venv-deepsignal/lib/python3.6/site-packages/sklearn/metrics/_classification.py", line 1789, in recall_score
    zero_division=zero_division)
  File "/home/pterzian/venv-deepsignal/lib/python3.6/site-packages/sklearn/metrics/_classification.py", line 1484, in precision_recall_fscore_support
    pos_label)
  File "/home/pterzian/venv-deepsignal/lib/python3.6/site-packages/sklearn/metrics/_classification.py", line 1316, in _check_set_wise_labels
    % (y_type, average_options))
ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

Maybe the `average` parameter needs to be set for all the sklearn metrics when there are more than 2 classes?

For example, here: https://github.com/bioinfomaticsCSU/deepsignal/blob/b488b5ba9b77d1c99707071b2b6e9341e2b6f82a/deepsignal/train_model.py#L163-L168
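A minimal sketch of what I mean, using made-up labels rather than DeepSignal's actual variables (`y_true` and `y_pred` here are hypothetical stand-ins for `b_label` and `train_prediction`). With 4 classes the default `average='binary'` raises the error above, while `average='macro'` (one possible choice, `'micro'` or `'weighted'` would also be accepted) works:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical 4-class labels, only for illustration
y_true = [0, 1, 2, 3, 0, 1, 2, 3]
y_pred = [0, 1, 2, 3, 0, 2, 2, 1]

# The default average='binary' fails on multiclass targets
try:
    recall_score(y_true=y_true, y_pred=y_pred)
    binary_failed = False
except ValueError:
    binary_failed = True

# 'macro' computes the metric per class, then takes the unweighted mean
accuracy = accuracy_score(y_true=y_true, y_pred=y_pred)
recall = recall_score(y_true=y_true, y_pred=y_pred, average="macro")
precision = precision_score(y_true=y_true, y_pred=y_pred, average="macro")

print(binary_failed, accuracy, recall, precision)
```

Which averaging mode is most appropriate probably depends on how balanced the 4 classes are in the training data.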

Best wishes for 2021! Paul

PengNi commented 3 years ago

Hi Paul,

I have changed the `average` parameter of the "recall" and "precision" metrics. Please check commit 8f7d78 in the develop branch and let me know if there are any further problems. Note that we have not tested DeepSignal for multi-class classification, so other parts of the code may also need to change, for example to use a more suitable loss function. I hope you can train a good model!
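As a rough illustration only of what a multi-class loss could look like (this is not the DeepSignal code, and the function name `softmax_cross_entropy` is hypothetical), here is a plain-Python softmax cross-entropy for one sample with integer class labels:

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy for one sample.

    logits: list of raw per-class scores (length = class_num)
    label:  integer class index, e.g. 0, 1, 2 or 3
    """
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log(softmax(logits)[label]) = logsumexp(logits) - logits[label]
    return log_sum_exp - logits[label]


# Equal scores for 4 classes: loss is log(4)
uniform_loss = softmax_cross_entropy([0.0, 0.0, 0.0, 0.0], 2)
# High score on the true class: loss close to 0
confident_loss = softmax_cross_entropy([0.0, 0.0, 10.0, 0.0], 2)
# High score on a wrong class: large loss
wrong_loss = softmax_cross_entropy([0.0, 0.0, 10.0, 0.0], 1)

print(uniform_loss, confident_loss, wrong_loss)
```

In a TensorFlow model, a built-in such as `tf.nn.sparse_softmax_cross_entropy_with_logits` computes the same quantity per sample; whether it fits here would still need to be tested.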

Best, Peng