JDACS4C-IMPROVE / DrugCell

A visible neural network model for drug response prediction

Early Stopping Bugfix #40

Open RylieWeaver opened 11 months ago

RylieWeaver commented 11 months ago

Error in epoch array length

INFO:DrugCell:== Epoch [6/200] ==
INFO:DrugCell:   **** TRAINING ****   Epoch [7/200], loss: 0.04777. This took 483.5 secs.
INFO:DrugCell:   **** TEST ****   Epoch [7/200], loss: 0.02420. This took 497.8 secs.
0.6643735967013182
INFO:DrugCell:== Epoch [7/200] ==
INFO:DrugCell:   **** TRAINING ****   Epoch [8/200], loss: 0.04767. This took 482.7 secs.
INFO:DrugCell:   **** TEST ****   Epoch [8/200], loss: 0.02330. This took 497.3 secs.
0.6754340859988331
INFO:DrugCell:== Epoch [8/200] ==
INFO:DrugCell:   **** TRAINING ****   Epoch [9/200], loss: 0.04753. This took 484.0 secs.
INFO:DrugCell:   **** TEST ****   Epoch [9/200], loss: 0.02418. This took 499.4 secs.
0.6663326526477655
INFO:DrugCell:Early stopping after 8 epochs with no improvement.
Best performed model (epoch)    3
Traceback (most recent call last):
  File "/usr/local/DrugCell/train.py", line 167, in <module>
    candle_main()
  File "/usr/local/DrugCell/train.py", line 164, in candle_main
    run(params)
  File "/usr/local/DrugCell/train.py", line 155, in run
    scores = main(params)
  File "/usr/local/DrugCell/train_drugcell2.py", line 366, in main
    epoch_train_test_df['epoch'] = epoch_list
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py", line 3612, in __setitem__
    self._set_item(key, value)
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py", line 3784, in _set_item
    value = self._sanitize_column(value)
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py", line 4509, in _sanitize_column
    com.require_length_match(value, self.index)
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/common.py", line 532, in require_length_match
    "Length of values "
ValueError: Length of values (9) does not match length of index (200)
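For context, pandas raises this whenever a column assignment's length differs from the DataFrame index. A minimal reproduction outside DrugCell (the 200/9 numbers just mirror the log above):

import pandas as pd

# Frame whose index is sized for the full 200-epoch schedule
df = pd.DataFrame(index=range(200))

# Metrics were only collected for 9 epochs before early stopping
epoch_list = list(range(9))

# Raises: ValueError: Length of values (9) does not match length of index (200)
df['epoch'] = epoch_list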
RylieWeaver commented 11 months ago

I think it's because of this for loop: it appends all the way up to the `epochs` hyperparameter value, which will usually be more than the epoch at which early stopping triggers.

for epoch in range(params['epochs']):
    model.train()
    epoch_list.append(epoch)
    train_predict = torch.zeros(0, 0).cuda(CUDA_ID)
    logger.info(f"== Epoch [{epoch}/{params['epochs']}] ==")
    train_loss_mean = 0
    t = time()
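One way to keep the lengths consistent is to build the results frame from the lists that were actually collected rather than from the scheduled epoch count. This is only a sketch, not the repository's fix; train_losses and test_losses are assumed stand-ins for whatever per-epoch metrics the script tracks:

import pandas as pd

def build_results_frame(epoch_list, train_losses, test_losses):
    # Size everything by the epochs actually completed, so the column
    # lengths always match the index even after early stopping.
    n = len(epoch_list)
    return pd.DataFrame({
        'epoch': epoch_list,
        'train_loss': train_losses[:n],
        'test_loss': test_losses[:n],
    })

# e.g. training stopped after 9 of 200 scheduled epochs
df = build_results_frame(list(range(9)), [0.05] * 9, [0.02] * 9)
print(len(df))  # 9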
rohandavidg commented 11 months ago

I see, I'll fix this.

rohandavidg commented 11 months ago

Added a fix for this as well.

rohandavidg commented 11 months ago

@RylieWeaver9 Updated the code to get the last epoch.
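(For illustration only, since the actual change isn't shown in this thread: "getting the last epoch" could mean trimming a frame that was pre-allocated for the full schedule down to the epochs actually completed, along the lines of the sketch below. epoch_train_test_df and epoch_list are assumed to exist as in the traceback above.)

import pandas as pd

scheduled_epochs = 200
epoch_list = list(range(9))  # early stopping ended training after 9 epochs

# Hypothetical: frame pre-allocated for the full schedule, then trimmed
# to the completed epochs before the column assignment.
epoch_train_test_df = pd.DataFrame(index=range(scheduled_epochs))
epoch_train_test_df = epoch_train_test_df.iloc[:len(epoch_list)].copy()
epoch_train_test_df['epoch'] = epoch_list   # lengths now match (9 == 9)
print(epoch_list[-1])                       # last completed epoch index: 8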