c4dm / dcase-few-shot-bioacoustic


Something wrong when I run the code #11

Closed yangdongchao closed 3 years ago

yangdongchao commented 3 years ago

```
(base) ydc@HR:~/DACSE2021/task5/dcase-few-shot/baselines/deep_learning$ sh runme.sh
Epoch 0
100%|██████████| 828/828 [00:25<00:00, 32.40it/s]
Average train loss: 1.9980836902848549
Average training accuracy: 0.7995893684562277
  0%|          | 0/276 [00:00<?, ?it/s]
```

When I run the code, it stops at the validation stage:

```python
val_iterator = iter(valid_loader)
for batch in tqdm(val_iterator):
    x, y = batch
    x = x.to(device)
    x_val = model(x)
    valid_loss, valid_acc = loss_fn(x_val, y, conf.train.n_shot)
    val_loss.append(valid_loss.item())
    val_acc.append(valid_acc.item())
avg_loss_vd = np.mean(val_loss[-num_batches_vd:])
avg_acc_vd = np.mean(val_acc[-num_batches_vd:])
```

It seems that val_iterator cannot produce validation data, so the process hangs at this stage.
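As a check that the hang really comes from the DataLoader worker processes, a minimal sketch (assuming the `valid_dataset` and `samplr_valid` objects already built in the baseline script) is to rebuild the validation loader with `num_workers=0`, which keeps all data loading in the main process, and iterate it on its own:

```python
import torch
from tqdm import tqdm

# Diagnostic sketch: rebuild the validation loader without worker processes,
# so any multiprocessing-related deadlock in data loading disappears.
debug_loader = torch.utils.data.DataLoader(dataset=valid_dataset,
                                           batch_sampler=samplr_valid,
                                           num_workers=0,
                                           pin_memory=False)

for x, y in tqdm(debug_loader):
    pass  # if this loop finishes, the hang is caused by the worker processes
```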

shubhrsingh22 commented 3 years ago

I am not able to replicate the issue on my side. The validation stage is executing. Could you let me know if you changed something in the code so that I can make the same changes and try to replicate the issue?

yangdongchao commented 3 years ago

Thanks for your reply. I have solved this problem by setting num_workers of train_loader and valid_loader to 1; the earlier version used num_workers=8. The problem is caused by the multi-worker data loading, and I don't know why it only occurs on my machine.

```python
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_sampler=samplr_train,
                                           num_workers=1, pin_memory=True, shuffle=False)
valid_loader = torch.utils.data.DataLoader(dataset=valid_dataset, batch_sampler=samplr_valid,
                                           num_workers=1, pin_memory=True, shuffle=False)
```


veronicamorfi commented 3 years ago

@yangdongchao thanks for letting us know about this issue; others might have the same problem.

dby124 commented 3 years ago

I have the same problem. Even after setting num_workers of train_loader and valid_loader to 1, the problem still occurs on my machine.

The output of the run:

```
Epoch 0
  1%|▌         | 6/828 [00:00<01:11, 11.52it/s]
......
100%|██████████| 828/828 [00:16<00:00, 49.01it/s]
Average train loss: 2.1956764483725393
Average training accuracy: 0.8075603820779473
  4%|▍         | 11/276 [00:00<00:02, 105.90it/s]
......
100%|██████████| 276/276 [00:02<00:00, 110.27it/s]
Epoch 0, Validation loss 0.4364, Validation accuracy 0.8799
Saving the best model with valdation accuracy 0.8799275317485782
Epoch 1
  0%|          | 0/828 [00:00<?, ?it/s]
```

yangdongchao commented 3 years ago

The best way to solve this problem is to delete the num_workers and pin_memory arguments, so the DataLoader falls back to its defaults.
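For reference, a minimal sketch of the loaders with those arguments removed (assuming the same dataset and sampler objects as above); PyTorch then uses its defaults, num_workers=0 and pin_memory=False, i.e. single-process data loading:

```python
import torch

# Sketch: no num_workers/pin_memory arguments, so PyTorch uses its defaults
# (num_workers=0, pin_memory=False) and loads batches in the main process.
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_sampler=samplr_train,
                                           shuffle=False)
valid_loader = torch.utils.data.DataLoader(dataset=valid_dataset,
                                           batch_sampler=samplr_valid,
                                           shuffle=False)
```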


dby124 commented 3 years ago

I have solved this problem by setting num_workers of train_loader and valid_loader to 0.
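A sketch of that change for the validation loader (the train_loader line is analogous), assuming the same dataset and sampler objects as in the baseline; note that num_workers=0 is also the PyTorch default, so this is equivalent to omitting the argument:

```python
# num_workers=0: no worker processes are spawned, so the
# multiprocessing-related hang cannot occur.
valid_loader = torch.utils.data.DataLoader(dataset=valid_dataset, batch_sampler=samplr_valid,
                                           num_workers=0, pin_memory=True, shuffle=False)
```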

ntts3 commented 1 year ago

Hi, how can I get "Mel_train.h5" please?