Raschka-research-group / coral-cnn

Rank Consistent Ordinal Regression for Neural Networks with Application to Age Estimation
https://www.sciencedirect.com/science/article/pii/S016786552030413X
MIT License
335 stars, 62 forks

While training my model I'm facing an issue. #19

Closed NarasimmanSaravana1994 closed 1 year ago

NarasimmanSaravana1994 commented 4 years ago

Epoch: 001/200 | Batch 0000/20149 | Cost: 70.1415
Epoch: 001/200 | Batch 0050/20149 | Cost: 59.7190
Epoch: 001/200 | Batch 0100/20149 | Cost: 56.4751
Epoch: 001/200 | Batch 0150/20149 | Cost: 58.4821
Epoch: 001/200 | Batch 0200/20149 | Cost: 56.8452
Epoch: 001/200 | Batch 0250/20149 | Cost: 59.0936
Epoch: 001/200 | Batch 0300/20149 | Cost: 54.9184
Epoch: 001/200 | Batch 0350/20149 | Cost: 53.4635
Epoch: 001/200 | Batch 0400/20149 | Cost: 52.2409
Epoch: 001/200 | Batch 0450/20149 | Cost: 51.1332
Epoch: 001/200 | Batch 0500/20149 | Cost: 57.5054
Epoch: 001/200 | Batch 0550/20149 | Cost: 53.7109
Epoch: 001/200 | Batch 0600/20149 | Cost: 58.1618
Epoch: 001/200 | Batch 0650/20149 | Cost: 53.6513
Epoch: 001/200 | Batch 0700/20149 | Cost: 55.9161
Epoch: 001/200 | Batch 0750/20149 | Cost: 55.2700
Epoch: 001/200 | Batch 0800/20149 | Cost: 52.1431
Epoch: 001/200 | Batch 0850/20149 | Cost: 54.5851
Epoch: 001/200 | Batch 0900/20149 | Cost: 62.3357
Epoch: 001/200 | Batch 0950/20149 | Cost: 53.9224
Epoch: 001/200 | Batch 1000/20149 | Cost: 57.4987
Epoch: 001/200 | Batch 1050/20149 | Cost: 59.1612
Epoch: 001/200 | Batch 1100/20149 | Cost: 52.0190
Epoch: 001/200 | Batch 1150/20149 | Cost: 59.5060
Epoch: 001/200 | Batch 1200/20149 | Cost: 57.0917
Epoch: 001/200 | Batch 1250/20149 | Cost: 53.7502
Epoch: 001/200 | Batch 1300/20149 | Cost: 62.6665
Epoch: 001/200 | Batch 1350/20149 | Cost: 50.6539
Epoch: 001/200 | Batch 1400/20149 | Cost: 51.1941
Traceback (most recent call last):
  File "afad-coral.py", line 379, in <module>
    for batch_idx, (features, targets, levels) in enumerate(train_loader):
  File "/home/administrator/gender_identification/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/home/administrator/gender_identification/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/administrator/gender_identification/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/administrator/gender_identification/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/administrator/gender_identification/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/administrator/gender_identification/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 80, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/administrator/gender_identification/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 80, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/administrator/gender_identification/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 98 and 99 in dimension 1 at /pytorch/aten/src/TH/generic/THTensor.cpp:689

I'm hitting this error while training the model on my custom dataset. Is there any way to identify which file is causing it?

@rasbt @yienxu Please guide me.

rasbt commented 4 years ago

I don't think this issue is related to CORAL; you may have some issues with your dataset because it cannot complete the first epoch. I suggest you iterate over your custom dataset and check the tensor sizes of features and targets to see if they are inconsistent somewhere.
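One way to carry out the check suggested above is to walk the dataset one sample at a time, bypassing the DataLoader's batching, and report every sample whose size differs from the majority. The sketch below is only an illustration: the helper names `shape_of` and `find_inconsistent_samples` are made up for this example, and it assumes the dataset supports `len()` and integer indexing and returns a `(features, target, levels)` tuple, as the loop in the traceback suggests. It reads `.shape` when present, so it should also work on tensors, but it is demonstrated here on plain lists.

```python
from collections import Counter


def shape_of(item):
    """Describe the size of a tensor/array (via .shape) or a plain sequence (via len)."""
    shape = getattr(item, "shape", None)
    if shape is not None:
        return tuple(shape)
    try:
        return (len(item),)
    except TypeError:
        return ()  # scalars such as a plain int label


def find_inconsistent_samples(dataset, field):
    """Return (most_common_shape, [(index, shape), ...]) for element `field` of each sample."""
    shapes = [shape_of(dataset[i][field]) for i in range(len(dataset))]
    if not shapes:
        return None, []
    expected, _ = Counter(shapes).most_common(1)[0]
    return expected, [(i, s) for i, s in enumerate(shapes) if s != expected]


# Toy stand-in for a real dataset: (features, target, levels) per sample.
toy_dataset = [
    ([0.0] * 4, 3, [1, 1, 1, 0, 0]),
    ([0.0] * 4, 1, [1, 0, 0, 0, 0]),
    ([0.0] * 4, 6, [1, 1, 1, 1, 1, 1]),  # levels vector is one element too long
]

expected, bad = find_inconsistent_samples(toy_dataset, field=2)
print("expected shape:", expected)
for index, shape in bad:
    print(f"sample {index} has shape {shape}")
```

The reported indices can then be mapped back to file names through whatever table the dataset class uses to look up samples. Running the check with `field=0` as well covers the image tensors.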

ahm7 commented 4 years ago

I am facing the same error. Did you solve it?

Shubhammawa commented 4 years ago

I think you need to consider the fact that the ages in your dataset probably don't start from 0. So when you create the levels, maybe try using:

label = self.age[index] - k
levels = [1]*label + [0]*(NUM_CLASSES - 1 - label)

where k is the minimum age in your dataset. Otherwise the quantity NUM_CLASSES - 1 - label becomes negative for the last k ages, and you end up with level vectors of mismatched lengths. I was facing the same error and this was the cause. I am not sure if you're having the same issue, but you can try this.
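The effect described above can be reproduced in a few lines of plain Python. This is a minimal sketch, not the repository's code: `make_levels`, `MIN_AGE`, and the age range 20 to 25 are hypothetical values chosen for illustration. It shows that a negative repeat count silently produces an empty list, so an unshifted label yields a vector of the wrong length instead of an immediate error.

```python
def make_levels(age, min_age, num_classes):
    """Build the binary level vector of length num_classes - 1 for one age."""
    label = age - min_age  # shift so the youngest age maps to label 0
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside [0, {num_classes - 1}]")
    return [1] * label + [0] * (num_classes - 1 - label)


# Hypothetical dataset with ages 20..25, i.e. 6 classes.
MIN_AGE, NUM_CLASSES = 20, 6

# Without the shift, [0] * (negative number) is an empty list,
# so the vector has 25 entries instead of NUM_CLASSES - 1 == 5.
unshifted = [1] * 25 + [0] * (NUM_CLASSES - 1 - 25)
print(len(unshifted))

# With the shift, every age gives a vector of the same length.
for age in (20, 23, 25):
    print(age, make_levels(age, MIN_AGE, NUM_CLASSES))
```

The explicit range check is a design choice for this sketch: it turns a silent length mismatch into an error that names the offending label at the point where it is created.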

rasbt commented 4 years ago

Good point, there was a related issue here #22

I.e., make sure that the labels start at 0 by subtracting "min(age)" from all labels during training. Then, to make predictions, just add "min(age)" back to the predicted label.

For example, if you have ages between 20-50, subtract "20" from all training examples. Then, if you predict on new data and the model predicts 5, then the "real" label is 5+20 = 25.
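The round trip in this example can be sketched as follows. The function names (`fit_offset`, `to_labels`, `to_age`) and the sample ages are assumptions made for illustration. The offset is taken from the training ages only and would need to be stored alongside the model so the same value is used at prediction time.

```python
def fit_offset(train_ages):
    """The smallest training age becomes label 0."""
    return min(train_ages)


def to_labels(ages, offset):
    """Shift raw ages to zero-based labels for training."""
    return [age - offset for age in ages]


def to_age(predicted_label, offset):
    """Map a predicted zero-based label back to a real age."""
    return predicted_label + offset


train_ages = [20, 27, 33, 50]  # hypothetical ages in the 20-50 range
offset = fit_offset(train_ages)
train_labels = to_labels(train_ages, offset)
num_classes = max(train_labels) + 1

print(train_labels)       # [0, 7, 13, 30]
print(num_classes)        # 31
print(to_age(5, offset))  # 25
```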

I should note that having labels starting at 0 is not only a requirement for CORAL but for regular classification (cross entropy loss) as well -- here, it's due to how PyTorch internally considers the one-hot targets of the class labels when computing the cross entropy loss.
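To see why class labels need to lie in the range 0 to C-1 for cross entropy, it helps to write the loss for a single example by hand. The function below is a plain-Python sketch of the usual definition (negative log-softmax evaluated at the target index), not PyTorch's actual implementation. The target is used as an index into the C model outputs, so a raw age such as 50 has no matching output when there are only 31 classes.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax of `logits` evaluated at class index `target`."""
    num_classes = len(logits)
    if not 0 <= target < num_classes:
        raise IndexError(f"target {target} is outside [0, {num_classes - 1}]")
    # Numerically stable log-sum-exp for the softmax normalizer.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target]


logits = [0.0] * 31  # hypothetical model with 31 outputs (ages 20-50)

print(cross_entropy(logits, 50 - 20))  # shifted label 30 is a valid index

try:
    cross_entropy(logits, 50)  # raw age 50 is not a valid class index
except IndexError as err:
    print("error:", err)
```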

NarasimmanSaravana1994 commented 1 year ago

The issue was resolved after I updated the code locally.