Open lwhkop opened 4 years ago
Hi, first off, the code is not maintained and may contain errors, but I will see what I can do. Do you mind telling me which version of PyTorch you are using? The issue seems to originate inside PyTorch's own source, which may be due to a version incompatibility. Could you point out which line in train_ssd_lite.py is hitting this error?
Thanks for your reply. My torch version is 1.3.1. When I use the transform function, the error happens in vision/transforms/transforms.py; lines 13, 99, 343 and so on all hit the same problem.
OK, the version I used was PyTorch 1.2.0, so I would first create an environment with that version and retry the command! Also, that line of code seems to be offsetting the top-left point of each bounding box, so I would check whether the dataset is being read in correctly. If no data is loaded, there cannot be any bounding boxes, which will produce the error you see here.
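For reference, the line that fails later in this thread is boxes[:, :2] += (int(left), int(top)), and NumPy raises exactly this IndexError when boxes is an empty array, i.e. when an image comes back with no annotations. A minimal sketch of both cases (the box values are made up):

import numpy as np

boxes = np.array([[10.0, 20.0, 110.0, 220.0]])   # one box per row: 2-D indexing works
boxes[:, :2] += (5, 7)                           # fine: shifts x1, y1 of every box

empty = np.array([])                             # an image with no annotations
empty[:, :2] += (5, 7)                           # IndexError: too many indices for array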
And if I comment out the transform code in deepfashion2_Dataset.py (lines 77-80 and 86-87), I get an error like this:

Traceback (most recent call last):
  File "train_ssd_lite.py", line 320, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd_lite.py", line 108, in train
    for i, data in enumerate(loader):
  File "C:\Users\ASUS\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 346, in __next__
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\ASUS\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "C:\Users\ASUS\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 79, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "C:\Users\ASUS\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 79, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "C:\Users\ASUS\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 64, in default_collate
    return default_collate([torch.as_tensor(b) for b in batch])
  File "C:\Users\ASUS\Anaconda3\lib\site-packages\torch\utils\data\_utils\collate.py", line 55, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 1173 and 468 in dimension 1 at C:\w\1\s\windows\pytorch\aten\src\TH/generic/THTensor.cpp:689

so I don't know how to fix it...
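For context, this RuntimeError comes from default_collate: it calls torch.stack on the per-sample targets, which requires every tensor in the batch to have the same shape outside dimension 0. With the transform/target_transform calls commented out, each sample presumably returns its raw, variable-length annotations (here 1173 vs. 468 elements), which cannot be stacked. A minimal sketch reproducing the failure (the sizes are illustrative):

import torch

a = torch.zeros(1173)   # raw annotations for one image (illustrative size)
b = torch.zeros(468)    # raw annotations for another image (illustrative size)
torch.stack([a, b], 0)  # raises the same "Sizes of tensors must match" RuntimeError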
Those lines are needed, since they preprocess the images so that they are valid inputs for the network. Can you verify the dataset is being loaded properly, and then uncomment the lines you commented out before?
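One quick way to sanity-check the loading, as a rough sketch (the module path mirrors the one referenced later in this thread; the constructor arguments and the exact return signature of __getitem__ are assumptions, so adapt them to the real class):

from vision.datasets.deepfashion2_Dataset import DeepFashion2Dataset

# Build the dataset without transform/target_transform so the raw annotations are visible.
dataset = DeepFashion2Dataset("./")      # dataset root is illustrative

print("number of samples:", len(dataset))
for i in range(5):
    image, boxes, labels = dataset[i]    # assumed return signature
    print(i, getattr(image, "shape", None), len(boxes), len(labels))
    if len(boxes) == 0:
        print("sample", i, "has no bounding boxes; the crop/pad transforms will fail on it")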
E:\DeepFashion2 Dataset>python train_ssd_lite.py --dataset_type deep_fashion_2 --datasets ./ --validation_dataset ./ --net mb2-ssd-lite --scheduler cosine --lr 0.01 --t_max 200 --validation_epochs 5 --num_epochs 20
2020-01-04 18:24:33,038 - root - INFO - Use Cuda.
2020-01-04 18:24:33,038 - root - INFO - Namespace(balance_data=False, base_net=None, base_net_lr=None, batch_size=32, checkpoint_folder='./', dataset_type='deep_fashion_2', datasets=['./'], debug_steps=100, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb2-ssd-lite', num_epochs=20, num_workers=0, pretrained_ssd=None, resume=None, scheduler='cosine', t_max=200.0, use_cuda=True, validation_dataset='./', validation_epochs=5, weight_decay=0.0005)
2020-01-04 18:24:33,039 - root - INFO - Prepare training datasets.
2020-01-04 18:24:33,083 - root - INFO - No labels file, using default VOC classes.
2020-01-04 18:24:33,083 - root - INFO - Stored labels into file ./deepfashion2-labels.txt.
2020-01-04 18:24:33,083 - root - INFO - Train dataset size: 191961
2020-01-04 18:24:33,084 - root - INFO - Prepare Validation datasets.
2020-01-04 18:24:33,091 - root - INFO - No labels file, using default VOC classes.
2020-01-04 18:24:33,091 - root - INFO - <vision.datasets.deepfashion2_Dataset.DeepFashion2Dataset object at 0x000001ED89E522C8>
2020-01-04 18:24:33,091 - root - INFO - validation dataset size: 32153
2020-01-04 18:24:33,091 - root - INFO - Build network.
2020-01-04 18:24:33,158 - root - INFO - Took 0.00 seconds to load the model.
2020-01-04 18:24:34,965 - root - INFO - Learning rate: 0.01, Base net learning rate: 0.01, Extra Layers learning rate: 0.01.
2020-01-04 18:24:34,965 - root - INFO - Uses CosineAnnealingLR scheduler.
2020-01-04 18:24:34,965 - root - INFO - Start training from epoch 0.
C:\Users\ASUS\Anaconda3\lib\site-packages\torch\optim\lr_scheduler.py:100: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Traceback (most recent call last):
  File "train_ssd_lite.py", line 320, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd_lite.py", line 108, in train
    for i, data in enumerate(loader):
  File "C:\Users\ASUS\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 346, in __next__
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\ASUS\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\ASUS\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\ASUS\Anaconda3\lib\site-packages\torch\utils\data\dataset.py", line 207, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "E:\DeepFashion2 Dataset\vision\datasets\deepfashion2_Dataset.py", line 78, in __getitem__
    image, boxes, labels = self.transform(image, boxes, labels)
  File "E:\DeepFashion2 Dataset\vision\ssd\data_preprocessing.py", line 34, in __call__
    return self.augment(img, boxes, labels)
  File "E:\DeepFashion2 Dataset\vision\transforms\transforms.py", line 55, in __call__
    img, boxes, labels = t(img, boxes, labels)
  File "E:\DeepFashion2 Dataset\vision\transforms\transforms.py", line 343, in __call__
    boxes[:, :2] += (int(left), int(top))
IndexError: too many indices for array
The error happens after "Start training from epoch 0", so does that mean my dataset is loaded? I'm sorry, I'm new to this.
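As a side note, the UserWarning in the log above is unrelated to the crash: since PyTorch 1.1, optimizer.step() has to be called before lr_scheduler.step(). A minimal, self-contained sketch of that ordering (the parameter, loss, and loop here are placeholders, not the repo's actual training code):

import torch

param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(2):                   # placeholder training loop
    loss = (param - 1.0).pow(2).sum()    # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                     # update the weights first
    scheduler.step()                     # then advance the learning-rate schedule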
It might be that the dataset is not loaded. Are you attempting to train with the VOC dataset? The original training was intended for VOC but was later switched to DeepFashion. You can check the dataset image IDs using one of the methods in the Deepfashion class.
Don't worry, this is not your fault at all. This is on me for not maintaining this repo well, since I ran out of time to properly implement the entire network. I am more than willing to help get it running though (time permitting, haha)!
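If the Deepfashion class keeps a list of image IDs, a one-line check might look like the following (the attribute name is only a guess for illustration; use whatever the class actually exposes):

print(len(dataset.ids), dataset.ids[:5])   # 'ids' is a guessed attribute name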
I trained on a VOC dataset with MobileNet last month, but DeepFashion2 is a COCO-style dataset and I don't know how to use it to train the SSD task, so I just followed your steps in the README. Honestly, I haven't used a COCO dataset before. Do you mind using some spare time to tell me how to check the loaded dataset, and what I could do if the dataset hasn't been loaded properly?
elif args.dataset_type == 'deep_fashion_2':
    ## STORES LABEL IN FILE FOR NETWORK TO READ
    label_file = os.path.join(args.checkpoint_folder, "deepfashion2-labels.txt")
    dataset = DeepFashionDataset(dataset_path, transform=train_transform,
                                 target_transform=target_transform)
    store_labels(label_file, dataset.class_names)
    num_classes = len(dataset.class_names)
You can check the number of classes right after this line, at around line 180 in the main script. It should be only 3 classes, since I was trying to reduce the problem to a reasonable subset.
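To make that check explicit, one could add a couple of debug lines right after the snippet above (purely illustrative; the variable names come from the snippet itself):

# Add right after 'num_classes = len(dataset.class_names)' in train_ssd_lite.py.
print("class_names:", dataset.class_names)
print("num_classes:", num_classes)   # expected to be 3 per the note above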
File "E:\DeepFashion2 Dataset\vision\transforms\transforms.py", line 343, in call boxes[:, :2] += (int(left), int(top)) IndexError: too many indices for array
When I run train_ssd_lite.py, I hit this problem. How do I solve it?