Johnson-Su closed this issue 4 years ago
I have a similar problem. Have you found a solution?
Mine is this:
2020-09-21 23:17:53 - Using CUDA...
2020-09-21 23:17:53 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=4, checkpoint_folder='models/wood', dataset_type='voc', datasets=['data/wood'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=5, num_workers=2, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2020-09-21 23:17:53 - Prepare training datasets.
2020-09-21 23:17:53 - No labels file, using default VOC classes.
2020-09-21 23:17:53 - Stored labels into file models/wood/labels.txt.
2020-09-21 23:17:53 - Train dataset size: 50
2020-09-21 23:17:53 - Prepare Validation datasets.
2020-09-21 23:17:53 - No labels file, using default VOC classes.
2020-09-21 23:17:53 - Validation dataset size: 10
2020-09-21 23:17:53 - Build network.
2020-09-21 23:17:53 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2020-09-21 23:17:53 - Took 0.15 seconds to load the model.
2020-09-21 23:17:56 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2020-09-21 23:17:56 - Uses CosineAnnealingLR scheduler.
2020-09-21 23:17:56 - Start training from epoch 0.
/home/nvidia/.local/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Traceback (most recent call last):
File "train_ssd.py", line 343, in
I think you have an XML annotation file that has no objects / boxes defined in it.
I added a check for this to the repo several weeks back - when did you clone? You might want to update the ssd submodule repo in your tree.
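For anyone who wants to check their own dataset for this, here is a minimal sketch, not the check that was added to the repo. It assumes a Pascal VOC layout where each image has an XML file with `<object>` elements containing a `<bndbox>`; the function name `find_empty_annotations` and the directory you pass in are hypothetical. The likely link to the traceback is that an annotation with no boxes ends up as an empty 1-D array, so two-index slicing such as `boxes[:, :2]` fails, but that is an inference from the error message rather than something confirmed here.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_empty_annotations(annotations_dir):
    """Return the names of VOC XML files that define no bounding boxes.

    Hypothetical helper: annotations_dir is assumed to hold one .xml file per image.
    """
    empty = []
    for xml_path in sorted(Path(annotations_dir).glob("*.xml")):
        root = ET.parse(str(xml_path)).getroot()
        # An <object> only counts if it actually carries a <bndbox> element.
        boxes = [obj for obj in root.iter("object") if obj.find("bndbox") is not None]
        if not boxes:
            empty.append(xml_path.name)
    return empty
```

Calling it on your annotations folder (for example `find_empty_annotations("data/wood/Annotations")`, where the path is only an assumed example) returns the files worth opening by hand.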
That was not my problem, I think. I have the latest version; I downloaded it yesterday. My problem was solved by adding the VOC labels file that the classes are read from. Everything works now, but thank you for the response! :)
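If it helps others, the labels file can be generated from the annotations instead of typed by hand. This is a rough sketch under assumptions: the helper names `collect_class_names` and `write_labels_file` are hypothetical, and the one-class-per-line format is an assumption, so check how `voc_dataset.py` in your checkout parses the file before relying on it. Judging by the logs in this thread, the loader adds `BACKGROUND` itself, so the sketch does not write it.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_class_names(annotations_dir):
    """Return the sorted, de-duplicated <object><name> values found in VOC XML files."""
    names = set()
    for xml_path in Path(annotations_dir).glob("*.xml"):
        root = ET.parse(str(xml_path)).getroot()
        for obj in root.iter("object"):
            name = obj.findtext("name")
            if name and name.strip():
                names.add(name.strip())
    return sorted(names)


def write_labels_file(class_names, labels_path):
    """Write one class name per line (assumed format; verify against your loader)."""
    Path(labels_path).write_text("\n".join(class_names) + "\n")
```

Names are kept exactly as written in the XML, including capitalization, so the file matches what the annotations actually contain.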
I cloned a while back so that may be the issue. I'll update that submodule for sure!
python3 train_ssd.py --dataset_type voc --datasets data --net mb1-ssd --scheduler cosine --lr 0.01 --t_max 100 --validation_epochs 1 --num_epochs 100 --base_net_lr 0.2 --batch_size 5
2020-10-06 08:58:45,398 - root - INFO - Namespace(balance_data=False, base_net=None, base_net_lr=0.2, batch_size=5, checkpoint_folder='models/', dataset_type='voc', datasets=['data'], debug_steps=100, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=100, num_workers=4, pretrained_ssd=None, resume=None, scheduler='cosine', t_max=100.0, use_cuda=True, validation_dataset=None, validation_epochs=1, weight_decay=0.0005)
2020-10-06 08:58:45,399 - root - INFO - Prepare training datasets.
2020-10-06 08:58:45,399 - root - INFO - VOC Labels read from file: ('BACKGROUND', 'personvotingspeaking')
2020-10-06 08:58:45,400 - root - INFO - Stored labels into file models/voc-model-labels.txt.
2020-10-06 08:58:45,400 - root - INFO - Train dataset size: 970
2020-10-06 08:58:45,400 - root - INFO - Prepare Validation datasets.
2020-10-06 08:58:45,400 - root - INFO - VOC Labels read from file: ('BACKGROUND', 'personvotingspeaking')
2020-10-06 08:58:45,400 - root - INFO - validation dataset size: 415
2020-10-06 08:58:45,400 - root - INFO - Build network.
2020-10-06 08:58:45,472 - root - INFO - Took 0.00 seconds to load the model.
2020-10-06 08:58:45,474 - root - INFO - Learning rate: 0.01, Base net learning rate: 0.2, Extra Layers learning rate: 0.01.
2020-10-06 08:58:45,474 - root - INFO - Uses CosineAnnealingLR scheduler.
2020-10-06 08:58:45,474 - root - INFO - Start training from epoch 0.
/home/linagora/.local/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:122: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Traceback (most recent call last):
File "train_ssd.py", line 319, in <module>
device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
File "train_ssd.py", line 115, in train
for i, data in enumerate(loader):
File "/home/linagora/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/linagora/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/linagora/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/linagora/.local/lib/python3.6/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/linagora/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/linagora/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/linagora/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/linagora/.local/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 207, in __getitem__
return self.datasets[dataset_idx][sample_idx]
File "/home/linagora/Desktop/pytorch-test/gest/vision/datasets/voc_dataset.py", line 66, in __getitem__
image, boxes, labels = self.transform(image, boxes, labels)
File "/home/linagora/Desktop/pytorch-test/gest/vision/ssd/data_preprocessing.py", line 34, in __call__
return self.augment(img, boxes, labels)
File "/home/linagora/Desktop/pytorch-test/gest/vision/transforms/transforms.py", line 55, in __call__
img, boxes, labels = t(img, boxes, labels)
File "/home/linagora/Desktop/pytorch-test/gest/vision/transforms/transforms.py", line 343, in __call__
boxes[:, :2] += (int(left), int(top))
IndexError: too many indices for array
I have the same error.
@Mohamed1991A, do you have 1 class in your dataset (`personvotingspeaking`), or should that be 3 different classes?
File "/home/nano2gb1/Documents/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 13, in intersect
max_xy = np.minimum(box_a[:, 2:], box_b[2:])
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
/home/nano2gb1/.local/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
warning - image 20210212-103721 has object with unknown class 'fifty'
warning - image 20210212-103359 has object with unknown class 'thirty'
warning - image 20210212-103749 has object with unknown class 'fifty'
Traceback (most recent call last):
File "train_ssd.py", line 343, in
I have the same issue. I have tried retaking the data several times at different resolutions and reloaded "requirements.txt". I am NOT using the container because I want to use this in an embedded, Wi-Fi-denied region and would like it to retrain networks locally when commanded to do so.
https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-ssd.md
Followed the above instructions without the container. I uploaded the latest SD card files on Tuesday.
For me, the cause of this error was an "<\object>" in an XML file.
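A stray tag like that can be hard to spot by eye, so here is a small sketch that lists annotation files the standard Python XML parser refuses to read. The function name `find_malformed_xml` is hypothetical, and the sketch only assumes the annotations are `.xml` files in one folder. Whether the training script reports such files the same way is not something this snippet can tell you.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_malformed_xml(annotations_dir):
    """Return (file name, parser message) pairs for XML files that fail to parse."""
    problems = []
    for xml_path in sorted(Path(annotations_dir).glob("*.xml")):
        try:
            ET.parse(str(xml_path))
        except ET.ParseError as err:
            # The message includes the line and column where parsing stopped.
            problems.append((xml_path.name, str(err)))
    return problems
```

The parser message points at the line and column of the first problem, which is usually enough to find the bad tag in a text editor.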
Has this issue been fixed yet? I downloaded this training code only a week ago.
lcs@ubuntu:~/jetson-inference/python/training/detection/ssd$ python3 train_ssd.py --dataset-type=voc --data=data/Cones --model-dir=models/Cones --batch-size=2 --workers=1 --epochs=1
2023-11-14 12:27:48 - Using CUDA...
2023-11-14 12:27:48 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=2, checkpoint_folder='models/Cones', dataset_type='voc', datasets=['data/Cones'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, log_level='info', lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=1, num_workers=1, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resolution=300, resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, validation_mean_ap=False, weight_decay=0.0005)
2023-11-14 12:27:56 - model resolution 300x300
2023-11-14 12:27:56 - SSDSpec(feature_map_size=19, shrinkage=16, box_sizes=SSDBoxSizes(min=60, max=105), aspect_ratios=[2, 3])
2023-11-14 12:27:56 - SSDSpec(feature_map_size=10, shrinkage=32, box_sizes=SSDBoxSizes(min=105, max=150), aspect_ratios=[2, 3])
2023-11-14 12:27:56 - SSDSpec(feature_map_size=5, shrinkage=64, box_sizes=SSDBoxSizes(min=150, max=195), aspect_ratios=[2, 3])
2023-11-14 12:27:56 - SSDSpec(feature_map_size=3, shrinkage=100, box_sizes=SSDBoxSizes(min=195, max=240), aspect_ratios=[2, 3])
2023-11-14 12:27:56 - SSDSpec(feature_map_size=2, shrinkage=150, box_sizes=SSDBoxSizes(min=240, max=285), aspect_ratios=[2, 3])
2023-11-14 12:27:56 - SSDSpec(feature_map_size=1, shrinkage=300, box_sizes=SSDBoxSizes(min=285, max=330), aspect_ratios=[2, 3])
2023-11-14 12:27:56 - Prepare training datasets.
warning - image 20231110-160802 has no box/labels annotations, ignoring from dataset
warning - image 20231110-160917 has no box/labels annotations, ignoring from dataset
warning - image 20231110-161855 has no box/labels annotations, ignoring from dataset
warning - image 20231110-162031 has no box/labels annotations, ignoring from dataset
warning - image 20231110-162039 has no box/labels annotations, ignoring from dataset
warning - image 20231110-163037 has no box/labels annotations, ignoring from dataset
warning - image 20231110-170330 has no box/labels annotations, ignoring from dataset
warning - image 20231113-154105 has no box/labels annotations, ignoring from dataset
warning - image 20231113-154110 has no box/labels annotations, ignoring from dataset
2023-11-14 12:27:56 - No labels file, using default VOC classes.
2023-11-14 12:27:56 - Stored labels into file models/Cones/labels.txt.
2023-11-14 12:27:56 - Train dataset size: 100
2023-11-14 12:27:56 - Prepare Validation datasets.
warning - image 20231110-160908 has no box/labels annotations, ignoring from dataset
warning - image 20231110-161855 has no box/labels annotations, ignoring from dataset
warning - image 20231110-162031 has no box/labels annotations, ignoring from dataset
warning - image 20231110-162039 has no box/labels annotations, ignoring from dataset
warning - image 20231110-163037 has no box/labels annotations, ignoring from dataset
warning - image 20231110-170330 has no box/labels annotations, ignoring from dataset
2023-11-14 12:27:56 - No labels file, using default VOC classes.
2023-11-14 12:27:56 - Validation dataset size: 100
2023-11-14 12:27:56 - Build network.
2023-11-14 12:27:56 - Init from pretrained SSD models/mobilenet-v1-ssd-mp-0_675.pth
models/mobilenet-v1-ssd-mp-0_675.pth 100%[==========================================================================================================>] 36.23M 11.1MB/s in 3.4s
2023-11-14 12:28:02 - Took 5.94 seconds to load the model.
2023-11-14 12:28:02 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2023-11-14 12:28:02 - Uses CosineAnnealingLR scheduler.
2023-11-14 12:28:02 - Start training from epoch 0.
warning - image 20231110-163623 has object with unknown class 'Cone'
warning - image 20231110-170417 has object with unknown class 'Cone'
Traceback (most recent call last):
File "train_ssd.py", line 406, in
I have followed the tutorial for training your own dataset for the SSD. I have been using this to recognise different plants. However, once I run
python3 train_ssd.py --dataset-type=voc --data=(my dataset location) I get the output below. I had edited voc_dataset.py to allow capitals in the names.
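Before editing the loader, it may be worth checking whether the class names in the annotations actually match the label list, including capitalization. The sketch below is one way to do that under assumptions: `find_unknown_classes` is a hypothetical helper, and `class_names` is whatever list of labels your training run is using. It reports each annotation class that is missing from the list, together with a label that differs only by case if one exists.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_unknown_classes(annotations_dir, class_names):
    """Map each annotation class missing from class_names to a case-insensitive match, or None."""
    known = set(class_names)
    by_lower = {name.lower(): name for name in class_names}
    unknown = {}
    for xml_path in Path(annotations_dir).glob("*.xml"):
        root = ET.parse(str(xml_path)).getroot()
        for obj in root.iter("object"):
            name = (obj.findtext("name") or "").strip()
            if name and name not in known:
                # A non-None value means the names differ only in capitalization.
                unknown[name] = by_lower.get(name.lower())
    return unknown
```

An entry such as `'Cone': 'cone'` would suggest a capitalization mismatch, while a value of `None` would mean the class is not in the label list at all.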