gitgkk closed 3 years ago
It appears that you need to make a new labels.txt in your dataset. By the looks of things it should have one line: Helipad
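One possible way to produce that file, sketched under the assumption that the labels currently come from a labelmap-style export whose lines look like the `Helipad:128,0,0::` entry printed in the training log. The helper names `labelmap_to_labels` and `write_labels` are hypothetical and not part of jetson-inference.

```python
from pathlib import Path


def labelmap_to_labels(lines):
    """Extract bare class names from 'name:color:parts:actions' style lines."""
    names = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '# label:color_rgb:parts:actions' header comment.
        if not line or line.startswith("#"):
            continue
        # Keep only the text before the first colon, e.g. 'Helipad'.
        name = line.split(":", 1)[0].strip()
        if name:
            names.append(name)
    return names


def write_labels(labels_path, names):
    """Write one class name per line, as suggested above."""
    Path(labels_path).write_text("".join(f"{n}\n" for n in names))


if __name__ == "__main__":
    # Example input copied from the labels reported in the training log.
    example = ["# label:color_rgb:parts:actions", "Helipad:128,0,0::"]
    print(labelmap_to_labels(example))
```

With the two lines from the log as input, the only class name left is Helipad, which is what the annotations use.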
Hello @dusty-nv et All,
After failing to configure and build jetson-inference on my Jetson Nano, I switched to the Docker version and installed it on my system. I finally have a working detection model, trained with PyTorch on my custom image dataset.
For readers who land here one day searching for a solution, here is the list of steps you need to perform to train on custom data in Pascal VOC format. I also drew on https://github.com/dusty-nv/jetson-inference/issues/789 and other issues.
More about this structure here https://programmer.help/blogs/tenor-flow-2.0-note-10-pascal-voc-data-set-introduction.html
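For orientation, the standard Pascal VOC layout has an Annotations folder (XML files), a JPEGImages folder (images), and an ImageSets/Main folder (text files listing image IDs, one per line). The sketch below creates that skeleton and writes the split files. The split file names (`trainval.txt`, `test.txt`), the placement of `labels.txt` at the dataset root, and the helper name `make_voc_skeleton` are assumptions for illustration; check them against what train_ssd.py expects in your version.

```python
from pathlib import Path


def make_voc_skeleton(root, image_ids, class_names, test_fraction=0.2):
    """Create VOC-style folders and write split and label files (assumed names)."""
    root = Path(root)
    for sub in ("Annotations", "JPEGImages", "ImageSets/Main"):
        (root / sub).mkdir(parents=True, exist_ok=True)

    # Deterministic split: the first part of the sorted IDs becomes the test set.
    ids = sorted(image_ids)
    n_test = int(len(ids) * test_fraction)
    splits = {"test": ids[:n_test], "trainval": ids[n_test:]}

    # One image ID per line, without file extension.
    for name, members in splits.items():
        split_file = root / "ImageSets" / "Main" / f"{name}.txt"
        split_file.write_text("".join(f"{i}\n" for i in members))

    # One class name per line.
    (root / "labels.txt").write_text("".join(f"{c}\n" for c in class_names))
    return splits
```

The images and XML files still have to be copied into JPEGImages and Annotations by hand or by your annotation tool's export.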
Use only the labelImg tool to annotate your images. The .xml annotation file contains fields whose values must reflect your directory structure and dataset-specific values, otherwise training will fail. See the link in point 1 above. I had to change the XML notation so that you can see the values of these tags; otherwise the editor would not let me post it in the right format.
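A quick check along these lines may catch annotation problems before training starts. It is a rough sketch, not part of jetson-inference: it assumes Pascal VOC style XML in which each `object` element has a `name` and a `bndbox` with `xmin`, `ymin`, `xmax`, `ymax`, and the function name `check_annotation` is hypothetical. It reports objects whose class is not in the label list and files that end up with no usable boxes.

```python
import xml.etree.ElementTree as ET


def check_annotation(xml_text, class_names):
    """Return a list of problems found in one VOC annotation document."""
    problems = []
    usable = 0
    root = ET.fromstring(xml_text)
    for obj in root.iter("object"):
        name = (obj.findtext("name") or "").strip()
        if name not in class_names:
            problems.append(f"unknown class {name!r}")
            continue
        box = obj.find("bndbox")
        if box is None:
            problems.append(f"object {name!r} has no bndbox")
            continue
        try:
            xmin, ymin, xmax, ymax = (
                float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
            )
        except (TypeError, ValueError):
            # A missing tag gives None (TypeError); bad text gives ValueError.
            problems.append(f"object {name!r} has a malformed bndbox")
            continue
        if xmax <= xmin or ymax <= ymin:
            problems.append(f"object {name!r} has an empty box")
            continue
        usable += 1
    if usable == 0:
        problems.append("no usable boxes in this file")
    return problems
```

For example, calling it with the class list `["Helipad:128,0,0::"]` on an annotation whose object is named `Helipad` reports an unknown class and no usable boxes, while the class list `["Helipad"]` reports nothing.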
">folder
Now your data is ready for training:
python3 train_ssd.py --dataset-type=voc --model-dir=models/data-name --data=data/data-name --pretrained-ssd=models/mobilenet-v1-ssd-mp-0_675.pth --batch-size=2 --num-epochs=400 --num-workers=2
You can adjust batch-size, epochs, and num-workers to suit your hardware. On the Nano, the values above ran comfortably.
Pay attention to the number of epochs: for SSD-Mobilenet, training writes one .pth file of about 25 MB per epoch, so 400 epochs produced about 10 GB of .pth files. If you have more free space you can increase the number of epochs, which may improve the accuracy of the trained model. After you run the detection program for the first time, an optimized network is generated; you can then delete the .pth files if you are short on disk space.
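The disk-space arithmetic can be sketched as follows. It assumes one checkpoint of roughly 25 MB per epoch, as observed above, and decimal units (1 GB = 1000 MB); the function names are hypothetical.

```python
def checkpoint_disk_usage_gb(num_epochs, mb_per_checkpoint=25.0):
    """Estimate total .pth size in GB, assuming one checkpoint per epoch."""
    return num_epochs * mb_per_checkpoint / 1000.0


def max_epochs_for_budget(free_gb, mb_per_checkpoint=25.0):
    """Largest number of epochs whose checkpoints fit in free_gb of disk."""
    return int(free_gb * 1000.0 // mb_per_checkpoint)


if __name__ == "__main__":
    print(checkpoint_disk_usage_gb(400))  # 10.0
    print(max_epochs_for_budget(4))       # 160
```

Checkpoint size depends on the network, so measure one .pth file from your own run before relying on the default.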
Export model: python3 onnx_export.py --model-dir=models/data-name
Test via USB camera: detectnet --model=models/data-name/ssd-mobilenet.onnx --labels=models/data-name/labels.txt --input-blob=input_0 --output-cvg=scores --output-bbox=boxes /dev/video0
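To keep the dataset name consistent across the train, export, and test commands, they can be assembled from a single name. This is only a convenience sketch: `build_commands` is a hypothetical helper, and it uses just the flags and paths quoted in the steps above. It prints the commands rather than running them.

```python
import shlex


def build_commands(name, camera="/dev/video0", batch_size=2, epochs=400, workers=2):
    """Return the train, export, and detect commands for one dataset name."""
    model_dir = f"models/{name}"
    train = [
        "python3", "train_ssd.py", "--dataset-type=voc",
        f"--model-dir={model_dir}", f"--data=data/{name}",
        "--pretrained-ssd=models/mobilenet-v1-ssd-mp-0_675.pth",
        f"--batch-size={batch_size}", f"--num-epochs={epochs}",
        f"--num-workers={workers}",
    ]
    export = ["python3", "onnx_export.py", f"--model-dir={model_dir}"]
    detect = [
        "detectnet", f"--model={model_dir}/ssd-mobilenet.onnx",
        f"--labels={model_dir}/labels.txt", "--input-blob=input_0",
        "--output-cvg=scores", "--output-bbox=boxes", camera,
    ]
    return [train, export, detect]


if __name__ == "__main__":
    for cmd in build_commands("helipad"):
        # Quote each argument so the line can be pasted into a shell.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each list can also be passed to `subprocess.run` from the directory that contains train_ssd.py.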
I hope this helps.
Thanks, Kashyap
Hi @dusty-nv
As per your suggestion, I downloaded the Docker container and am now using it. I tried running train_ssd.py, but the command fails.
One difference I noticed is the set of files in the Main folder: I just have default.txt with a list of image names.
I have my own custom dataset of various helipad images. I used the CVAT tool for annotation.
What should the files in the Main folder contain? How should I resolve this error? I had multiple annotations per image and thought that might be the problem, so I reduced the dataset to 10 images with a single annotation each. Are there any restrictions on how many annotation boxes can be present in one image?
root@leapfrog:/jetson-inference/python/training/detection/ssd# python3 train_ssd.py --dataset-type=voc --data=data/helipad --model-dir=models/helipad --batch-size=1 --workers=1 --epochs=1
2021-06-11 13:06:47 - Using CUDA...
2021-06-11 13:06:47 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=1, checkpoint_folder='models/helipad', dataset_type='voc', datasets=['data/helipad'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=1, num_workers=1, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2021-06-11 13:06:47 - Prepare training datasets.
2021-06-11 13:06:47 - VOC Labels read from file: ('BACKGROUND', '# label:color_rgb:parts:actions', 'Helipad:128,0,0::')
2021-06-11 13:06:47 - Stored labels into file models/helipad/labels.txt.
2021-06-11 13:06:47 - Train dataset size: 10
2021-06-11 13:06:47 - Prepare Validation datasets.
2021-06-11 13:06:47 - VOC Labels read from file: ('BACKGROUND', '# label:color_rgb:parts:actions', 'Helipad:128,0,0::')
2021-06-11 13:06:47 - Validation dataset size: 10
2021-06-11 13:06:47 - Build network.
2021-06-11 13:06:47 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2021-06-11 13:06:48 - Took 0.51 seconds to load the model.
2021-06-11 13:07:39 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2021-06-11 13:07:39 - Uses CosineAnnealingLR scheduler.
2021-06-11 13:07:39 - Start training from epoch 0.
/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
warning - image 10 has object with unknown class 'Helipad'
Traceback (most recent call last):
  File "train_ssd.py", line 343, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 113, in train
    for i, data in enumerate(loader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataset.py", line 207, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/jetson-inference/python/training/detection/ssd/vision/datasets/voc_dataset.py", line 81, in __getitem__
    image, boxes, labels = self.transform(image, boxes, labels)
  File "/jetson-inference/python/training/detection/ssd/vision/ssd/data_preprocessing.py", line 34, in __call__
    return self.augment(img, boxes, labels)
  File "/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 55, in __call__
    img, boxes, labels = t(img, boxes, labels)
  File "/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 345, in __call__
    boxes[:, :2] += (int(left), int(top))
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
Thanks, Kashyap