dusty-nv / jetson-inference

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
https://developer.nvidia.com/embedded/twodaystoademo
MIT License

Can I pass other dataset types to SSD model? #1177

Closed rajeshroy402 closed 1 year ago

rajeshroy402 commented 3 years ago

Hi, I want to know whether I can use a dataset format other than Pascal VOC for custom object detection with MobileNet SSD. The material from @dusty-nv says that VOC is used only by convention, since it is a standard format.

Another thing: if I want to create my own dataset (a custom dataset from images/videos I own), can I use Roboflow? What should the dataset directory contain? Are there supposed to be sub-folders? If so, which ones? Basically, it would be great if you could help me understand the directory setup for my custom dataset!

I would appreciate it if other contributors could take a look and weigh in on this thread.

dusty-nv commented 3 years ago

I believe it supports Pascal VOC and Google Open Images; it's just that Pascal VOC is a significantly easier format to work with. And I don't mean using the actual Pascal VOC dataset, just organizing the data in the same way. The CVAT tool (cvat.org) can do this and will export in Pascal VOC format.
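As an illustration of "organizing the data in the same way", here is a minimal sketch that checks a dataset folder for a Pascal VOC style layout. The directory names follow the VOC convention; the specific files checked (trainval.txt, test.txt and a lowercase labels.txt in the dataset root) are assumptions about what the training script looks for and should be verified against the repository.

```python
from pathlib import Path

# Directories used by the Pascal VOC convention.
REQUIRED_DIRS = ("Annotations", "ImageSets/Main", "JPEGImages")

# Assumption: the files the training script reads. Verify these names
# against the version of the repository you are using.
ASSUMED_FILES = (
    "ImageSets/Main/trainval.txt",
    "ImageSets/Main/test.txt",
    "labels.txt",
)


def missing_voc_entries(root):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in ASSUMED_FILES if not (root / f).is_file()]
    return missing
```

Calling missing_voc_entries("data/package_counting") returns an empty list when everything is in place. On a case-sensitive filesystem a file named Labels.txt does not count as labels.txt.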


rajeshroy402 commented 3 years ago

OK, I will try CVAT. Meanwhile, when I use open_images_downloader.py, it does not download all the classes for the dataset:

root@jetson-desktop:/jetson-inference/python/training/detection/ssd# python3 open_images_downloader.py --stats-only --class-names "Aircraft, Handbag, Car" --data=data/vehicle
2021-08-08 18:39:45 - Download https://storage.googleapis.com/openimages/2018_04/class-descriptions-boxable.csv.
2021-08-08 18:39:45 - Requested 3 classes, found 3 classes
2021-08-08 18:39:45 - Download https://storage.googleapis.com/openimages/2018_04/train/train-annotations-bbox.csv.
2021-08-08 18:48:18 - Read annotation file data/vehicle/train-annotations-bbox.csv
Killed
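
The run above is killed while reading train-annotations-bbox.csv, which is consistent with the process exhausting memory on the device. The following is a minimal sketch, not the downloader's own code, of counting boxes per class by streaming the CSV one row at a time so that the whole file never sits in memory. It assumes the Open Images column name LabelName, whose values are label IDs rather than readable class names; the function name is hypothetical.

```python
import csv
from collections import Counter


def count_boxes_streaming(csv_path, wanted_label_ids):
    """Count bounding boxes per label ID, reading one CSV row at a time."""
    wanted = set(wanted_label_ids)
    counts = Counter()
    with open(csv_path, newline="") as f:
        # DictReader yields rows lazily, so memory use stays roughly constant.
        for row in csv.DictReader(f):
            label = row["LabelName"]
            if label in wanted:
                counts[label] += 1
    return counts
```

The label IDs to pass in can be looked up in class-descriptions-boxable.csv, which maps each ID to a class name.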

rajeshroy402 commented 3 years ago

Hi @dusty-nv, I used CVAT to annotate a few images, but the training script failed with an error. Please check this:

root@jetson-desktop:/jetson-inference/python/training/detection/ssd# python3 train_ssd.py --dataset-type=voc --data=data/package_counting/ --model-dir=models/package_counting --batch-size=2 --workers=1 --epochs=1
2021-08-08 19:48:52 - Using CUDA...
2021-08-08 19:48:52 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=2, checkpoint_folder='models/package_counting', dataset_type='voc', datasets=['data/package_counting/'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=1, num_workers=1, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2021-08-08 19:48:52 - Prepare training datasets.
2021-08-08 19:48:52 - No labels file, using default VOC classes.
2021-08-08 19:48:52 - Stored labels into file models/package_counting/labels.txt.
2021-08-08 19:48:52 - Train dataset size: 9
2021-08-08 19:48:52 - Prepare Validation datasets.
2021-08-08 19:48:52 - No labels file, using default VOC classes.
2021-08-08 19:48:52 - Validation dataset size: 9
2021-08-08 19:48:52 - Build network.
2021-08-08 19:48:52 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2021-08-08 19:48:53 - Took 0.55 seconds to load the model.
2021-08-08 19:49:07 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2021-08-08 19:49:07 - Uses CosineAnnealingLR scheduler.
2021-08-08 19:49:07 - Start training from epoch 0.
/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
warning - image side_1-36_jpeg.rf.e47a4903da95df2407c59da095a1b1a1 has object with unknown class 'Sack'
warning - image side_1-36_jpeg.rf.e47a4903da95df2407c59da095a1b1a1 has object with unknown class 'Sack'
warning - image side_1-13_jpeg.rf.e9f5964fbf6b3b93e3922c2f987b8eb9 has object with unknown class 'Carton'
Traceback (most recent call last):
  File "train_ssd.py", line 343, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 113, in train
    for i, data in enumerate(loader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataset.py", line 207, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/jetson-inference/python/training/detection/ssd/vision/datasets/voc_dataset.py", line 81, in __getitem__
    image, boxes, labels = self.transform(image, boxes, labels)
  File "/jetson-inference/python/training/detection/ssd/vision/ssd/data_preprocessing.py", line 34, in __call__
    return self.augment(img, boxes, labels)
  File "/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 55, in __call__
    img, boxes, labels = t(img, boxes, labels)
  File "/jetson-inference/python/training/detection/ssd/vision/transforms/transforms.py", line 345, in __call__
    boxes[:, :2] += (int(left), int(top))
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

Here is what my dataset looks like:

root@jetson-desktop:/jetson-inference/python/training/detection/ssd/data/package_counting# ls
Annotations/  ImageSets/  JPEGImages/  Labels.txt  labelmap.txt
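
The log above reports "No labels file, using default VOC classes" followed by "unknown class" warnings for Sack and Carton. A plausible reading, offered as an assumption rather than a confirmed diagnosis, is that every box in some images was skipped, leaving an empty box array that the augmentation step cannot index. The sketch below lists class names that appear in the VOC annotation XML files but not in a given set of known classes; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def unknown_classes(annotations_dir, known_classes):
    """Map each class name missing from known_classes to the XML files using it."""
    known = set(known_classes)
    unknown = {}
    for xml_path in sorted(Path(annotations_dir).glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        # Each <object> element in a VOC annotation holds one labelled box.
        for obj in root.iter("object"):
            name = (obj.findtext("name") or "").strip()
            if name not in known:
                unknown.setdefault(name, []).append(xml_path.name)
    return unknown
```

For example, unknown_classes("data/package_counting/Annotations", ["Sack", "Carton"]) should return an empty dict once the labels match the annotations.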

rajeshroy402 commented 3 years ago

I thought the issue was with PyTorch, since I found that it was not installed on the machine. I ran the docker/build.sh script, and Python now recognises torch:

root@jetson-desktop:/jetson-inference/python/training/detection/ssd# python3
Python 3.6.9 (default, Oct  8 2020, 12:12:24) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.6.0
>>> 
>>> print(torch.cuda.is_available())
True
>>> import torchvision
>>> print(torchvision.__version__)
0.7.0a0+78ed10c
>>> 

But I am still getting errors:

root@jetson-desktop:/jetson-inference/python/training/detection/ssd# python3 train_ssd.py --dataset-type=voc --data=data/package_counting --model-dir=models/package_counting --batch-size=2 --workers=1 --epochs=1
2021-08-08 21:11:55 - Using CUDA...
2021-08-08 21:11:55 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=2, checkpoint_folder='models/package_counting', dataset_type='voc', datasets=['data/package_counting'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=1, num_workers=1, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2021-08-08 21:11:55 - Prepare training datasets.
2021-08-08 21:11:55 - VOC Labels read from file: ('BACKGROUND', 'Sack', 'Carton')
2021-08-08 21:11:55 - Stored labels into file models/package_counting/labels.txt.
2021-08-08 21:11:55 - Train dataset size: 9
2021-08-08 21:11:55 - Prepare Validation datasets.
2021-08-08 21:11:55 - VOC Labels read from file: ('BACKGROUND', 'Sack', 'Carton')
2021-08-08 21:11:55 - Validation dataset size: 9
2021-08-08 21:11:55 - Build network.
2021-08-08 21:11:55 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2021-08-08 21:11:56 - Took 0.54 seconds to load the model.
2021-08-08 21:12:09 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2021-08-08 21:12:09 - Uses CosineAnnealingLR scheduler.
2021-08-08 21:12:09 - Start training from epoch 0.
/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/usr/local/lib/python3.6/dist-packages/torch/nn/_reduction.py:44: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Traceback (most recent call last):
  File "train_ssd.py", line 346, in <module>
    val_loss, val_regression_loss, val_classification_loss = test(val_loader, net, criterion, DEVICE)
  File "train_ssd.py", line 150, in test
    for _, data in enumerate(loader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 291, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 737, in __init__
    w.start()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 66, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

rajeshroy402 commented 3 years ago

Regarding the open_images_downloader.py run above that ended with "Killed": I figured out that there was no memory left for it. Fixed this by adding swap memory. 👍
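
Since the fix here was adding swap, it can help to check how much memory and swap are free before starting a long download or training run. This is a minimal sketch that parses text in the format of Linux's /proc/meminfo; the helper names are hypothetical and any threshold for "enough" memory is left to the reader.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def headroom_kb(info):
    """Available RAM plus free swap, in kB (missing fields count as zero)."""
    return info.get("MemAvailable", 0) + info.get("SwapFree", 0)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the meminfo file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On the device, headroom_kb(read_meminfo()) gives a rough figure for how much memory a new process can claim before the kernel starts killing processes.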

rajeshroy402 commented 3 years ago

@dusty-nv ?

dusty-nv commented 3 years ago

OSError: [Errno 12] Cannot allocate memory

It's running out of memory - try these suggestions: