Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0
4.54k stars, 496 forks

How to get latency on custom YOLO-Nas model? #1351

Closed. Chiauwen closed this issue 1 year ago.

Chiauwen commented 1 year ago

💡 Your Question

Previously, I used the command below to measure latency for my custom-trained YOLOv5 model:

python val.py --weights best.pt --data data.yaml --task test

That does not work with YOLO-NAS, so how can I measure latency for a custom-trained YOLO-NAS model?

Thank you.

Versions

No response

BloodAxe commented 1 year ago

You need to export the YOLO-NAS model to ONNX; you can then measure its latency with trtexec.
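As a rough sketch of that workflow, the snippet below exports a fine-tuned checkpoint to ONNX and assembles a trtexec command line. It assumes a recent super-gradients release in which YOLO-NAS models expose an export() method, and that TensorRT's trtexec binary is on the PATH. The checkpoint path, class count, and output file name are placeholders, and both helper functions are hypothetical names introduced for illustration.

```python
import shlex
from typing import List


def export_yolo_nas_to_onnx(checkpoint_path: str, num_classes: int, onnx_path: str) -> None:
    """Load a fine-tuned YOLO-NAS S checkpoint and export it to ONNX.

    Assumption: super-gradients is installed and the model exposes export().
    """
    # Imported lazily so the command-building helper works without super-gradients.
    from super_gradients.common.object_names import Models
    from super_gradients.training import models

    model = models.get(Models.YOLO_NAS_S, num_classes=num_classes, checkpoint_path=checkpoint_path)
    model.eval()
    model.export(onnx_path)


def build_trtexec_command(onnx_path: str, fp16: bool = True) -> List[str]:
    """Build a trtexec invocation that benchmarks the exported ONNX file."""
    command = ["trtexec", f"--onnx={onnx_path}"]
    if fp16:
        command.append("--fp16")  # benchmark in half precision
    return command


if __name__ == "__main__":
    # Placeholder paths: substitute your own checkpoint and output location.
    # export_yolo_nas_to_onnx("ckpt_best.pth", num_classes=4, onnx_path="yolo_nas_s.onnx")
    print(shlex.join(build_trtexec_command("yolo_nas_s.onnx")))
```

Running the printed command in a shell is one way to obtain the numbers: trtexec reports latency statistics at the end of its run.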

Here are some docs you may find useful:

madmon24 commented 9 months ago

from super_gradients.training import Trainer
from super_gradients.common.object_names import Models
from super_gradients.training import models
from super_gradients.training.processing import ComposeProcessing
from torch.utils.data import DataLoader

trainer = Trainer(experiment_name="yolo_nas_s_soccer_players", ckpt_root_dir="/content/sg_checkpoints_dir/")
net = models.get(Models.YOLO_NAS_S, num_classes=4, pretrained_weights="coco")

trainer.train(model=net, training_params=train_params, train_loader=train_loader, valid_loader=valid_loader)

[2024-01-11 17:44:19] INFO - checkpoint_utils.py - License Notification: YOLO-NAS pre-trained weights are subjected to the specific license terms and conditions detailed in
https://github.com/Deci-AI/super-gradients/blob/master/LICENSE.YOLONAS.md
By downloading the pre-trained weight files you agree to comply with these terms.
[2024-01-11 17:44:19] INFO - checkpoint_utils.py - Successfully loaded pretrained weights for architecture yolo_nas_s
[2024-01-11 17:44:19] WARNING - sg_trainer.py - Train dataset size % batch_size != 0 and drop_last=False, this might result in smaller last batch.
[2024-01-11 17:44:19] INFO - sg_trainer.py - Starting a new run with run_id=RUN_20240111_174419_954854
[2024-01-11 17:44:19] INFO - sg_trainer.py - Checkpoints directory: /content/sg_checkpoints_dir/yolo_nas_s_soccer_players/RUN_20240111_174419_954854
[2024-01-11 17:44:19] INFO - sg_trainer.py - Using EMA with params {'decay': 0.9, 'decay_type': 'threshold'}
The console stream is now moved to /content/sg_checkpoints_dir/yolo_nas_s_soccer_players/RUN_20240111_174419_954854/console_Jan11_17_44_19.txt
/usr/local/lib/python3.10/dist-packages/super_gradients/common/registry/registry.py:72: DeprecationWarning: Object name linear_epoch_step is now deprecated. Please replace it with LinearEpochLRWarmup.
  warnings.warn(f"Object name {name} is now deprecated. Please replace it with {deprecated_names[name]}.", DeprecationWarning)

RuntimeError                              Traceback (most recent call last)
in <cell line: 39>()
     37 net = models.get(Models.YOLO_NAS_S, num_classes=4, pretrained_weights="coco")
     38
---> 39 trainer.train(model=net, training_params=train_params, train_loader=train_loader, valid_loader=valid_loader)
     40
     41 self.train_loader.reset()

......

/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
    136     tensor([0, 1, 2, 3])
    137     >>> # Example with a batch of strs:
--> 138     >>> default_collate(['a', 'b', 'c'])
    139     ['a', 'b', 'c']
    140     >>> # Example with Map inside the batch:

RuntimeError: stack expects each tensor to be equal size, but got [18, 5] at entry 0 and [19, 5] at entry 1

I am getting this error when I try to train the YOLO-NAS model on Roboflow's soccer-players-2 dataset in COCO format. I am stuck on this error. Please reply soon if there is a solution.

@BloodAxe

BloodAxe commented 9 months ago

How do you create your dataloaders?

madmon24 commented 9 months ago

Through the Roboflow website.


BloodAxe commented 9 months ago

I asked about the DataLoader, not the dataset. What collate function are you using? Please follow the tutorials and use the correct collate and loss functions, as shown in the YOLO-NAS dataset params: https://github.com/Deci-AI/super-gradients/blob/5fe4f4ca94d5e556608b7160558cdbf18cbcc2c5/src/super_gradients/recipes/dataset_params/coco_detection_yolo_nas_dataset_params.yaml

See the collate function under train_dataloader_params and val_dataloader_params.
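For context, the RuntimeError reported earlier arises because PyTorch's default collate function tries to stack the per-image target arrays, which fails when images contain different numbers of boxes (18 and 19 in that message). A detection collate function typically avoids this by concatenating all boxes into one flat collection and tagging each row with the index of its image in the batch. The following is a minimal pure-Python illustration of that idea, using nested lists in place of tensors; it is not the super-gradients implementation, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # one target row, e.g. (class_id, x, y, w, h)


def collate_detection_targets(
    batch: Sequence[Tuple[object, Sequence[Box]]],
) -> Tuple[List[object], List[List[float]]]:
    """Merge per-image targets of differing lengths into one flat list.

    Each output row is [image_index, *box], so rows can be traced back to their image.
    """
    images: List[object] = []
    targets: List[List[float]] = []
    for image_index, (image, boxes) in enumerate(batch):
        images.append(image)
        for box in boxes:
            targets.append([float(image_index), *map(float, box)])
    return images, targets


# Two images with 2 and 3 boxes: stacking would fail, concatenation does not.
sample_a = ("image_a", [(0, 0.1, 0.1, 0.2, 0.2), (1, 0.5, 0.5, 0.1, 0.1)])
sample_b = ("image_b", [(2, 0.3, 0.3, 0.2, 0.2), (0, 0.6, 0.6, 0.1, 0.1), (3, 0.8, 0.8, 0.1, 0.1)])
images, targets = collate_detection_targets([sample_a, sample_b])
print(len(targets), len(targets[0]))  # 5 rows, 6 values per row
```

In practice, the collate function referenced in the recipe should be passed to the DataLoader rather than a hand-written one; the sketch only shows why variable box counts need special handling.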

madmon24 commented 9 months ago

I used the right collate function mentioned in the yolonas dataset params.

I am getting this new error now.

TypeError                                 Traceback (most recent call last)
Cell In[21], line 8
      6 trainer = Trainer(experiment_name="yolo_nas_s_100", ckpt_root_dir="sg_checkpoints_dir/")
      7 net = models.get(Models.YOLO_NAS_S, num_classes=1, pretrained_weights="coco")
----> 8 trainer.train(model=net, training_params=train_params, train_loader=train_loader, valid_loader=valid_loader)

File ~/anaconda3/envs/yolonas0/lib/python3.10/site-packages/super_gradients/training/sg_trainer/sg_trainer.py:1472, in Trainer.train(self, model, training_params, train_loader, valid_loader, test_loaders, additional_configs_to_log)
   1465 raise ValueError(
   1466     "You can use sliding window validation callback, but your model does not support sliding window "
   1467     "inference. Please either remove the callback or use the model that supports sliding inference: "
   1468     "Segformer"
   1469 )
   1471 if isinstance(model, SupportsInputShapeCheck):
-> 1472     first_train_batch = next(iter(self.train_loader))
   1473     inputs, _, _ = sg_trainer_utils.unpack_batch_items(first_train_batch)
   1474     model.validate_input_shape(inputs.size())

File ~/anaconda3/envs/yolonas0/lib/python3.10/site-packages/torch/utils/data/dataloader.py:530, in _BaseDataLoaderIter.__next__(self)
    528 if self._sampler_iter is None:
    529     self._reset()
--> 530 data = self._next_data()
    531 self._num_yielded += 1
    532 if self._dataset_kind == _DatasetKind.Iterable and \
    533         self._IterableDataset_len_called is not None and \
    534         self._num_yielded > self._IterableDataset_len_called:

File ~/anaconda3/envs/yolonas0/lib/python3.10/site-packages/torch/utils/data/dataloader.py:570, in _SingleProcessDataLoaderIter._next_data(self)
    568 def _next_data(self):
    569     index = self._next_index()  # may raise StopIteration
--> 570     data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    571     if self._pin_memory:
    572         data = _utils.pin_memory.pin_memory(data)

File ~/anaconda3/envs/yolonas0/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py:52, in _MapDatasetFetcher.fetch(self, possibly_batched_index)
     50 else:
     51     data = self.dataset[possibly_batched_index]
---> 52 return self.collate_fn(data)

Cell In[17], line 12, in DetectionCollateFN.__call__(self, data)
     11 def __call__(self, data) -> Tuple[torch.Tensor, torch.Tensor]:
---> 12     data = [sample for sample in data if sample[0].size(0) > 0]
     14     if not data:
     15         # If there are no valid samples, return empty tensors
     16         return torch.zeros(0), torch.zeros((0, 5))

Cell In[17], line 12, in <listcomp>(.0)
     11 def __call__(self, data) -> Tuple[torch.Tensor, torch.Tensor]:
---> 12     data = [sample for sample in data if sample[0].size(0) > 0]
     14     if not data:
     15         # If there are no valid samples, return empty tensors
     16         return torch.zeros(0), torch.zeros((0, 5))

TypeError: 'int' object is not callable
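One plausible cause of this TypeError, assuming the dataset yields NumPy arrays rather than torch tensors: on a NumPy array, size is an integer attribute, whereas on a torch tensor it is a method, so sample[0].size(0) ends up calling an int. Reading the first dimension from shape works for both types. The sketch below uses a small stand-in class (hypothetical, so the example runs without NumPy) to reproduce the failure and show a shape-based filter.

```python
from typing import List, Sequence, Tuple


class ArrayLike:
    """Hypothetical stand-in that mimics NumPy's `shape` tuple and integer `size`."""

    def __init__(self, shape: Tuple[int, ...]) -> None:
        self.shape = shape
        total = 1
        for dim in shape:
            total *= dim
        self.size = total  # an int attribute, as on numpy.ndarray


def drop_empty_samples(
    data: Sequence[Tuple[ArrayLike, ArrayLike]],
) -> List[Tuple[ArrayLike, ArrayLike]]:
    """Keep samples whose first element has a non-zero first dimension."""
    return [sample for sample in data if sample[0].shape[0] > 0]


image = ArrayLike((640, 640, 3))
try:
    image.size(0)  # same pattern as the failing line in the traceback
except TypeError as error:
    print(error)  # 'int' object is not callable

kept = drop_empty_samples(
    [(image, ArrayLike((18, 5))), (ArrayLike((0, 640, 3)), ArrayLike((0, 5)))]
)
print(len(kept))  # 1
```

If the samples really are torch tensors, this explanation does not apply and the cause lies elsewhere in the custom collate class.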

BloodAxe commented 9 months ago

Please provide a minimal example that reproduces your issue as a Google Colab notebook. It is impossible to help you just by looking at the sparse code snippets you have shown.

madmon24 commented 9 months ago

Please refer to this script. Thanks.

https://github.com/madmon24/YoloNAS/blob/main/DeciYoloCustomDatasetQAFineTuning_madmon.ipynb