
Running TAO Toolkit on Google Colab

Google Colab provides access to free GPU instances for running compute jobs in the cloud. This page provides instructions for getting started with TAO Toolkit on Google Colab.

Google Colab imposes some restrictions on TAO Toolkit due to the hardware and software available with Colab instances. These limitations are listed in the Notes section.
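
For example, you can check which GPU your Colab instance has been allocated by running nvidia-smi in a notebook cell:

  # Show the GPU allocated to this Colab instance
  !nvidia-smi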


Prerequisites

To use Google Colab, you need a Google account with Google Drive access.

Log in to your Google account, or create one by following the instructions on the Sign Up for Gmail page.


Launching Notebooks with Google Colab

TAO Toolkit provides an extensive model zoo containing pretrained models for both computer vision and conversational AI use cases. Click the One-Click Deploy link for the model that matches your use case.

Refer to the Steps to Locate Files in a Colab Notebook section for an example of accessing files within the Colab notebook.


General-Purpose Computer Vision Models

With the general-purpose models, you can train an image classification model, an object detection model, or an instance segmentation model.

| Model Name | One-Click Deploy | Action |
|---|---|---|
| Multi-class Image Classification | Multi-class Image Classification | Classification |
| Multi-task Image Classification | Multi-task Image Classification | Classification |
| DSSD | Object Detection using Deformable DSSD | Object Detection |
| EfficientDet | Object Detection using EfficientDet | Object Detection |
| RetinaNet | Object Detection using RetinaNet | Object Detection |
| SSD | Object Detection using SSD | Object Detection |
| YOLO v3 | Object Detection using YOLO v3 | Object Detection |
| YOLO v4 | Object Detection using YOLO v4 | Object Detection |
| YOLO v4 Tiny | Object Detection using YOLO v4 Tiny | Object Detection |


Purpose-Built Computer Vision Models

Purpose-built models are built for high accuracy and performance. You can deploy these models out of the box for applications such as smart city, retail, public safety, and healthcare. You can also retrain them with your own data.

| Model Name | One-Click Deploy | Purpose |
|---|---|---|
| ActionRecognitionNet | Action Recognition | Detecting actions from videos |
| LPRNet | License Plate Recognition | Recognizing license plate numbers |
| HeartRateNet | Heart Rate Estimation | Estimating a person's heart rate from RGB video |
| GestureNet | Gesture Recognition | Recognizing hand gestures |
| EmotionNet | Emotion Recognition | Recognizing facial emotions |
| PoseClassificationNet | Pose Classification | Classifying the poses of people from their skeletons |


Conversational AI Models

| One-Click Deploy | Base Architecture | Dataset | Purpose |
|---|---|---|---|
| Language Model | N-Gram | Librispeech LM | Language modeling |
| Speech to Text English QuartzNet | QuartzNet | ASR Set 1.2 | Speech transcription |
| Speech to Text English CitriNet | CitriNet | ASR Set 1.4 | Speech transcription |
| Speech to Text English Conformer | Conformer | ASR Set 1.4 | Speech transcription |
| Text to Speech | FastPitch | LJSpeech | Speech synthesis |
| Question Answering SQuAD2.0 BERT | BERT | SQuAD 2.0 | Answering questions in SQuAD 2.0, a reading-comprehension dataset consisting of Wikipedia articles |
| Named Entity Recognition BERT | BERT | GMB (Groningen Meaning Bank) | Identifying entities in a given text (supported categories: Geographical Entity, Organization, Person, Geopolitical Entity, Time Indicator, Natural Phenomenon/Event) |
| Joint Intent and Slot Classification BERT | BERT | Proprietary | Classifying an intent and detecting all relevant slots (entities) for that intent in a query. Intent and slot names are usually task-specific. This model recognizes weather-related intents such as weather, temperature, and rainfall, and entities such as place, time, and unit of temperature. For a comprehensive list, see the corresponding model card |
| Punctuation and Capitalization BERT | BERT | Tatoeba sentences; books from Project Gutenberg that were used as part of the LibriSpeech corpus; transcripts from Fisher English Training Speech | Adding punctuation and capitalization to text |
| Domain Classification English BERT | BERT | Proprietary | Classifying queries into the four supported domains: weather, meteorology, personality, and none |


TAO Pre-trained Models (Inference Only)

In addition to training different models using the one-click deploy links, you can run inference with the pretrained models TAO has published using this notebook.


Utility Scripts to Obtain a Subset of the Data

If you have limited storage space or want to iterate quickly through training experiments, it is advisable to train on a subset of the data rather than on the full dataset.

TAO Toolkit provides utility scripts to generate such subsets for the COCO dataset (around 25 GB, ~120k images) and the KITTI dataset (around 12 GB, ~14k images).


To obtain a subset of KITTI:

Expected dataset folder structure for KITTI:

  path_to_training_folder
  |___image_2
  |___label_2

  path_to_testing_folder
  |___image_2


To obtain a subset of COCO:

Expected dataset folder structure for COCO:

  folder_into_which_downloaded_coco_files_are_unzipped
  |___train2017
  |___val2017
  |___annotations
      |___instances_train2017.json
      |___instances_val2017.json
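
Similarly, here is a minimal sketch of COCO subsetting using only the standard library; again, a placeholder illustration rather than the TAO utility script. It keeps the first num_images image records of an annotation file and copies the corresponding images.

  import json
  import os
  import shutil

  def generate_coco_subset(coco_dir, out_dir, split='train2017', num_images=100):
      """Write a reduced annotation file and copy only the matching images."""
      ann_file = os.path.join(coco_dir, 'annotations', f'instances_{split}.json')
      with open(ann_file) as f:
          coco = json.load(f)
      # Keep the first num_images image records and the annotations referencing them.
      images = coco['images'][:num_images]
      image_ids = {img['id'] for img in images}
      coco['images'] = images
      coco['annotations'] = [a for a in coco['annotations'] if a['image_id'] in image_ids]
      os.makedirs(os.path.join(out_dir, 'annotations'), exist_ok=True)
      os.makedirs(os.path.join(out_dir, split), exist_ok=True)
      with open(os.path.join(out_dir, 'annotations', f'instances_{split}.json'), 'w') as f:
          json.dump(coco, f)
      for img in images:
          shutil.copy(os.path.join(coco_dir, split, img['file_name']),
                      os.path.join(out_dir, split, img['file_name']))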


Steps to Locate Files in a Colab Notebook

  1. Mount your Google Drive in the Colab instance (see the snippet after this list).

  2. Click on the folder icon (shown within the green box).

  3. Click on the 'move up one folder' icon (shown within the green box).
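
Mounting Google Drive in step 1 uses Colab's standard drive module, for example:

  # Mount Google Drive at /content/drive (Colab prompts for authorization)
  from google.colab import drive
  drive.mount('/content/drive')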


Notes