saifurrehman4114 closed this issue 3 years ago
@dusty-nv should I add a C++ text-to-speech library to the Docker container, or can I use a Python library like pyttsx3 and add it to the correct detectnet file? I am unable to find the required file.
Could you kindly tell me the solution? I have to submit my final-year project before 15 June.
I have tried to edit jetson-inference/python/examples/detectnet.py by adding pyttsx3, but the model is not found.
The code is as follows:
import jetson.inference
import jetson.utils

import argparse
import sys

import pyttsx3

# parse the command line
parser = argparse.ArgumentParser(description="Locate objects in a live camera stream using an object detection DNN.",
                                 formatter_class=argparse.RawTextHelpFormatter,
                                 epilog=jetson.inference.detectNet.Usage() + jetson.utils.videoSource.Usage() + jetson.utils.videoOutput.Usage() + jetson.utils.logUsage())

parser.add_argument("input_URI", type=str, default="", nargs='?', help="URI of the input stream")
parser.add_argument("output_URI", type=str, default="", nargs='?', help="URI of the output stream")
parser.add_argument("--network", type=str, default="ssd-mobilenet-v2", help="pre-trained model to load (see below for options)")
parser.add_argument("--overlay", type=str, default="box,labels,conf", help="detection overlay flags (e.g. --overlay=box,labels,conf)\nvalid combinations are: 'box', 'labels', 'conf', 'none'")
parser.add_argument("--threshold", type=float, default=0.5, help="minimum detection threshold to use")
is_headless = ["--headless"] if sys.argv[0].find('console.py') != -1 else [""]
try:
    opt = parser.parse_known_args()[0]
except:
    print("")
    parser.print_help()
    sys.exit(0)
net = jetson.inference.detectNet(opt.network, sys.argv, opt.threshold)
# create video sources & outputs
input = jetson.utils.videoSource(opt.input_URI, argv=sys.argv)
output = jetson.utils.videoOutput(opt.output_URI, argv=sys.argv+is_headless)
def text_speech(detections):
    engine = pyttsx3.init()
    engine.say(f'{detections}')
    engine.runAndWait()

# process frames until the user exits
while True:
    img = input.Capture()

    # detect objects in the image (with overlay)
    detections = net.Detect(img, overlay=opt.overlay)

    # text-to-speech call
    text_speech(detections)

    # print the detections
    print("detected {:d} objects in image".format(len(detections)))

    for detection in detections:
        print(detection)

    # render the image
    output.Render(img)

    # update the title bar
    output.SetStatus("{:s} | Network {:.0f} FPS".format(opt.network, net.GetNetworkFPS()))

    # print out performance info
    net.PrintProfilerTimes()

    # exit on input/output EOS
    if not input.IsStreaming() or not output.IsStreaming():
        break
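One practical note on the snippet above: engine.say(f'{detections}') will read out the entire Detection struct (bounding box, confidence, etc.) rather than the object names. A minimal sketch of speaking only the class labels instead, where detections_to_phrase is a hypothetical helper and the labels would come from net.GetClassDesc(detection.ClassID) on the Jetson:

```python
# Hypothetical helper: turn a list of detected class names into a short
# spoken phrase. On the Jetson, the labels list would be built with
# [net.GetClassDesc(d.ClassID) for d in detections] (an assumption about
# how you map class IDs to names; GetClassDesc is part of detectNet).

def detections_to_phrase(labels):
    """Build a phrase like 'car, person detected' from class names."""
    unique = sorted(set(labels))          # de-duplicate repeated classes
    if not unique:
        return "nothing detected"
    return ", ".join(unique) + " detected"

# usage: speak the phrase instead of the raw detection structs, e.g.
#   engine.say(detections_to_phrase([net.GetClassDesc(d.ClassID) for d in detections]))
print(detections_to_phrase(["person", "car", "person"]))  # -> car, person detected
```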
Hi @saifurrehman4114, since pyttsx3 is a python library, I would customize detectnet.py.
You can see my follow-up to your forum post here: https://forums.developer.nvidia.com/t/how-to-add-voice-capability-during-inference-in-ssd-model/179999/5?u=dusty_nv
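As a follow-up to that suggestion: calling engine.runAndWait() inside the capture loop blocks until the speech finishes, which stalls frame capture and drops the network FPS. A sketch of moving speech to a background worker thread, so the loop only enqueues text and keeps running; `speak` here is any callable, and wrapping pyttsx3 with it is an assumption rather than the only option:

```python
import queue
import threading

class SpeechWorker:
    """Runs a speak callable on its own thread, fed by a queue."""

    def __init__(self, speak):
        self._queue = queue.Queue()
        self._speak = speak
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def say(self, text):
        self._queue.put(text)      # returns immediately; worker speaks later

    def _run(self):
        while True:
            text = self._queue.get()
            if text is None:       # sentinel value shuts the worker down
                break
            self._speak(text)

    def stop(self):
        self._queue.put(None)
        self._thread.join()

# usage with a stand-in speak function; on the Jetson you might pass a
# function that calls pyttsx3's engine.say() and engine.runAndWait():
spoken = []
worker = SpeechWorker(spoken.append)
worker.say("person detected")
worker.stop()
print(spoken)  # -> ['person detected']
```

The detection loop then calls worker.say(...) instead of text_speech(...), so rendering continues while the audio plays.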
Hello @dusty-nv ,
Hope you are fine.
I have trained the SSD model using my own dataset for object detection.
I want to ask how to add voice capability like when it detects the object speaker should say the label of that object during inference.
The command I am currently using for inference in the bash shell is below; is there also an argument I can add for this: