Closed · arjuntx2 closed this issue 1 year ago
When you type that command into the console, are you running it with the backslashes (`\`) at the end of the lines? If so, remove those and just run it on one line.

Also, does it work when you run it as `detectnet.py`, but not `detectnet-camera.py`?

Can you try running it with the `--debug` flag added to get more info?
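As an aside, the backslash line-continuation behaves like this (a generic shell sketch, unrelated to detectnet itself):

```shell
# A trailing backslash continues a command onto the next line,
# so this invocation is equivalent to `echo one two`:
echo one \
     two
```

If the same command is pasted onto a single line with the backslashes kept, each `\ ` instead escapes the following space and the arguments get mangled, which is why removing them matters.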
Yes, I am running with the backslashes (`\`). I will remove them and give it a try :)

Although there is no file named detectnet in the pytorch-ssd directory when I cloned it. Do I need to copy the file `detectnet-camera.py` from `build/aarch64/bin`? Because I am seeing a bash error, `detectnet.py: command not found`, when I run the command in the pytorch-ssd directory.
Thanks
You need to do `sudo make install`. Maybe it is running the old detectnet-camera that hadn't been updated?
I will get back to you soon. Thanks a lot!
As you have mentioned in the document:

> To classify some static test images, we'll use the extended command-line parameters to detectnet (or detectnet.py) to load our custom SSD-Mobilenet ONNX model. To run these commands, the working directory of your terminal should still be located in: `jetson-inference/python/training/detection/ssd/`
If I go to this directory, in my case there is no such file named detectnet or detectnet.py in that directory.

So I get this error:

```
bash: detectnet: command not found
```

So in that directory, is it still supposed to run the following command line? Am I making any mistake here?

```shell
detectnet --model=models/fruit/ssd-mobilenet.onnx --labels=models/fruit/labels.txt \
          --input-blob=input0 --output-cvg=scores --output-bbox=boxes \
          "images/fruit*.jpg" test_fruit
```
In my case I copied the files detectnet-console and detectnet-camera from `build/aarch64/bin/` and pasted them in `jetson-inference/python/training/detection/ssd/`. After that, I removed the backslashes as you suggested, and I was able to create the engine by running detectnet-console.
I trained it for Boy/Girl data and it is not detecting anything.
Error:

```
detectnet-console: writing 1067x1600 image to 'test_fruit'
[image]  invalid extension format '.test_fruit' saving image 'test_fruit'
[image]  valid extensions are: JPG/JPEG, PNG, TGA, BMP, and HDR.
detectnet-console: failed saving 1067x1600 image to 'test_fruit'
```

Although, the example image has .jpg format.
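For what it's worth, the failure mode can be sketched in a few lines of Python: the *output* argument `test_fruit` has no recognized image extension, so the save is refused regardless of the input images being .jpg (this mirrors my reading of the error message, not the library's actual code):

```python
import os

# extensions listed in the error message above
VALID_EXTS = {".jpg", ".jpeg", ".png", ".tga", ".bmp", ".hdr"}

def has_valid_image_ext(path):
    """Return True if the path ends in a recognized image extension."""
    return os.path.splitext(path)[1].lower() in VALID_EXTS

print(has_valid_image_ext("test_fruit"))      # False -> save fails
print(has_valid_image_ext("test_fruit.jpg"))  # True  -> save succeeds
```

So giving the output name an explicit extension (e.g. `test_fruit.jpg`) should make the save succeed.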
Thank you for answering again :)
> If I go to this directory, in my case there is no such file named detectnet or detectnet.py in that directory
When you do `sudo make install`, these get installed to `/usr/local/bin`, which means they should run from any directory.

You should run `sudo make install` from your `jetson-inference/build` directory.
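Putting those steps together, the usual sequence looks like this (a sketch based on the standard jetson-inference build instructions; adjust paths if your checkout differs):

```shell
cd jetson-inference/build
cmake ../
make
sudo make install   # installs detectnet, detectnet.py, etc. into /usr/local/bin
sudo ldconfig
```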
> In my case I copied the files detectnet-console and detectnet-camera from `build/aarch64/bin/` and pasted them in `jetson-inference/python/training/detection/ssd/`
You could also do the reverse: run the programs from `jetson-inference/build/aarch64/bin`, and then adjust the paths to your custom model. It doesn't actually need to be run from `jetson-inference/python/training/detection/ssd/`; running from there just makes the paths to your custom model shorter.
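For example, running from the bin directory might look like this (a sketch; the model paths, input image, and output filename are placeholders for wherever your files actually live):

```shell
cd jetson-inference/build/aarch64/bin
./detectnet --model=../../../python/training/detection/ssd/models/fruit/ssd-mobilenet.onnx \
            --labels=../../../python/training/detection/ssd/models/fruit/labels.txt \
            --input-blob=input_0 --output-cvg=scores --output-bbox=boxes \
            "../../../python/training/detection/ssd/images/fruit_0.jpg" detected.jpg
```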
Hi @dusty-nv , is it possible to retrain the model for only two new classes on some other computer and use it on Jetson? I have tried using DIGITS locally but I kept running into one problem after another without running a single epoch.
Hi Torab, DIGITS is only supported on x86 (not Jetson), and the DIGITS portion of the Hello AI World tutorial is deprecated.

You can run the same PyTorch training code on a PC running Ubuntu, though (as long as the PC has an NVIDIA GPU). Just install PyTorch and torchvision on your PC first.
Hello @dusty-nv, I actually tried this: I retrained on an EC2 instance, then took the .pth file and converted it to ONNX on a Jetson Nano, and it didn't run. I got "Killed" after "Graph construction and optimization completed in 0.103095 seconds." on the first run; it couldn't create the engine. I tried exactly the same on my Xavier NX and that worked fine. The model isn't that large: 10 epochs on 2 classes of Open Images, batch size 128 though.

Could that be the reason?
> batch size 128 though.
Yes, an increased batch size will also increase the memory usage, so you may want to try a smaller batch size for the Nano. Or mount swap on the Nano for when it is building the TensorRT engine.
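Mounting swap can be done roughly like this (a sketch; the 4 GB size and file path are arbitrary placeholders):

```shell
sudo fallocate -l 4G /mnt/4GB.swap   # allocate the swap file
sudo mkswap /mnt/4GB.swap            # format it as swap
sudo swapon /mnt/4GB.swap            # enable it (until the next reboot)
```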
@dusty-nv thank you for your reply. I tried again with just 1 epoch and batch size 32 on the EC2 instance, then transferred to the Nano and mounted 4G of swap... same issue. I will try with batch size 12 and update you.
Actually, the training batch size is independent of the ONNX batch size (which should always be 1). So it must be something else...
Then what can it be? It was a fresh install on the Nano; it works perfectly fine with the existing models for detectnet, and my custom model works fine on the Xavier.
Does it work with this model? https://nvidia.box.com/shared/static/gq0zlf0g2r258g3ldabl9o7vch18cxmi.gz
Also, when you did a fresh install on the Nano, it was of JetPack 4.4, right?
Nope, just tested it; same thing, getting killed. It only works with the pre-trained models included in this repo; I've tried 3 custom-trained ones so far. And yes, a fresh install of 4.4.
Can you run `sudo tegrastats` in another terminal to keep an eye on the memory usage? 'Killed' normally means out of memory... interesting, because it does load on my Nano here.
It seems to be out of memory, yes, but why? The model is small and the pre-trained ones work.
I'm not sure why it happens, since it doesn't run out of memory on my Nano. Can you try disabling the Ubuntu UI and rebooting?
To disable GUI - https://askubuntu.com/a/1056371
Then with GUI disabled, try running on a single test image. If it works, you can then re-enable the GUI because the TensorRT engine will already be built for next time.
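The linked answer amounts to switching the default systemd boot target (a sketch; double-check against the linked answer for your Ubuntu version):

```shell
sudo systemctl set-default multi-user.target   # boot to console, no desktop
sudo reboot
# ...build the TensorRT engine once, then re-enable the desktop:
sudo systemctl set-default graphical.target
```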
That actually worked! But only for about 5 minutes; then it froze during tactic selection. I will leave it for a while; tegrastats shows 3300+ memory utilization. I will update you if it changes, and if not, I will restart it.
I left it for hours and it is still frozen in the middle of building the engine. Is there anything else I can do to free more memory?
Hmm, not sure why it isn't working for you.

Can you try running `sudo systemctl disable nvzramconfig`? Then reboot (and remount your 4GB swap if you need to).
@drmnasr I think I found a way to work around this - comment out these lines:

```cpp
//if( modelTypeFromPath(model) == MODEL_ONNX )
//	mWorkspaceSize = 2048 << 20;
```

Then re-run `make` and `sudo make install`.
@drmnasr We were facing a similar problem with one of our Jetson Nanos. I was using `detectnet.py` to load the model, and it was taking hours to build the engine file, and that only after failing multiple times. So when loading the model for the first time, I switched from Python to the `detectnet` command. It built the engine file, and then I switched back to Python to use the model.
@dusty-nv I can never thank you enough. worked perfectly!! thank you!
Hi @dusty-nv, has the original question in the post finally been resolved? I am having similar problems... `INVALID_ARGUMENT: Cannot find binding of given name: coverage`.

Any idea about what could be happening?
> I am having similar problems... `INVALID_ARGUMENT: Cannot find binding of given name: coverage`.
This seems like a different issue: it isn't getting the custom layer names. Are you sure you have the following command-line arguments included? (check for typos)

```
--input-blob=input_0 --output-cvg=scores --output-bbox=boxes
```
Oh sorry, my mistake!! You were right. It was a typo problem. Thanks!!!
Hi, I just want to make something clear: in this argument, is `detectnet.py` the same as `detectnet-camera.py` when I want to run from the live camera? If so, I am getting the following error.

Thank you for the help :)