Closed e-mily closed 1 year ago
Hmm... hi @e-mily, can you share the output of ls /jetson-inference/python/training/detection/ssd/data/total-5/ImageSets/Main with me?
Sorry @dusty-nv, I wasn't able to train because I had put my dataset in the wrong folder.
I have other questions to ask:
- how do i do image augmentation using the tutorial?
Image augmentation is already done automatically by the TrainAugmentation transforms: https://github.com/dusty-nv/pytorch-ssd/blob/3f9ba554e33260c8c493a927d7c4fdaa3f388e72/vision/ssd/data_preprocessing.py#L4
So if you want, you can add to them there.
3. how can i change the number of layers being trained by detectnet?
You would need to change the SSD network definitions under https://github.com/dusty-nv/pytorch-ssd/tree/3f9ba554e33260c8c493a927d7c4fdaa3f388e72/vision/ssd (I have not attempted this)
4. If I only have ImageSets/Main/default.txt, how does the code divide the dataset into train, test, and validation? Is there a certain percentage? (I was only able to train with ImageSets/Main/default.txt.)
default.txt uses the same dataset across train and test, so it doesn't split it.
If you want it split, you should have different trainval.txt and test.txt files under ImageSets/Main
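A minimal sketch of one way to produce those two files, assuming a Pascal VOC-style folder where every labeled image has an Annotations/<id>.xml file and each image-set file lists one ID per line without an extension. The function name write_voc_split, the 20% test fraction, and the seed are illustrative choices, not part of pytorch-ssd:

```python
import random
from pathlib import Path


def write_voc_split(dataset_root, test_fraction=0.2, seed=0):
    """Write ImageSets/Main/trainval.txt and test.txt from the annotation file names."""
    root = Path(dataset_root)
    # One image ID per annotation file, e.g. Annotations/img_001.xml -> "img_001"
    ids = sorted(p.stem for p in (root / "Annotations").glob("*.xml"))
    if len(ids) < 2:
        raise ValueError("need at least two annotated images to make a split")

    # Shuffle with a fixed seed so the split is reproducible between runs
    random.Random(seed).shuffle(ids)
    n_test = max(1, int(len(ids) * test_fraction))
    n_test = min(n_test, len(ids) - 1)  # always leave something to train on
    test_ids = sorted(ids[:n_test])
    trainval_ids = sorted(ids[n_test:])

    out_dir = root / "ImageSets" / "Main"
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "trainval.txt").write_text("\n".join(trainval_ids) + "\n")
    (out_dir / "test.txt").write_text("\n".join(test_ids) + "\n")
    return trainval_ids, test_ids
```

For example, write_voc_split("data/total-5") would split the dataset from this thread. Shuffling before splitting avoids putting all images from one capture session into the same file.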
Thank you @dusty-nv. That was really helpful.
But then when I tried to put them into trainval.txt, test.txt, val.txt etc. I received the error as stated above.
When I tried to run the livestream after building the model, I realized my camera feed is flipped. Is there any way to flip it back? I'm using a Jetson TX2.
But then when I tried to put them into trainval.txt, test.txt, val.txt etc. I received the error as stated above.
So do you have the files total-5/ImageSets/Main/trainval.txt and total-5/ImageSets/Main/test.txt? Does your user have permissions to read them?
They are looked for in the code here: https://github.com/dusty-nv/pytorch-ssd/blob/3f9ba554e33260c8c493a927d7c4fdaa3f388e72/vision/datasets/voc_dataset.py#L22
When I tried to run the livestream after building the model, I realized my camera feed is flipped. Is there any way to flip it back? I'm using a Jetson TX2.
Yes, try running it with --input-flip=rotate-180
For more info, see here: https://github.com/dusty-nv/jetson-inference/blob/master/docs/aux-streaming.md#input-options
So do you have the files total-5/ImageSets/Main/trainval.txt and total-5/ImageSets/Main/test.txt? Does your user have permissions to read them?
I did. But it gives TypeError: unsupported format string passed to PosixPath.__format__
But if I change it to total-5/ImageSets/Main/default.txt then it works!
Erm, how do I know if the user has permission to read them?
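One way to answer the permissions question is to ask the operating system directly. This is a minimal sketch using only the Python standard library; the function name report_readable and the list of file names are illustrative, and it has to be run as the same user (and inside the same container) that runs train_ssd.py for the result to be meaningful:

```python
import os
from pathlib import Path


def report_readable(dataset_root, names=("trainval.txt", "test.txt", "default.txt")):
    """Return {file name: status} for the image-set files under ImageSets/Main."""
    main_dir = Path(dataset_root) / "ImageSets" / "Main"
    status = {}
    for name in names:
        path = main_dir / name
        if not path.is_file():
            status[name] = "missing"
        elif os.access(path, os.R_OK):
            # os.access checks the permissions of the user running this script
            status[name] = "readable"
        else:
            status[name] = "not readable"
    return status
```

Calling report_readable("data/total-5") from the ssd directory would show at a glance whether a file is absent or merely unreadable.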
@dusty-nv I realized the models are write-protected. how do i remove that so that i can delete it? because i want to change the parameters and train the model again.
Btw I was able to train with trainval.txt and val.txt! Thank you!
I have this error when i try to train the same model with increased epoch value
if i decrease the workers=0 i still get the same error. I also tried to swap the memory (i don't know if i did it correctly i dont really understand what im looking at) I have an sd card attached to the jetson tx2. will it help?
I realized the models are write-protected. how do i remove that so that i can delete it?
You can use a command like sudo chown -R <your-user> <path-to-model-dir>
if i decrease the workers=0 i still get the same error. I also tried to swap the memory (i don't know if i did it correctly i dont really understand what im looking at)
The killed message you are getting normally means the board has run out of memory. I recommend running with --batch-size=1 and --workers=0 to decrease the memory usage. Also here are the instructions for mounting swap, disabling ZRAM, and disabling the desktop GUI:
Thank you @dusty-nv! I was training my model with an increasing number of epochs, and I found that with more epochs, when I test my model on test images, I don't see any bounding boxes at all. I don't see any confidence level displayed in the terminal either. What do I do?
Like this one. I'm supposed to have 3 attributes but it can only detect 1. I don't know why the bounding box is so small.
detectnet --model=models/5-imagesa/ssd-mobilenet.onnx --labels=models/5-images/labels.txt --input-blob=input_0 --output-cvg=scores --output-bbox=boxes "/jetson-inference/data/imagess/traffic_*.jpeg" /jetson-inference/data/imagess/test2/traffic_%i.jpeg
This is the code i ran.
It's either that or I'm not getting any results at all with increasing epochs.
Can you try deleting the *.engine file from your model's folder and try running the detectnet program again?
How many epochs did you train it for? Normally at least 30 is needed for good results. You can run the pytorch-ssd code on a Linux/Ubuntu PC for faster training (you will need to install PyTorch on it and such)
Also, you can use the run_ssd_example.py script to test one of your PyTorch .pth model checkpoints before it gets exported to ONNX. This will help you to confirm if the model is in fact trained to your liking first.
Can you give me the full command to run run_ssd_example.py?
I tried from 5 epochs up to 50. It only shows accuracy for 5 and 10 epochs. After that it seems it couldn't detect anything, as it wasn't showing any accuracy figure.
Can you give me the full command to run run_ssd_example.py?
python3 run_ssd_example.py mb1-ssd <path-to-pth-checkpoint> <path-to-labels.txt> <path-to-test-image>
python3 run_ssd_example.py mb1-ssd
root@aititx22-desktop:/jetson-inference/python/training/detection/ssd# python3 run_ssd_example.py mb1-ssd models/20-imagesa/mb1-ssd-Epoch-9-Loss-7.462369181893089.pth models/20-imagesa/labels.txt /jetson-inference/data/imagess/test/traffic_%i.jpeg
Traceback (most recent call last):
  File "run_ssd_example.py", line 50, in <module>
    image = cv2.cvtColor(orig_image, cv2.COLOR_BGR2RGB)
cv2.error: OpenCV(4.5.0) /opt/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'
I tried like that but i got this error...
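The !_src.empty() assertion is what OpenCV reports when the image passed to cvtColor is empty, which happens when the path given to cv2.imread does not point to a real file; traffic_%i.jpeg is a naming pattern, not a file on disk. Since run_ssd_example.py takes a single test image, one option is to expand a wildcard and build one command per image. This is a sketch only: build_commands is a hypothetical helper, and the argument order simply follows the command given earlier in this thread.

```python
import glob


def build_commands(pattern, checkpoint, labels, net="mb1-ssd"):
    """Return one run_ssd_example.py argument list per image matching `pattern`."""
    images = sorted(glob.glob(pattern))
    if not images:
        # Fail early with a clear message instead of an OpenCV assertion later
        raise FileNotFoundError(f"no images match {pattern!r}")
    return [
        ["python3", "run_ssd_example.py", net, checkpoint, labels, image]
        for image in images
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True). Judging by the script's output later in this thread, every run writes run_ssd_example_output.jpg, so the result would need to be copied or renamed between runs to keep all of them.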
So i guess the correct command is
root@aititx22-desktop:/jetson-inference/python/training/detection/ssd# python3 run_ssd_example.py mb1-ssd models/20-imagesa/mb1-ssd-Epoch-9-Loss-7.462369181893089.pth models/20-imagesa/labels.txt /jetson-inference/data/imagess/traffic_8.jpeg
Inference time: 2.8669397830963135
Found 0 objects. The output image is run_ssd_example_output.jpg
What do I do? I followed every step...
I'll try with increasing epochs. Just curious, shouldn't it be able to detect anything even with very low accuracy?
root@aititx22-desktop:/jetson-inference/python/training/detection/ssd# python3 run_ssd_example.py mb1-ssd models/20-imagesa/mb1-ssd-Epoch-99-Loss-4.31419215780316.pth models/20-imagesa/labels.txt /jetson-inference/data/imagess/traffic_8.jpeg
Inference time: 4.292574882507324
Found 0 objects. The output image is run_ssd_example_output.jpg
still zero objects found after running for 100 epochs...
what did i do wrong?
How many images are in your dataset? Are the objects easily discernible? Are they small? It seems like the objects you are training it on may be difficult for it to recognize.
I'm training on 20 images for 3 annotations. The objects are not small. I'm training it from different distances. I'm aware you need at least 100 images per annotation to train, but I don't have that many images per annotation.
Is there a way to increase the dataset through image augmentation??
I want to analyze the accuracy with increasing images per annotation and increasing epochs, but I can't get any accuracy out...
I'm training on 20 images for 3 annotations. The objects are not small. I'm training it from different distances. I'm aware you need at least 100 images per annotation to train, but I don't have that many images per annotation.
OK yes, you are going to need more images in your dataset. What are your 3 object classes? If they are all road signs that you want to tell apart just by their different text, that may be more challenging for the DNN and you may need even more images in your dataset.
Is there a way to increase the dataset through image augmentation??
The train_ssd.py script already is doing image augmentation
I see. I'll try again with more images.
Instead of camera stream or test images, can i use video to test the accuracy of my model with detectnet?
If so, what is the command for that?
Hi @e-mily, detectnet/detectnet.py doesn't have built-in accuracy, because it has no knowledge of the ground-truth data. It is meant for inferencing only. It's the PyTorch side that has knowledge of the dataset and ground truth.
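To illustrate why ground truth is required, here is a small sketch of intersection over union (IoU), a common way to score how well a predicted box matches a labeled one. It is a generic illustration, not code from detectnet or pytorch-ssd, and the (left, top, right, bottom) box layout is an assumption:

```python
def iou(box_a, box_b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    # Corners of the overlapping rectangle, if there is one
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])

    # Width or height is clamped to zero when the boxes do not overlap
    intersection = max(0.0, right - left) * max(0.0, bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0
```

A detection only counts as correct relative to a labeled box, which is why a program that sees only the video frames cannot report accuracy by itself.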
Thank you @dusty-nv. I have another issue. I created a new dataset to increase the number of images and labels. When I try to run train_ssd.py it gives TypeError: unsupported format string passed to PosixPath.__format__ error.
I tried again with the old dataset and it works! But I want to use the new dataset.
When I compare the old and new datasets they look the same to me, so I don't really know what the real issue is. What do you think?
https://drive.google.com/drive/folders/1--DIZr1JPnETLCfGm6gnYrfAuQXxAdRn?usp=sharing
This is the link to my dataset. it would be a great help if you can check it out.
I tried using the command --debug-steps=1 and I also commented out the part from voc_dataset.py but I'm not sure how to commit the change in the container.
And also I still can't seem to divide them into trainval.txt and test.txt
When I try to run train_ssd.py it gives TypeError: unsupported format string passed to PosixPath.__format__ error.
Can you provide the full error/exception output from the console, so I can see where in the code it is happening at?
I tried using the command --debug-steps=1 and I also commented out the part from voc_dataset.py but I'm not sure how to commit the change in the container.
You would want to edit this inside the container using the nano editor, or just run it without the container by installing from source. Or I guess you could mount the jetson-inference/pytorch-ssd source code into the container, that would work too.
Thank you @dusty-nv, it turns out it was from my dataset. I want to ask, how do I train different models?
The ssd-mobilenet-v1 is the only network architecture from pytorch-ssd that I have tested & verified is working through the whole pipeline, including the ONNX export from PyTorch and import into TensorRT and runtime pre/post-processing with jetson-inference
to @dusty-nv I am at the same spot that opened this thread; I have the line 214 error and I checked my directory and I do have read and write permission with the 4 files in the directory. There were so many other issues listed that I am not sure what solved the problem. Can you tell me what I should try next.
to @dusty-nv - redid the entire process with a simpler set of objects; just 3 styles of batteries with 3 of each in many positions. When I run the train_ssd.py I still get stuck at line 214. I am sure I am missing something simple. Thanks, Stephen
@chromaowl can you provide the terminal log of the error you are getting?
Are you sure you're providing the correct path to your dataset when you launch train_ssd.py?
root@VCEDbreadboard:/jetson-inference/python/training/detection/ssd# python3 train_ssd.py --dataset-type=voc --data=data/batteries --model-dir=models/batteries --batch-size=4 --epochs=2 --workers=1
2022-07-21 15:54:03 - Using CUDA...
2022-07-21 15:54:03 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=4, checkpoint_folder='models/batteries', dataset_type='voc', datasets=['data/batteries'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=2, num_workers=1, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2022-07-21 15:54:03 - Prepare training datasets.
Traceback (most recent call last):
File "train_ssd.py", line 214, in
This is the path to my data:
root@VCEDbreadboard:/jetson-inference/python/training/detection/ssd# cd data/batteries
root@VCEDbreadboard:/jetson-inference/python/training/detection/ssd/data/batteries# ls -l
total 16
drwxr-xr-x 2 root root 4096 Jul 20 21:02 Annotations
drwxr-xr-x 3 root root 4096 Jul 20 20:09 ImageSets
drwxr-xr-x 2 root root 4096 Jul 20 21:02 JPEGImages
-rw-rw-r-- 1 1000 1000   17 Jul 20 21:22 labels.txt
root@VCEDbreadboard:/jetson-inference/python/training/detection/ssd/data/batteries# ^C
root@VCEDbreadboard:/jetson-inference/python/training/detection/ssd/data/batteries#
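A top-level listing like this does not show whether ImageSets/Main holds the expected .txt files or whether every listed ID has matching files. The sketch below checks that, assuming the Pascal VOC-style layout shown here (Annotations, JPEGImages, ImageSets/Main, labels.txt). The function name check_voc_layout and the default .jpg extension are assumptions for illustration:

```python
from pathlib import Path


def check_voc_layout(dataset_root, image_ext=".jpg"):
    """Return a list of human-readable problems found in a VOC-style dataset folder."""
    root = Path(dataset_root)
    problems = []
    for sub in ("Annotations", "JPEGImages", "ImageSets/Main"):
        if not (root / sub).is_dir():
            problems.append(f"missing directory: {sub}")
    if not (root / "labels.txt").is_file():
        problems.append("missing file: labels.txt")

    main_dir = root / "ImageSets" / "Main"
    set_files = sorted(main_dir.glob("*.txt")) if main_dir.is_dir() else []
    if main_dir.is_dir() and not set_files:
        problems.append("ImageSets/Main contains no .txt image-set files")

    for set_file in set_files:
        lines = set_file.read_text().splitlines()
        # Each non-empty line is treated as one image ID without an extension
        for image_id in (line.strip() for line in lines if line.strip()):
            if not (root / "Annotations" / f"{image_id}.xml").is_file():
                problems.append(f"{set_file.name}: no annotation for {image_id}")
            if not (root / "JPEGImages" / f"{image_id}{image_ext}").is_file():
                problems.append(f"{set_file.name}: no image for {image_id}")
    return problems
```

An empty list from check_voc_layout("data/batteries") would suggest the folder structure is not the cause, and the search can move on to the annotation contents.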
@dusty-nv @chromaowl I am also getting ascii error. Could you please tell me how you fixed the issue:
2023-10-02 14:59:34 - Prepare training datasets.
warning - image 20231002-115317 has no box/labels annotations, ignoring from dataset
Traceback (most recent call last):
File "train_ssd.py", line 263, in
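The warning in this log says an image was dropped because it has no box/label annotations. A quick way to list such files is sketched below, assuming Pascal VOC XML in which every labeled box is an <object> element. The function name annotations_without_objects is illustrative and not part of pytorch-ssd:

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def annotations_without_objects(annotations_dir):
    """Return the IDs of annotation XML files that contain no <object> element."""
    empty = []
    for xml_path in sorted(Path(annotations_dir).glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        # next(..., None) yields None when the file has no <object> anywhere
        if next(root.iter("object"), None) is None:
            empty.append(xml_path.stem)
    return empty
```

Images reported by this check either need labels added or can be left out of the image-set files.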
@dusty-nv I followed the tutorial and created train.txt, test.txt, val.txt and trainval.txt in ImageSets/Main. I even switched to just having default.txt in ImageSets/Main and I'm still getting the following error. Can you help me?
root@aititx22-desktop:/jetson-inference/python/training/detection/ssd# python3 train_ssd.py --dataset-type=voc --data=data/total-5 --model=models/total-5 --batch-size=2 --workers=1 --epochs=1
2022-02-23 09:22:37 - Using CUDA...
2022-02-23 09:22:37 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=2, checkpoint_folder='models/total-5', dataset_type='voc', datasets=['data/total-5'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=1, num_workers=1, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2022-02-23 09:22:37 - Prepare training datasets.
Traceback (most recent call last):
  File "train_ssd.py", line 214, in <module>
    target_transform=target_transform)
  File "/jetson-inference/python/training/detection/ssd/vision/datasets/voc_dataset.py", line 33, in __init__
    raise IOError("missing ImageSet file {:s}".format(image_sets_file))
TypeError: unsupported format string passed to PosixPath.__format__
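The traceback shows the code was trying to raise IOError("missing ImageSet file ..."), so the underlying problem is that no image-set file was found. The TypeError appears because the message is built with "{:s}" on a path object, which does not support that format spec. The snippet below reproduces this with the standard library alone; PurePosixPath stands in for PosixPath so it behaves the same on any platform, and the path is just the one from this thread:

```python
from pathlib import PurePosixPath

image_sets_file = PurePosixPath("data/total-5/ImageSets/Main/trainval.txt")

format_error = None
try:
    # Same pattern as the raise statement in the traceback
    message = "missing ImageSet file {:s}".format(image_sets_file)
except TypeError as err:
    # Path objects reject the "s" format spec, so the formatting itself fails
    # and hides the IOError that was about to be raised
    format_error = str(err)

# A plain "{}" placeholder (or str(path)) formats the path without error
message = "missing ImageSet file {}".format(image_sets_file)

print(format_error)
print(message)
```

In practice, seeing this TypeError from voc_dataset.py is a hint to check that the expected .txt files exist under ImageSets/Main of the folder passed with --data.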