Hi. I am actually new to machine learning and am following your tutorials for a project. I just wanted to confirm: can I export the model to ONNX by following the next tutorial and then use it in Unity, either with the Barracuda packages for YOLOX that you have developed or by following the tutorial (Real-Time Object Detection in Unity With ONNX and DirectML Pt. 1)? Am I right?
Hi @aforadil,
That's correct. You can export the model with the code in the follow-up tutorial. Use the specified settings for Barracuda in this section if you plan to use the model with either the ONNX-DirectML tutorial or the Barracuda YOLOX package.
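For context, the general shape of that export step is sketched below. This is only a hedged sketch: the model variable name, input size, file name, and opset value are illustrative assumptions, and you should use the exact Barracuda settings from the follow-up tutorial.

# Hedged sketch of exporting the trained model with torch.onnx.export; values below
# are placeholders rather than the tutorial's exact settings.
import torch

input_tensor = torch.randn(1, 3, 384, 384)  # dummy input; shape is illustrative

torch.onnx.export(
    wrapped_model.cpu(),        # assumed: the wrapped YOLOX model from the training notebook
    input_tensor,
    "yolox.onnx",
    export_params=True,
    do_constant_folding=True,
    opset_version=11,           # placeholder; Barracuda only supports older opsets
    input_names=["input"],
    output_names=["output"],
)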
You can swap the model and colormap file in the Barracuda YOLOX demo project without any additional changes.
The code in the ONNX-DirectML tutorial applies normalization to the model input, so you'll need to remove the normalization steps (i.e., the subtraction and division operations) from the PerformInference function in the dllmain.cpp file.
Also, use the TensorFlow.js plugin if you need to run the model in a browser. The model does run with Barracuda in WebGL builds, but inference is slower, and there are odd glitches when changing input dimensions.
Thanks for the quick response, @cj-mills.
Hi @cj-mills,
I was following your tutorial. The code runs fine in Colab. When I try running it in a local Python environment with Mamba, the following error comes up in the prompt when the "Train the Model" cell is executed. I have tried executing all the cells in sequence, but the error persists. It looks like an issue with multiprocessing. It would be great if you could help out with this. Thanks
Error:
[I 17:51:29.357 NotebookApp]
Traceback (most recent call last):
File "
Hi @aforadil,
Python multiprocessing on Windows is slightly weird, and even more so with Jupyter notebooks.
The fix for the AttributeError is to place the HagridDataset class in a separate Python file and import it. I just added a version of the training notebook for Windows to the GitHub repository.
I'll add a note to the tutorial to use that one when running on Windows.
Edit: Also download the windows_utils.py file and place it in the same folder as the notebook.
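For reference, the shape of that workaround looks roughly like this. It is a sketch only: the real HagridDataset implementation comes from the notebook, and only the structure matters here.

# --- windows_utils.py (placed in the same folder as the notebook) ---
from torch.utils.data import Dataset

class HagridDataset(Dataset):
    """Dataset class moved out of the notebook so it has an importable module path."""
    def __init__(self, *args, **kwargs):
        super().__init__()
        # the notebook's implementation goes here

    def __len__(self):
        return 0  # placeholder

    def __getitem__(self, index):
        raise NotImplementedError  # placeholder

# --- in a notebook cell ---
# Classes defined in a notebook cell live in __main__, which the spawned DataLoader
# worker processes on Windows cannot re-import, hence the AttributeError. Importing
# the class from a module gives the workers a stable path to look it up.
# from windows_utils import HagridDataset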
Great. Thanks.
Hi... can't I just use a standard COCO annotation JSON file to train my custom model? My dataset has images with multiple classes, and I noticed that yours points each JSON file to a class folder. Thanks in advance :)
By the way, following up on the previous question: I already have a custom model trained with YOLOv8. However, when I tried to use it with Unity (after exporting it to ONNX), it gave me a Reshape issue, but when I used your model, it worked perfectly. That's why I want to train my custom data with YOLOX. I just mention this in case you have an answer for my main issue :)
Hi @Xantango,
For your first question, you can use whichever annotation format you wish, but you'll need to update the relevant code in the notebook for that specific format. The method to read the annotation data used in the tutorial is simply what the HaGRID dataset requires.
I have a notebook in this tutorial's GitHub Repository for working with the COCO dataset. I need to clean it up slightly, but the code is functional.
I am debating whether the COCO notebook warrants a dedicated post to cover the changes in how to read the annotation data.
As for your second question, I assume you are trying to use the model with Unity's Barracuda inference library. Barracuda has limited ONNX operator support, and part of the exported YOLOv8 model likely has unsupported operators. That is why I have different export settings for Barracuda in the follow-up post.
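If you want to verify what your exported YOLOv8 model actually contains, one way is to list the operator types in the ONNX graph. This is a rough sketch, assuming the onnx Python package is installed; the file name is a placeholder.

# List the ONNX operators in an exported model to spot ones Barracuda may not support.
from collections import Counter
import onnx

model = onnx.load("yolov8.onnx")  # placeholder path to your exported model
op_counts = Counter(node.op_type for node in model.graph.node)

for op, count in sorted(op_counts.items()):
    print(f"{op}: {count}")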
Sentis, the replacement for Barracuda, should have better operator support. Unfortunately, Sentis is only available through a closed beta program, and I only got access this week. I plan to port my unity-barracuda-inference-yolox package to Sentis to compare against Barracuda, but I have not had a chance to do so yet.
Hi @cj-mills, Very informative. Thank you sooo much.
Hi @cj-mills, this tutorial's code and pytorch-yolox-object-detector-training-coco.ipynb run perfectly fine on Colab. However, when I ran them in Jupyter, including pytorch-yolox-object-detector-training-windows.ipynb, the training section keeps running forever, with neither errors nor progress. Any idea?
Uhhh, it was the attribute issue
@Xantango
Were you able to resolve the issue you encountered? I don't have a version of the COCO notebook prepared for Windows, so you would need to make the same changes as in the notebook for the HaGRID dataset and place the windows_utils.py file in the same folder.
Hi @cj-mills, yes, I combined parts from this tutorial, the COCO version, and the Windows version to suit what I needed, and it is working perfectly fine. Thank you so much for the great, well-explained tutorial.
Hello, after running the program in Jupyter, I have this issue when running:
dataset_sample = train_dataset[0]
annotated_tensor = draw_bboxes( image=(denorm_img_tensor(dataset_sample[0], norm_stats)*255).to(dtype=torch.uint8), boxes=dataset_sample[1]['boxes'], labels=[class_names[int(i.item())] for i in dataset_sample[1]['labels']], colors=[int_colors[int(i.item())] for i in dataset_sample[1]['labels']] )
tensor_to_pil(annotated_tensor)
"NameError Traceback (most recent call last) Cell In[32], line 1 ----> 1 dataset_sample = train_dataset[0] 3 annotated_tensor = draw_bboxes( 4 image=(denorm_img_tensor(dataset_sample[0], norm_stats)255).to(dtype=torch.uint8), 5 boxes=dataset_sample[1]['boxes'], 6 labels=[class_names[int(i.item())] for i in dataset_sample[1]['labels']], 7 colors=[int_colors[int(i.item())] for i in dataset_sample[1]['labels']] 8 ) 10 tensor_to_pil(annotated_tensor)
File ~\YOLOX\windows_utils.py:61, in HagridDataset.getitem(self, index) 59 annotation = self._annotation_df.loc[img_key] 60 # Load the image and its target (bounding boxes and labels) ---> 61 image, target = self._load_image_and_target(annotation) 63 # Apply the transformations, if any 64 if self._transforms:
File ~\YOLOX\windows_utils.py:89, in HagridDataset._load_image_and_target(self, annotation) 87 bbox_tensor = torchvision.ops.box_convert(torch.Tensor(bbox_list), 'xywh', 'xyxy') 88 # Create a BoundingBox object with the bounding boxes ---> 89 boxes = BoundingBox(bbox_tensor, format='xyxy', canvas_size=image.size[::-1]) 90 # Convert the class labels to indices 91 labels = torch.Tensor([self._class_to_idx[label] for label in annotation.labels])
NameError: name 'BoundingBox' is not defined"
Hi @Zafoue,
Sorry about that. I must have forgotten to push the updated windows_utils.py file for torchvision 0.16+ to GitHub. I have done so now, and you can download it from the link below:
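In the meantime, the relevant change is mostly where the bounding-box class comes from. The following is a hedged, version-tolerant sketch: the class was renamed to BoundingBoxes and has moved between torchvision modules across releases, so both locations are tried.

try:
    from torchvision.tv_tensors import BoundingBoxes as BoundingBox   # newer torchvision releases
except ImportError:
    from torchvision.datapoints import BoundingBoxes as BoundingBox   # older v2-beta releases

# Usage mirrors the call in the traceback above:
# boxes = BoundingBox(bbox_tensor, format='xyxy', canvas_size=image.size[::-1])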
Hi @cj-mills
I managed to get it to work in Jupyter.
Do you have a tutorial for creating a custom dataset?
Hi @Zafoue,
I have not made a tutorial for creating custom datasets yet. I've received several questions about this, so I'll try to make time for one.
If you need something now, there are free annotation tools like CVAT and automated annotation methods, as shown in the following videos:
Hi @cj-mills ,
I use Roboflow for labeling images, but my question is: in which format do I need to export? I looked at your dataset "hagrid-sample-30k-384p" and noticed there's one labeling file for a set of images. I've tried the YOLOv8 format and some others, but I always end up with one labeling file per image. As a result, I have multiple labeling files, which doesn't match the format you use in the scripts.
Could you advise me on the correct export format or guide me on how to get a single labeling file for a set of images?
Thank you very much!
Christian, what a beast of a tutorial! I would like to reproduce this on my own data. How were you able to become so proficient at this? Would you be able to recommend some learning resources for a Python noob?
My background is electrical engineering. Decent proficiency in Matlab, not much Python.
I am thinking:
There is just so much learning material out there, I'm not sure where to start and how to be efficient about it. My immediate interest is in fine tuning object detection models on my own data (as you have done here). If you have any suggestions, it would be greatly appreciated.
Hi @ryan-michaud,
The fast.ai courses are my go-to recommendation for getting started with deep learning. If you have some existing coding experience, you can probably make it through without already being proficient in Python.
If you want to build a foundation with Python first, I don't have a default tutorial to recommend (there are so many these days). However, the latest CS50x course is probably a safe bet:
As for the fast.ai courses, part 2 is optional but highly recommended. It is valuable even just from a Python development standpoint.
I recommend having a project to work on while going through the fast.ai course. You seem to have an object detection project in mind, so that's good.
The latest iteration of the courses doesn't go in-depth on object detection specifically. However, an earlier version does:
As for your project, I made some tutorials recently for working with various bounding box annotations in PyTorch, which you might find applicable:
If you would like to explore the YOLOX model specifically, you can view the code for my pip package in the following GitHub repository:
Above all, don't feel you need to wait until you think you have enough theoretical knowledge to start working on your project. Learn what you need as you need to.
Afterward, if you want to test your understanding, make a tutorial. Teaching others how to do something does wonders for identifying any relevant gaps in your knowledge. 😉
@cj-mills Thank you for such a detailed and thoughtful reply. I hope to give back one day as you have ;-)
@ryan-michaud You're very welcome! Best of luck with your project!
Hi @cj-mills, there is a version of this project for Windows on GitHub called pytorch-yolox-object-detector-training-windows.ipynb. How can I modify it to run on Linux? Do I need to change just this line, "from windows_utils import hagriddataset", or something else? Thank you)
Hi @Boo2z,
There is already a Linux version of the notebook available for this tutorial. You can find links for the available notebooks in the Tutorial Code dropdown under the Getting Started with the Code section:
Hi @cj-mills. I plan to train the model using this notebook and then use my trained model in this Sentis demo, only rolling back the Unity version to 2022 and leaving Sentis at version 1.3. After moving Unity back to 2022, your project worked well. Should my trained model work without problems, or could there theoretically be difficulties? Thank you in advance.
Hi @OlegKochetkov-git, I believe Sentis 1.3 (and the new 1.4) only officially supports Unity 2023.2 or newer:
Thanks for your great tutorial. May I know what needs to change if we want to use datasets from Roboflow? Pascal VOC?
Hi @hericah,
If your dataset is currently in Roboflow, you can export the annotations to COCO JSON format. You can also use Roboflow to convert existing annotations from Pascal VOC to COCO. From there, you can check out how to work with COCO bounding box annotations in the tutorial linked below:
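If it helps, here is a rough sketch of reading a COCO export and grouping the boxes per image, which is the single-file layout you get from a COCO export. The file name is a placeholder (Roboflow COCO exports typically include an _annotations.coco.json file per split), and the key names follow the COCO spec.

import json
from collections import defaultdict

with open("_annotations.coco.json") as f:  # placeholder path
    coco = json.load(f)

categories = {cat["id"]: cat["name"] for cat in coco["categories"]}
file_names = {img["id"]: img["file_name"] for img in coco["images"]}

bboxes_per_image = defaultdict(list)
for ann in coco["annotations"]:
    # COCO boxes are [x_min, y_min, width, height] in pixels
    bboxes_per_image[file_names[ann["image_id"]]].append(
        (categories[ann["category_id"]], ann["bbox"])
    )

# Peek at a few images' annotations
for name, boxes in list(bboxes_per_image.items())[:3]:
    print(name, boxes)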
I got this error when running it on an M1 Mac:
{
"name": "PicklingError",
"message": "Can't pickle <function <lambda> at 0x16da18c10>: attribute lookup <lambda> on __main__ failed",
"stack": "---------------------------------------------------------------------------
PicklingError Traceback (most recent call last)
Cell In[39], line 1
----> 1 train_loop(
2 model=model,
3 train_dataloader=train_dataloader,
4 valid_dataloader=valid_dataloader,
5 optimizer=optimizer,
6 loss_func=yolox_loss,
7 lr_scheduler=lr_scheduler,
8 device=torch.device(device),
9 epochs=epochs,
10 checkpoint_path=checkpoint_path,
11 use_scaler=True)
Cell In[34], line 36, in train_loop(model, train_dataloader, valid_dataloader, optimizer, loss_func, lr_scheduler, device, epochs, checkpoint_path, use_scaler)
33 # Loop over the epochs
34 for epoch in tqdm(range(epochs), desc=\"Epochs\"):
35 # Run a training epoch and get the training loss
---> 36 train_loss = run_epoch(model, train_dataloader, optimizer, lr_scheduler, loss_func, device, scaler, epoch, is_training=True)
37 # Run an evaluation epoch and get the validation loss
38 with torch.no_grad():
Cell In[33], line 24, in run_epoch(model, dataloader, optimizer, lr_scheduler, loss_func, device, scaler, epoch_id, is_training)
21 progress_bar = tqdm(total=len(dataloader), desc=\"Train\" if is_training else \"Eval\") # Initialize a progress bar
23 # Loop over the data
---> 24 for batch_id, (inputs, targets) in enumerate(dataloader):
25 # Move inputs and targets to the specified device
26 inputs = torch.stack(inputs).to(device)
27 # Extract the ground truth bounding boxes and labels
File /opt/homebrew/Caskroom/miniconda/base/envs/pytorch-env/lib/python3.10/site-packages/torch/utils/data/dataloader.py:440, in DataLoader.__iter__(self)
438 return self._iterator
439 else:
--> 440 return self._get_iterator()
File /opt/homebrew/Caskroom/miniconda/base/envs/pytorch-env/lib/python3.10/site-packages/torch/utils/data/dataloader.py:388, in DataLoader._get_iterator(self)
386 else:
387 self.check_worker_number_rationality()
--> 388 return _MultiProcessingDataLoaderIter(self)
File /opt/homebrew/Caskroom/miniconda/base/envs/pytorch-env/lib/python3.10/site-packages/torch/utils/data/dataloader.py:1038, in _MultiProcessingDataLoaderIter.__init__(self, loader)
1031 w.daemon = True
1032 # NB: Process.start() actually take some time as it needs to
1033 # start a process and pass the arguments over via a pipe.
1034 # Therefore, we only add a worker to self._workers list after
1035 # it started, so that we do not call .join() if program dies
1036 # before it starts, and __del__ tries to join but will get:
1037 # AssertionError: can only join a started process.
-> 1038 w.start()
1039 self._index_queues.append(index_queue)
1040 self._workers.append(w)
File /opt/homebrew/Caskroom/miniconda/base/envs/pytorch-env/lib/python3.10/multiprocessing/process.py:121, in BaseProcess.start(self)
118 assert not _current_process._config.get('daemon'), \\
119 'daemonic processes are not allowed to have children'
120 _cleanup()
--> 121 self._popen = self._Popen(self)
122 self._sentinel = self._popen.sentinel
123 # Avoid a refcycle if the target function holds an indirect
124 # reference to the process object (see bpo-30775)
File /opt/homebrew/Caskroom/miniconda/base/envs/pytorch-env/lib/python3.10/multiprocessing/context.py:224, in Process._Popen(process_obj)
222 @staticmethod
223 def _Popen(process_obj):
--> 224 return _default_context.get_context().Process._Popen(process_obj)
File /opt/homebrew/Caskroom/miniconda/base/envs/pytorch-env/lib/python3.10/multiprocessing/context.py:288, in SpawnProcess._Popen(process_obj)
285 @staticmethod
286 def _Popen(process_obj):
287 from .popen_spawn_posix import Popen
--> 288 return Popen(process_obj)
File /opt/homebrew/Caskroom/miniconda/base/envs/pytorch-env/lib/python3.10/multiprocessing/popen_spawn_posix.py:32, in Popen.__init__(self, process_obj)
30 def __init__(self, process_obj):
31 self._fds = []
---> 32 super().__init__(process_obj)
File /opt/homebrew/Caskroom/miniconda/base/envs/pytorch-env/lib/python3.10/multiprocessing/popen_fork.py:19, in Popen.__init__(self, process_obj)
17 self.returncode = None
18 self.finalizer = None
---> 19 self._launch(process_obj)
File /opt/homebrew/Caskroom/miniconda/base/envs/pytorch-env/lib/python3.10/multiprocessing/popen_spawn_posix.py:47, in Popen._launch(self, process_obj)
45 try:
46 reduction.dump(prep_data, fp)
---> 47 reduction.dump(process_obj, fp)
48 finally:
49 set_spawning_popen(None)
File /opt/homebrew/Caskroom/miniconda/base/envs/pytorch-env/lib/python3.10/multiprocessing/reduction.py:60, in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
PicklingError: Can't pickle <function <lambda> at 0x16da18c10>: attribute lookup <lambda> on __main__ failed"
}
Hi @hericah,
The issue stems from the difference between how multiprocessing in Python works on Linux versus macOS: macOS (like Windows) defaults to the spawn start method, so objects defined in the notebook itself, such as lambda functions, can't be pickled for the worker processes. The same kind of changes as in the Windows notebook (moving those definitions into an importable module) should resolve it.
Alternatively, you can set the value for num_workers in data_loader_params to 0 to disable multiprocessing.
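A minimal sketch of that second option, assuming the notebook's data_loader_params dictionary and the train_dataset/valid_dataset objects defined earlier (the values shown are illustrative):

from torch.utils.data import DataLoader

data_loader_params = {
    'batch_size': 4,                                  # illustrative; keep the notebook's value
    'num_workers': 0,                                 # 0 disables DataLoader multiprocessing
    'collate_fn': lambda batch: tuple(zip(*batch)),   # safe here since nothing gets pickled
}

train_dataloader = DataLoader(train_dataset, **data_loader_params, shuffle=True)
valid_dataloader = DataLoader(valid_dataset, **data_loader_params)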
Hi Christian, and thanks for your excellent tutorial and support to the community. You're the best!!! I have a question, and I hope you can help me, please.
I'm following your tutorial, using a custom dataset generated with LabelMe (I followed your tutorial Working with LabelMe Bounding Box Annotations in Torchvision). But I get an error when running the Inspect Samples cells (Inspect training set sample and Inspect validation set sample):
AttributeError: 'Series' object has no attribute 'bboxes'.
I've been investigating, and I think the error originates in the HagridDataset(Dataset) class, specifically in the line: bbox_list = np.array([bbox*(image.size*2) for bbox in annotation.bboxes]).
I checked your JSON file and saw that the 4 points of the bounding box are inside bboxes, while in my JSON file the 4 points of the bounding box are under shapes -> points. I think that is what generates the error, but so far I haven't been able to solve it.
I hope you can help me, and I'll be attentive. Again, thank you very much for your excellent tutorials!!
@cj-mills can u help me please 🙏
Hi @carlos-saldana,
The HaGRID dataset I used for this tutorial does not use the LabelMe annotation format.
If you want to adapt this training tutorial to work with a LabelMe dataset like in the Working with LabelMe Bounding Box Annotations in Torchvision tutorial, replace the HagridDataset class (and other code used for interacting with the HaGRID annotation data) with the corresponding code from the LabelMe annotation tutorial.
I tried to keep the heading names consistent across the tutorials to make matching up the code for each section easier.
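As a rough illustration of the difference, here is a sketch of pulling rectangle annotations out of a LabelMe JSON file, where the corner points live under shapes -> points rather than a bboxes key. The file name is a placeholder; the LabelMe tutorial has the full dataset class.

import json

with open("sample.json") as f:  # placeholder path to a LabelMe annotation file
    annotation = json.load(f)

bboxes, labels = [], []
for shape in annotation["shapes"]:
    if shape.get("shape_type") != "rectangle":
        continue
    (x1, y1), (x2, y2) = shape["points"]
    # Normalize the corner order to (x_min, y_min, x_max, y_max)
    bboxes.append([min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)])
    labels.append(shape["label"])

print(bboxes, labels)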
Hi @cj-mills, Thank you very much, I will review the code again!!
Hi @cj-mills!
I've followed your notebook for custom training YOLOX models. After training, I load the model and try to generate predictions on a new image, but the model labels all the predictions incorrectly, and some objects, for example trees, are not detected at all.
Here is how I load and run inference on the finetuned model:
ckpt_fold = 'ckpts/yolox_m/2024-10-25_08-43-44/'
# ckpt_fold = './pretrained_checkpoints/'
model_type = 'yolox_m'
pretrained = True
model = build_model(model_type, 8, pretrained=pretrained, checkpoint_dir=ckpt_fold).to(device=device, dtype=dtype)
model.device = device
model.name = model_type
strides = model.bbox_head.strides
norm_stats = ([0.5]*3, [1.0]*3)
# Convert the normalization stats to tensors
mean_tensor = torch.tensor(norm_stats[0]).view(1, 3, 1, 1)
std_tensor = torch.tensor(norm_stats[1]).view(1, 3, 1, 1)
# Set the model to evaluation mode
model.eval();
# Wrap the model with preprocessing and post-processing steps
wrapped_model = YOLOXInferenceWrapper(model, mean_tensor, std_tensor).to(device=device)
test_img = Image.open('test_images/test_image_2.jpg')
train_sz = 512
bbox_conf_thresh = 0.20
iou_thresh = 0.25
draw_bboxes = partial(draw_bounding_boxes, fill=False, width=2, font_size=25)
class_names=classes_to_keep
# Resize image without cropping to multiple of the max stride
resized_img = resize_img(test_img, target_sz=train_sz, divisor=1)
# Calculate the input dimensions that are multiples of the max stride
input_dims = [dim - dim % max(strides) for dim in resized_img.size]
# Calculate the offsets from the resized image dimensions to the input dimensions
offsets = (np.array(resized_img.size) - input_dims)/2
# Calculate the scale between the source image and the resized image
min_img_scale = min(test_img.size) / min(resized_img.size)
# Crop the resized image to the input dimensions
input_img = resized_img.crop(box=[*offsets, *resized_img.size-offsets])
input_tensor = transforms.Compose([transforms.ToImage(),
transforms.ToDtype(torch.float32, scale=True)])(input_img)[None].to(device)
wrapped_model.to(device)
with torch.no_grad():
model_output = wrapped_model(input_tensor).to('cpu')
# Filter the proposals based on the confidence threshold
max_probs = model_output[:, : ,-1]
mask = max_probs > bbox_conf_thresh
proposals = model_output[mask]
# Sort the proposals by probability in descending order
proposals = proposals[proposals[..., -1].argsort(descending=True)]
# Filter bounding box proposals using NMS
proposal_indices = torchvision.ops.nms(
boxes=torchvision.ops.box_convert(proposals[:, :-2], 'xywh', 'xyxy'),
scores=proposals[:, -1],
iou_threshold=iou_thresh
)
proposals = proposals[proposal_indices]
# Offset and scale the predicted bounding boxes
pred_bboxes = (proposals[:,:4]+torch.Tensor([*offsets, 0, 0]))*min_img_scale
pred_labels = [class_names[int(idx)] for idx in proposals[:,4]]
pred_probs = proposals[:,5]
annotated_tensor = draw_bboxes(
image=transforms.PILToTensor()(test_img),
boxes=torchvision.ops.box_convert(torch.Tensor(pred_bboxes), 'xywh', 'xyxy'),
labels=[f"{label}\n{prob*100:.2f}%" for label, prob in zip(pred_labels, pred_probs)],
colors=[int_colors[class_names.index(i)] for i in pred_labels]
)
Hi @moelk-atea,
The checkpoint_dir parameter for the build_model method is only for indicating where to download the pretrained model weights. It is not for loading finetuned model checkpoints. You can view the source code for the build_model method at the link below:
Check out the following section from the follow-up tutorial to see the steps for initializing a YOLOX model with the saved checkpoint from a finetuning session:
Christian Mills - Training YOLOX Models for Real-Time Object Detection in Pytorch
https://christianjmills.com/posts/pytorch-train-object-detector-yolox-tutorial/
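For reference, the general shape of that checkpoint-loading step looks roughly like the sketch below. This is not the tutorial's exact code: the checkpoint file name is a placeholder, build_model is the helper used in the snippet above (called with the same pattern), and the checkpoint is assumed to hold the model's state_dict.

import torch

checkpoint_path = "ckpts/yolox_m/2024-10-25_08-43-44/yolox_m.pth"  # placeholder file name

model = build_model("yolox_m", 8, pretrained=False)                # same call pattern as above
model.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))
model.eval()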