Hi, thanks for the tutorial. I am wondering how I can use a ResNet-101 backbone for this, since the library provides only ResNet-50. How can I customize it?
Hi @nattafahhm,
You can create a resnet101 backbone using the resnet_fpn_backbone function from torchvision:
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
backbone = resnet_fpn_backbone('resnet101', pretrained=True)
model.backbone = backbone
You will probably need to train for longer and at a lower learning rate.
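If it helps, here is a rough sketch (an illustration, not code from the tutorial) of building the whole model around the new backbone with torchvision's MaskRCNN class instead of swapping model.backbone in place. The num_classes value is just an example:
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
# Build a ResNet-101 FPN backbone with ImageNet-pretrained weights
# (newer torchvision versions use weights=... instead of pretrained=True)
backbone = resnet_fpn_backbone('resnet101', pretrained=True)
# Construct a Mask R-CNN model around the new backbone
# (num_classes includes the background class)
model = MaskRCNN(backbone, num_classes=2)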
Hi, if I create a ResNet-101 backbone using resnet_fpn_backbone, then I can only use ImageNet-1K weights, right? How can I use ImageNet-21K?
I am a new member. Thanks for the great tutorial. I want to ask what software is used to label images? Thanks
Hi @fah-iiv,
I don't believe torchvision has pretrained weights for the resnet101 model with ImageNet-21K. Are you referring to something from the timm library?
Hi @tuan-nmt,
There are free annotation tools like CVAT and automated annotation methods, as shown in the following videos:
I've received several questions about this, so I'll try to make time for a tutorial.
Hi, thank you for the tutorial. Could you please let me know what the JSON file should look like when I want to implement this code for two classes? For example, I have a dataset that contains cats, and the cats have a number of spots on their bodies. So I think the classes are cats and their spots, but I don't know what the JSON file should look like or which parts of the code should be modified.
Hi @jetsonwork,
The answer to your question depends on what format you use for your dataset. The toy dataset used in the tutorial follows the annotation format for the LabelMe annotation tool. The tool's GitHub repository contains example annotations for instance-segmentation with multiple classes:
Thanks for your response.
I’m preparing a dataset for a Mask R-CNN model, involving images of cats and smaller, distinct spots on these cats. While the dataset has more instances of “spots” than “cats,” the latter covers a much larger area in the images. I’m concerned this might bias the model toward the “cat” class due to its larger pixel coverage.
My question is:
Could this difference in area coverage introduce significant training bias towards the “cat” class?
Hi @jetsonwork,
It could potentially introduce a training bias. However, I recommend getting to a point where you can iterate and experiment before worrying too much about that.
Thank you for the amazing tutorials Chris!
Hi, thanks a ton for the tutorial, but I am facing an issue while training the model. I am using my own dataset, which contains multiple instances in each image, and training raises an error:
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Training stopped at 15% of the first epoch. What could be the problem, and what do I have to change?
Hi @enamulrafti, The training code should work with images that contain multiple object instances. The toy dataset used in the tutorial contains images with more than one object, and I've used the code for other such datasets.
Would you mind providing more details from the RuntimeError and the OS and hardware (e.g., CPU or GPU) running the training code?
Hi, thank you once again for the tutorial. I've successfully implemented the MASK-RCNN model following your guide. I have a question regarding the pretraining of MASK-RCNN: Is it possible to train the model with a certain set of classes and then fine-tune it on a different set of classes? For example, could I initially train the model on categories of cat species on "public dataset" and later fine-tune it to recognize different species on my own "dataset"? In this scenario, when I try to continue training the model with the new set of classes, I find that I cannot proceed without adjusting some of the output layers to account for the change in class types.
Additionally, should I consider freezing some parameters during this process, such as setting param.requires_grad to True or False? Your advice on how to approach this would be greatly appreciated.
Hi @fah-iiv, Are you looking to retain the trained classes from the public dataset when fine-tuning the model on your dataset? For example, if the public dataset contained 20 cat species, would you want to add new species but have the model still recognize the original 20?
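In the meantime, the freezing part of your question generally looks like the sketch below (an illustration assuming you only want to train the heads; the optimizer choice and learning rate are placeholders, not the tutorial's settings):
import torch

# Freeze the backbone so only the box and mask heads receive gradient updates
for param in model.backbone.parameters():
    param.requires_grad = False

# Give the optimizer only the parameters that remain trainable
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-4)  # placeholder optimizer and learning rate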
I am trying to run this on my laptop, as I don't have access to the computer with a GPU until next week. I tried specifying the device as CPU, but I keep getting this error: RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU.
The issue happens when using enumerate(train_dataloader). I tried to modify the source file to force it to use the CPU, but then the same error happens somewhere else.
I also tried using the install command for Linux (CPU) (I'm using Linux), to no avail.
What can I do to run the tutorial on my laptop?
Found the solution: pin_memory needs to be set to False when creating the dataloaders:
data_loader_params = {
    'batch_size': bs,  # Batch size for data loading
    'num_workers': num_workers,  # Number of subprocesses to use for data loading
    'collate_fn': lambda batch: tuple(zip(*batch)),
    'pin_memory': False,
    'pin_memory_device': device
}
Hi @joekeo,
Sorry about that. You are correct that you must turn off the pin_memory settings for the DataLoaders. I updated the code for some of my other tutorials to handle this automatically (link), but it appears I forgot to push the update for this one.
I'll update this tutorial when I have a chance, but for now, here is the new DataLoader initialization code so you don't need to change it manually when you get your GPU:
# Set the training batch size
bs = 4
# Set the number of worker processes for loading data. This should be the number of CPUs available.
num_workers = multiprocessing.cpu_count()
# Define parameters for DataLoader
data_loader_params = {
    'batch_size': bs,  # Batch size for data loading
    'num_workers': num_workers,  # Number of subprocesses to use for data loading
    'persistent_workers': True,  # If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the worker dataset instances alive.
    'pin_memory': 'cuda' in device,  # If True, the data loader will copy Tensors into CUDA pinned memory before returning them. Useful when using a GPU.
    'pin_memory_device': device if 'cuda' in device else '',  # Specifies the device where the data should be loaded. Commonly set to use the GPU.
    'collate_fn': lambda batch: tuple(zip(*batch)),
}
# Create DataLoader for training data. Data is shuffled for every epoch.
train_dataloader = DataLoader(train_dataset, **data_loader_params, shuffle=True)
# Create DataLoader for validation data. Shuffling is not necessary for validation data.
valid_dataloader = DataLoader(valid_dataset, **data_loader_params)
# Print the number of batches in the training and validation DataLoaders
print(f'Number of batches in train DataLoader: {len(train_dataloader)}')
print(f'Number of batches in validation DataLoader: {len(valid_dataloader)}')
Thanks for the help. It was a futile attempt, as after fixing it the estimated training time on my CPU is ~100 days, so I will have to run it next week on the GPU.
@joekeo Don't forget you can run it on a GPU with the free tier of Google Colab:
How can I handle class imbalance with Mask R-CNN?
Hi @nattafahhm,
I'd need more information about your current dataset, the feasibility of gathering more data samples, and your comfort level with doing outside research to modify the existing training code before giving specific recommendations.
However, the most straightforward approaches would be to oversample the underrepresented classes, undersample the overrepresented ones, or add new samples.
Oversampling introduces the risk of overfitting on those samples, which you can partially mitigate with data augmentations like those currently in the tutorial. A simple implementation would be duplicating the image and annotation files for the underrepresented classes.
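For the file-duplication approach, a minimal sketch could look like the following (the folder layout, .jpg extension, and class names are assumptions you would adapt to your dataset):
import json
import shutil
from pathlib import Path

dataset_dir = Path('dataset')  # hypothetical folder containing images and matching LabelMe .json files
oversample_labels = {'spot'}   # class names to oversample (illustrative)
copies = 2                     # number of extra copies per matching sample

for ann_path in dataset_dir.glob('*.json'):
    annotation = json.loads(ann_path.read_text())
    labels = {shape['label'] for shape in annotation.get('shapes', [])}
    if not labels & oversample_labels:
        continue
    img_path = ann_path.with_suffix('.jpg')  # assumes JPEG images stored next to the JSON files
    for i in range(copies):
        new_img = img_path.with_name(f'{img_path.stem}_copy{i}{img_path.suffix}')
        new_ann = ann_path.with_name(f'{ann_path.stem}_copy{i}.json')
        shutil.copy(img_path, new_img)
        annotation['imagePath'] = new_img.name  # keep each annotation pointing at its own image copy
        new_ann.write_text(json.dumps(annotation))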
Undersampling would mean not using all available samples in your dataset, which might prevent the model from seeing some required scenarios. A simple implementation would be to remove some images and associated annotation files from the overrepresented classes.
Adding more data would address the potential drawbacks of over and undersampling. However, this might be infeasible depending on the type and quantity of data required. That said, the next series of tutorials I have planned will demonstrate methods to streamline this process using automated annotation and synthetic data generation for object detection and instance segmentation tasks.
You could also try combining the three methods (e.g., do a small amount of oversampling, a small amount of undersampling, and add a small amount of new data) to try and balance the drawbacks of each.
Before going through the hassle of any of these methods, I'd try training the model with your existing dataset to see whether the current imbalance is a significant issue.
Thanks CJ-Mills for this tutorial. I am looking for a Google Colab tutorial for Mask R-CNN (object detection) but couldn't find one. Can you please do one?
Hi @AliAdibArnab9,
It's in the Tutorial Code dropdown in the Getting Started with the Code section:
Open In Colab: https://colab.research.google.com/github/cj-mills/pytorch-mask-rcnn-tutorial-code/blob/main/notebooks/pytorch-mask-r-cnn-training-colab.ipynb
Tutorial Section: https://christianjmills.com/posts/pytorch-train-mask-rcnn-tutorial/#getting-started-with-the-code
If you were looking for the tutorial for getting started with Colab, here is the link for that: https://christianjmills.com/posts/google-colab-getting-started-tutorial/
Thank you so much @cj-mills. It's a very detailed tutorial. I am just a bit confused about how I can implement my dataset with this code. I have a dataset and an annotation (.json) file. Should I change the whole code, or is there a way I can modify this code and get results?
@AliAdibArnab9 Do you happen to know what annotation format your dataset uses? If it's a single JSON file for the whole dataset, my first guess would be it's in COCO format.
If so, I have a tutorial showing how to work with COCO segmentation annotations (the type you would use with Mask R-CNN) in PyTorch:
You could use that tutorial as a guide for how to modify the code in the Mask R-CNN tutorial, which uses the LabelMe annotation format.
Hi Christian, I just have a question about the annotation. Is each .json file a 'single file in VGG JSON format' or a 'single file in COCO JSON format'?
@AliAdibArnab9 Both VGG and COCO tend to use a single JSON file. Check out the examples at the links below to see which format you have:
Sorry again, Christian. It seems like your annotation and my annotation give different results. I used makesense.ai and downloaded a single .json VGG file after annotating. When I open my annotation in a notebook, I can see it is different from yours. Can you please tell me which tool you used to annotate and get a .json file for each image?
(For example, your annotation file has image_path; I don't have any such field, and that's why it's showing me an error.)
@AliAdibArnab9 I don't have a tutorial for working with VGG annotations, so that would explain the difference in results. This Mask R-CNN tutorial uses the LabelMe annotation format. I currently also have tutorials covering how to work with segmentation annotations in COCO and CVAT format, but not VGG.
Makesense.ai lets you export polygon annotations in COCO format in addition to VGG, so you can simply select the Single file in COCO JSON format option this time.
Hi CJ, really great and comprehensive article! It is really rare to see a tutorial that includes the env setup :)
In your StudentIDDataset(Dataset) class, why do you have to convert the image to RGB using image = Image.open(filepath).convert('RGB')?
I'm working on satellite imagery, some of them are not necessarily RGB.
Hi @amrirasyidi, The dataset class converts the images to RGB because that is what the Mask R-CNN model expects. What format are your images?
I see. Mine is RGBA.
In case the image is already in RGB, it should be okay to just use Image.open(filepath), right?
Anyway, I have another question.
In the JSON file's shapes section, if I have 3 student IDs in the same image, when I Ctrl+F the JSON file for "student_id", I should see 3 results, right? That is, the shapes section should contain all the polygons for the masks.
Hi Christian, thank you for your tutorial! I have seen questions here regarding converting a single JSON file for the whole dataset into per-image JSON annotation files. I wrote a brief script that works pretty well for the conversion; you just need to copy the annotation path from the whole-dataset JSON file, set the destination folder, and let it run. Here is the GitHub repository:
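For reference, the gist of that kind of conversion is roughly the sketch below (not the exact code in the repository; it assumes COCO-style polygon annotations and writes LabelMe-style files with illustrative field names):
import json
from collections import defaultdict
from pathlib import Path

coco_path = Path('annotations.json')   # whole-dataset COCO-style file (hypothetical path)
out_dir = Path('labelme_annotations')  # destination folder for the per-image JSON files
out_dir.mkdir(exist_ok=True)

coco = json.loads(coco_path.read_text())
categories = {cat['id']: cat['name'] for cat in coco['categories']}

# Group the annotations by the image they belong to
anns_by_image = defaultdict(list)
for ann in coco['annotations']:
    anns_by_image[ann['image_id']].append(ann)

for image in coco['images']:
    shapes = []
    for ann in anns_by_image[image['id']]:
        # Assumes polygon segmentations (flat [x1, y1, x2, y2, ...] lists), not RLE masks
        for polygon in ann['segmentation']:
            points = [[polygon[i], polygon[i + 1]] for i in range(0, len(polygon), 2)]
            shapes.append({'label': categories[ann['category_id']],
                           'points': points,
                           'shape_type': 'polygon',
                           'flags': {}})
    labelme = {'shapes': shapes,
               'imagePath': image['file_name'],
               'imageHeight': image['height'],
               'imageWidth': image['width']}
    out_path = out_dir / (Path(image['file_name']).stem + '.json')
    out_path.write_text(json.dumps(labelme, indent=2))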
@amrirasyidi You are correct that it is alright to use Image.open(filepath) instead of Image.open(filepath).convert('RGB') when you know all the images are RGB.
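If you want to keep the conversion but skip it when it isn't needed, a small check like the sketch below works (note that .convert('RGB') on an image that is already RGB is essentially a no-op anyway; the file path is hypothetical):
from PIL import Image

filepath = 'image.png'  # hypothetical path
image = Image.open(filepath)
# Only convert when the image isn't already RGB (e.g., RGBA or grayscale inputs)
if image.mode != 'RGB':
    image = image.convert('RGB')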
Regarding your second question, your understanding is also correct. If you have an image with three annotated objects, the "shapes" section in the corresponding JSON file will store the polygon information for all three objects.
Example:
"shapes": [
{
"label": "student_id",
"line_color": null,
"fill_color": null,
"points": [...
],
"shape_type": "polygon",
"flags": {}
},
{
"label": "student_id",
"line_color": null,
"fill_color": null,
"points": [...
],
"shape_type": "polygon",
"flags": {}
},
{
"label": "student_id",
"line_color": null,
"fill_color": null,
"points": [...
],
"shape_type": "polygon",
"flags": {}
}
],
You can see a direct comparison between a LabelMe JSON segmentation file and the resulting pandas DataFrame and annotated image in the tutorial linked below:
@Bombardelli Thanks for sharing!
Do you have a tutorial on how to implement inference with a webcam using the model from this tutorial, in a similar way to how YOLO does it?
I tried several ways but it's simply not working:
import cv2
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn_v2
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
from torchvision.transforms import functional as F

def get_model(num_classes):
    # Load a pre-trained Mask R-CNN model
    model = maskrcnn_resnet50_fpn_v2(weights='DEFAULT')
    # Get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # Replace the pre-trained head with a new one (adjust number of classes)
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # Do the same for the mask predictor if your task involves instance segmentation
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256  # Typically the size of the hidden layer used in Mask R-CNN
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, hidden_layer, num_classes)
    return model

# Example: Adjust for your specific number of classes (e.g., 16 classes + background)
num_classes = 17  # Including the background class
model = get_model(num_classes)

# Load the model state dictionary
model.load_state_dict(torch.load('path_to_saved_model_state.pth'))
model.eval()  # Set the model to inference mode

# Webcam feed
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Convert to tensor
    image = F.to_tensor(frame).unsqueeze(0)

    with torch.no_grad():
        predictions = model(image)

    # Post-process predictions and visualize results
    # This is left as an exercise depending on how you want to display the results.
    # For simplicity, we're just displaying the original webcam feed here.
    cv2.imshow('Webcam Live Inference', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
@Bombardelli, I don't have a tutorial for that specifically, but you can probably make what you need using the following tutorials as a reference:
You should be able to swap the inference steps from the Mask R-CNN ONNX export tutorial into the while loop from the object tracking tutorial.
That said, I'm not sure the inference speed for the Mask R-CNN model will be fast enough for real-time inference from a webcam without further optimization or a sufficiently powerful GPU.
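As a rough starting point, the per-frame step usually needs the BGR-to-RGB conversion and device handling that the snippet above skips. Here is a sketch (assuming model is the fine-tuned Mask R-CNN from your code; the 0.5 score threshold is just an example):
import cv2
import torch
from torchvision.transforms import functional as F

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device).eval()

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # OpenCV delivers BGR frames; torchvision models expect RGB input
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    image = F.to_tensor(rgb).to(device)
    with torch.no_grad():
        prediction = model([image])[0]  # detection models take a list of 3D tensors
    # Keep confident detections only
    keep = prediction['scores'] > 0.5
    masks = prediction['masks'][keep]  # (N, 1, H, W) soft masks in [0, 1]
    boxes = prediction['boxes'][keep]
    # ... draw masks/boxes onto frame here before displaying ...
    cv2.imshow('Mask R-CNN Webcam', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()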
I'm getting this error:
Can't pickle <function
Cell In[35], line 24, in run_epoch(model, dataloader, optimizer, lr_scheduler, device, scaler, epoch_id, is_training)
     21 progress_bar = tqdm(total=len(dataloader), desc="Train" if is_training else "Eval")  # Initialize a progress bar
     23 # Loop over the data
---> 24 for batch_id, (inputs, targets) in enumerate(dataloader):
     25     # Move inputs and targets to the specified device
     26     inputs = torch.stack(inputs).to(device)
Hi @waqarorakzai,
Are you running the tutorial code on Windows? If so, download the Windows notebook and the associated utility file using the following links:
Python multiprocessing works differently in Windows versus Linux, so the code requires a few tweaks.
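For context, the usual culprit behind a Can't pickle <function ...> error on Windows is that DataLoader workers are started with the 'spawn' method, so everything handed to them, including collate_fn, must be picklable, and a lambda is not. The kind of change involved looks roughly like this (a sketch reusing the bs, num_workers, and device variables from the DataLoader setup above, not necessarily the exact code in the Windows notebook):
# Define the collate function at module level so Windows worker processes can pickle it
def tuple_collate(batch):
    return tuple(zip(*batch))

data_loader_params = {
    'batch_size': bs,
    'num_workers': num_workers,
    'collate_fn': tuple_collate,  # a named function instead of a lambda
    'pin_memory': 'cuda' in device,
    'pin_memory_device': device if 'cuda' in device else '',
}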
Hi @christian, thanks again for the tutorial. Is there any way to calculate AP, accuracy, and a confusion matrix from the code? The traditional libraries for calculating those mostly use the COCO-style .json format.
Do you have a specific reason for choosing PIL over cv2?
Hi @amrirasyidi,
PIL just ended up being my default for projects. Torchvision also has convenience functions for converting between PIL Images and PyTorch tensors.
Hi @AliAdibArnab9,
Sorry, I missed your question. You could probably use the same approach from this official PyTorch tutorial for calculating average precision:
It's not super helpful for the toy dataset used in my tutorial, but here is a quick example of the training notebook using the same evaluation code:
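Alternatively, if you'd rather not wire up the pycocotools evaluation scripts yourself, the torchmetrics library provides a MeanAveragePrecision metric that accepts the model's output dictionaries directly (it still needs pycocotools installed as a backend). A rough sketch against this tutorial's validation DataLoader (the 0.5 mask threshold is illustrative):
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

metric = MeanAveragePrecision(iou_type='segm')  # use 'bbox' for box-level AP instead

model.eval()
with torch.no_grad():
    for images, targets in valid_dataloader:
        outputs = model([img.to(device) for img in images])
        preds = [{'boxes': out['boxes'].cpu(),
                  'scores': out['scores'].cpu(),
                  'labels': out['labels'].cpu(),
                  # Threshold the soft (N, 1, H, W) masks into boolean (N, H, W) masks
                  'masks': (out['masks'].squeeze(1) > 0.5).cpu()} for out in outputs]
        gts = [{'boxes': t['boxes'].cpu(),
                'labels': t['labels'].cpu(),
                'masks': t['masks'].to(torch.bool).cpu()} for t in targets]
        metric.update(preds, gts)

print(metric.compute())  # includes 'map', 'map_50', 'map_75', and size breakdowns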
Training Mask R-CNN on custom data, but the training doesn't stop and produces no output or errors. Here's a brief overview of my process:
1. I generated a dataset using PyTorch by applying the SAM masks from bounding boxes to my images.
2. After creating the dataset, I split it into training and testing sets.
3. I loaded both sets using torch.utils.data.DataLoader.
4. I'm using a pre-trained model with 11 classes.
This is the output of my dataset. Any help or insights would be greatly appreciated.
Hi @MontassarTn, Are you using this tutorial's training code? It appears from the screenshot you might be using something else. If so, I can't provide much insight without seeing your code.
I am running this code on my personal computer with Windows. I didn't change anything in the code. I get the following error at the end of the first epoch of training:
Loss is NaN or infinite at epoch 0, batch 0. Stopping training.
I checked the loss, and the loss is NaN. The code up until the training works well and gives the correct outputs. Does anyone know how to fix this?
@cj-mills No, I didn't. Could I send you my code?
Hi @EnesAgirman,
Are you using the Windows version of the training notebook with its associated utility file?
I set up a fresh conda environment (with CUDA 11.8) using the steps in the tutorial this morning and verified the Windows notebook successfully finished training.
@MontassarTn To be honest, I have very little spare time at the moment and would likely not even have a chance to go through it in the near term.
Also, these comment sections are for questions related to their associated tutorials, and I do not want to set a precedent of expanding that scope too much. It would simply be infeasible for me to address such a range of requests.
If you want to try using your dataset with this training code, I have tutorials on working with segmentation annotations in a few different formats in PyTorch.
Christian Mills - Training Mask R-CNN Models with PyTorch
Learn how to train Mask R-CNN models on custom datasets with PyTorch.
https://christianjmills.com/posts/pytorch-train-mask-rcnn-tutorial/