I have the same problem. How did you solve it?
No, I haven't solved it yet, but I plan to. For now I am trying the Detectron2 library for this task.
I have experienced the same issue when training with datasets containing many (>200) small objects in the same image. The training begins, but eventually the same OpenCV error is returned. Unfortunately, I have not found a solution.
Hi @susanin1970, @fanweiya, @tehkillerbee! I am planning to work on something similar. Is there a workaround?
@susanin1970 did the Detectron2 library work for you?
Thank you!
@InvincibleKnight I ended up using mmdetection and was able to train and run inference on a dataset containing many small objects (although I also had some challenges)
@InvincibleKnight, hi! The Detectron2 library worked for me, but there were some nuances.
I used the config COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml for training and changed it in the following way:
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("gorokh_train",)  # example name of the registered train set
cfg.DATASETS.TEST = ("gorokh_val",)     # example name of the registered test set
cfg.DATALOADER.NUM_WORKERS = 2
# Let training initialize from the model zoo:
# cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml")
cfg.MODEL.WEIGHTS = "/content/drive/MyDrive/trash/model_0004999.pth"  # resume from my own checkpoint
cfg.SOLVER.IMS_PER_BATCH = 1
cfg.SOLVER.BASE_LR = 0.0001
cfg.SOLVER.MAX_ITER = 30000
cfg.SOLVER.STEPS = []  # no LR decay
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256  # default: 512
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.TEST.DETECTIONS_PER_IMAGE = 3000  # default: 100; raised for crowded frames
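For completeness, a minimal training sketch with this cfg (it assumes the gorokh_train / gorokh_val datasets were already registered via Detectron2's DatasetCatalog):
import os
from detectron2.engine import DefaultTrainer

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()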
I trained Mask-RCNN on two different datasets. One of them contained frames annotated in CVAT, identical to those shown in the first post; each frame contains about 1000 objects (peas, in fact). The other dataset contains frames annotated in CVAT with at most a few dozen objects (e.g. cat litter).
On the cat litter dataset I trained Mask-RCNN for ~30k iterations in Google Colab (though I was recently able to install Detectron2 on Windows 10 as well), then ran inference on the test frames and a test video. Mask-RCNN segmented almost all objects in the frames:
But inference on the video took ~10-15 minutes. I also measured the processing time of a single frame: about 1500 milliseconds on average, which I think is clearly not fast enough for real-time work, for example. The measurements may not be entirely accurate, because I timed them with Python's time module.
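Roughly, the measurement looked like this (a sketch: predictor stands for a Detectron2 DefaultPredictor built from the cfg above, and frames for the test images; both are placeholders):
import time

latencies = []
for frame in frames:  # frames: iterable of BGR test images (placeholder)
    start = time.perf_counter()
    outputs = predictor(frame)  # predictor: Detectron2 DefaultPredictor (assumed)
    latencies.append((time.perf_counter() - start) * 1000.0)  # milliseconds
print(f"mean per-frame latency: {sum(latencies) / len(latencies):.0f} ms")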
For the peas dataset I ran several training sessions in Colab, each of ~30k iterations. The results were not as good: barely 1/5 of all objects were segmented, and inference time on a single image was 1700-1800 milliseconds:
Fewer objects were segmented than expected, and I noticed that cfg.TEST.DETECTIONS_PER_IMAGE is 100 by default; however, raising this parameter (e.g. to 1000, 2000, 3000) did not improve the situation.
@tehkillerbee, hi! How many objects do the frames of your dataset contain?
Hi, I am interested in Detectron2 as well. Can it be used with a custom dataset? Thank you.
@tehkillerbee which model from mmdetection did you use to get a nice result?
@alexeybozhchenko I ended up using Mask RCNN but I had to tweak some parameters to detect a larger number of objects. Specifically,
edit configs/_base_/models/mask_rcnn_r50_fpn.py
...
max_per_img=100 => max_per_img=1000
Additionally, my model struggled with very small objects. This is due to the anchor scale used for COCO.
Anchor scale is calculated as anchor_scales * anchor_base_sizes; if anchor_base_sizes is not set, anchor_strides is used by default. If anchor_scales=[8] and anchor_strides=[4, 8, 16, 32, 64], then the anchor scales for the FPN levels are [8*4, 8*8, 8*16, 8*32, 8*64] = [32, 64, 128, 256, 512].
edit configs/_base_/models/mask_rcnn_r50_fpn.py
...
You just need to modify anchor_scales=[8] to anchor_scales=[4].
See https://github.com/open-mmlab/mmdetection/issues/90 for more details
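For reference, here is roughly how those two edits look in an mmdetection 2.x-style configs/_base_/models/mask_rcnn_r50_fpn.py (a sketch of the relevant excerpts only; in 1.x the field is called anchor_scales, while in 2.x it lives under rpn_head.anchor_generator):
model = dict(
    rpn_head=dict(
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[4],  # default: [8]; smaller anchor scales help with tiny objects
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64])),
    test_cfg=dict(
        rcnn=dict(
            max_per_img=1000)))  # default: 100; keep more detections per image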
Please note that detecting this many objects requires quite a bit of GPU RAM. I am using a Jetson AGX Xavier with 32 GB RAM (shared between CPU and GPU).
@susanin1970 Sorry for the late reply, I detect around 1000 objects in my dataset. Each frame is scaled to 768x768.
The inference speed depends on the number of objects, but it is usually fast enough for my requirements (1000 ms or less).
Sorry for the late reply. Yes, Detectron2 can be used for detection/segmentation of custom objects.
Thank you for this detailed comment! I will try MMDetection for segmenting small objects based on your experience.
If the number of objects in the image is more than 512, the OpenCV resize method crashes. To solve this problem, modify the line
masks = cv2.resize(masks, (width, height))
in /yolact-master/utils/augmentations.py (around line 162):
cv_limit = 512  # cv2.resize accepts at most 512 channels
if masks.shape[2] <= cv_limit:
    masks = cv2.resize(masks, (width, height))
else:
    # Split the masks array into chunks of at most 512 along the channel axis,
    # resize each chunk, and merge them back. np.atleast_3d guards against
    # cv2.resize dropping the channel axis when a chunk holds a single mask.
    masks = np.concatenate([np.atleast_3d(cv2.resize(masks[:, :, i:i + cv_limit], (width, height)))
                            for i in range(0, masks.shape[2], cv_limit)], axis=2)
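(Background: most OpenCV functions, cv2.resize included, only accept matrices with up to CV_CN_MAX = 512 channels, which is why a mask stack larger than that has to be resized in chunks.)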
Hey @tehkillerbee, happy to find you in this issue. We have been in contact on an MMDeploy GitHub issue regarding the deployment of MMDetection Mask-RCNN models on the Jetson.
I essentially would like to find out whether, and to what extent, I need to retrain the (COCO-pretrained) Mask-RCNN model if I want to use different image scales and anchor sizes than the MMDetection defaults.
Reading through this issue, I have a question that you may be able to answer. I want to use a 3.1 MP camera (2064x1544) together with the Mask-RCNN model from MMDetection. I realized that every image gets rescaled so that its dimensions fall within the range [800, 1333] before training/inference. However, I would really like to use the full resolution that my camera offers! I therefore changed the img_scale parameter on this line and this line to img_scale = (2064, 1544). (Is that the correct way to do it?)
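For reference, this is roughly what the edited pipeline looks like (an mmdetection 2.x-style excerpt, e.g. from configs/_base_/datasets/coco_instance.py; all other steps are left at their defaults):
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='Resize', img_scale=(2064, 1544), keep_ratio=True),  # default: (1333, 800)
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', mean=[123.675, 116.28, 103.53],
         std=[58.395, 57.12, 57.375], to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]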
Before realizing the resizing that MMDetection does, I have simply taken the pretrained COCO weights and did some transfer learning for the network to detect 7 classes of litter (keeping the resizing to the default) using the TACO dataset and some own images (in total about 1600). I then exported the model to TensorRT and realized that the inference performs really bad on my non-rescaled 3.1MP images but very well on images rescaled down to [800, 1333]. I am now obviously trying to achieve that performance for my 3.1MP images without rescaling!
So, simply changing the img_scale for inference on my already trained model (which has only seen rescaled images) does not work. The MMDetection maintainers suggested changing the anchor size by modifying this line from scales=[8] to scales=[12] or even scales=[16]. That does not work either.
I therefore thought that I might need to properly retrain the model with images of size (2064, 1544) and maybe different anchor sizes. (Is that true?) How should I do this? Should I train the model from scratch, without COCO pretraining? Or should I keep the COCO pretraining and simply do transfer learning with larger image sizes? I think training everything from scratch would require a huge dataset (such as COCO) anyway, and it would cost me a lot of time without knowing whether it would work.
So ideally, I am looking for a way to keep the COCO pretraining (for low-level features) and retrain the model (especially the RPN) to handle my 3.1MP images.
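In config terms, I imagine the fine-tuning setup would look something like this (just a sketch; the base config name and checkpoint path are placeholders):
_base_ = './mask_rcnn_r50_fpn_1x_coco.py'  # placeholder base config
# Keep the COCO-pretrained weights and fine-tune with the new image scale / anchors:
load_from = 'checkpoints/mask_rcnn_r50_fpn_coco.pth'  # placeholder checkpoint path
# A lower learning rate for fine-tuning rather than training from scratch (assumption):
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)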
Maybe you could give me some advice, since I see that you have set a different anchor size for your Mask-RCNN model! Did you need to retrain? How did you do it? Did you use the pretrained weights from COCO? (How large is your dataset, and for how many epochs did you train it?)
Thanks a lot for your help!
Hi @habjoel,
Interesting to see that you have similar challenges when detecting litter. Since this issue is not strictly related to YOLACT, let's get in touch on LinkedIn (I have sent you a connection request), and then I can try to give you some pointers.
Hello!
Thanks for this great repo :)
In the article and on the main page of the repository, there are examples of using YOLACT on images that contain a small number of objects
I tried to train YOLACT on fully annotated images that contain hundreds of objects, as in the example below.
For training I use an RTX 2080 Ti GPU. When I start training, sometimes the following happens:
The training doesn't begin at all
Sometimes I get an OpenCV error:
I assume that there is a limit on the number of objects in a frame that YOLACT can be trained on; there may also be a problem in my annotations. But I want to ask: how can I solve this problem, and how effective is YOLACT at detecting and segmenting many small objects in images?
Thanks in advance for answer :)