Closed mehamednews closed 1 month ago
Do you mean you want to extract the target car by a mask with a transparent channel?
Regarding the dataset: in my experience, ~50k images are enough for training, even from scratch. Training from scratch might even work better (I'm not sure), for two reasons: 1) ~50k is already a large number, provided the mask labels are valid and sound; 2) images were resized to 1024x1024 in my experiments, so there might be a gap between that setting and your images (800x600).
Anyway, only experiments can tell. Let me know if you run into any problems.
Thanks for the quick answer.
Yes, I checked the DIS dataset and noticed that its masks are solid (either 100% white or 100% black).
What I'm trying to achieve is gray in the areas corresponding to windows, where the environment peeks through (not sure how to explain it :smile:)
current result:
target result:
I'm going to create a dataset with images 1024px wide (still 4:3) and test with it.
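For reference, the size arithmetic behind that plan can be sketched as below. This is only an illustration: `scale_to_width` and `pad_to_square` are hypothetical helper names, not part of the repository, and padding to a square is just one possible way to reach a 1024x1024 input without stretching the 4:3 images.

```python
def scale_to_width(width, height, target_width):
    """Scale (width, height) to target_width, keeping the aspect ratio."""
    scale = target_width / width
    return target_width, round(height * scale)


def pad_to_square(width, height):
    """Return (left, top, right, bottom) padding that makes the image square."""
    side = max(width, height)
    pad_w, pad_h = side - width, side - height
    left, top = pad_w // 2, pad_h // 2
    return left, top, pad_w - left, pad_h - top


# 800x600 images scaled to a width of 1024 stay 4:3.
new_w, new_h = scale_to_width(800, 600, 1024)
print(new_w, new_h)                 # 1024 768

# Padding needed if a square 1024x1024 input is wanted (hypothetical choice).
print(pad_to_square(new_w, new_h))  # (0, 128, 0, 128)
```

If padding is used, the mask would need the same padding (with black, i.e. background) so that image and label stay aligned.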
Thanks for the explanation. I understand your need: your data is similar to that used in matting tasks. For example, below is a ground-truth (GT) example from the portrait segmentation dataset P3M-10k; check the pixel values in the hair regions to see this.
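A quick way to run that kind of check on your own masks is to measure how many pixels are neither pure black nor pure white. The snippet below is a minimal sketch that assumes 8-bit grayscale masks already flattened into a sequence of integers (for example via `arr.ravel().tolist()` on a NumPy array); `soft_pixel_fraction` is a hypothetical helper name.

```python
from collections import Counter


def soft_pixel_fraction(pixels, low=0, high=255):
    """Return the fraction of pixels strictly between `low` and `high`."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("mask is empty")
    soft = sum(n for value, n in counts.items() if low < value < high)
    return soft / total


# Toy masks: a DIS-style binary mask and a matting-style mask with gray values.
binary_mask = [0, 0, 255, 255]
soft_mask = [0, 128, 128, 255]

print(soft_pixel_fraction(binary_mask))  # 0.0
print(soft_pixel_fraction(soft_mask))    # 0.5
```

A fraction near zero suggests solid masks like those in DIS5K, while a clearly non-zero fraction suggests matting-style labels like the gray window regions described above.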
One more important point: although GT labels in datasets like DIS5K are in {0, 1}, the predicted maps are always floats in the range (0, 1). Some techniques have even been proposed to make these values more confident, i.e. to push them toward 0 or 1 rather than 0.5. What I mean is that whether the output is binary or soft depends only on the datasets provided for training. And as I said before, there might be a domain gap between your custom data and DIS5K or the other data used. Therefore, if you have limited GPU resources, you can fine-tune the provided general-use weights. If possible, I recommend training from scratch to examine the accuracy.
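To illustrate why the output follows the labels, here is a small sketch that assumes a plain per-pixel binary cross-entropy (BCE) term; the actual training loss in the repository may differ. With a soft target such as 0.4 (a semi-transparent window pixel), BCE is minimized when the prediction equals the target, so gray labels pull predictions toward gray rather than toward 0 or 1.

```python
import math


def bce(pred, target, eps=1e-7):
    """Binary cross-entropy for one pixel; works for soft targets in [0, 1]."""
    pred = min(max(pred, eps), 1 - eps)  # clamp to avoid log(0)
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))


target = 0.4  # assumed soft GT value for a semi-transparent region
candidates = [i / 100 for i in range(1, 100)]

# The candidate prediction with the lowest loss is the target itself.
best = min(candidates, key=lambda p: bce(p, target))
print(best)  # 0.4
```

With a hard target of 0 or 1 the same loss keeps decreasing as the prediction approaches that extreme, which matches the near-binary outputs seen from models trained on DIS5K-style masks.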
Hi, First, I'd like to thank you for your amazing work (& the amount of effort you're putting into answering questions). I'd appreciate your insights & suggestions regarding the possibility of fine-tuning one of the pre-trained models (not sure which one would be best here) on images of cars. My main goal is to remove what's behind the windows of cars. I think an example will explain this better:
I generated ~50k images (800x600) with their respective masks. Would this be too much?