abhishek0093 closed this 2 weeks ago
I found an interesting workaround. Using a custom Sampler and BatchSampler in torch, I'm now able to achieve variable batch sizes: a batch_size of 4 for images that can be processed at 1024x1024, and a batch_size of 1 for bigger images. Not sure if this is the best way; I'd love to hear if someone has another approach. It would also be great if we could use multiple GPUs for processing, so that we can process bigger images in batches too.
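The workaround above can be sketched as follows. This is a minimal, illustrative version (class and variable names are my own, not from the repo), assuming you already know each image's (H, W) up front; any iterable that yields lists of dataset indices can be passed to `DataLoader` via `batch_sampler=`.

```python
class SizeBucketBatchSampler:
    """Yields batches of 4 indices for images that fit in 1024x1024,
    and single-index batches for larger images. Pass an instance as
    DataLoader(dataset, batch_sampler=sampler) -- any iterable of
    index lists is accepted there."""

    def __init__(self, image_sizes, small_batch_size=4, max_side=1024):
        # image_sizes: list of (H, W) pairs, one per dataset index.
        self.small = [i for i, (h, w) in enumerate(image_sizes)
                      if h <= max_side and w <= max_side]
        self.large = [i for i, (h, w) in enumerate(image_sizes)
                      if h > max_side or w > max_side]
        self.small_batch_size = small_batch_size

    def __iter__(self):
        # Small images: grouped into batches of `small_batch_size`.
        for start in range(0, len(self.small), self.small_batch_size):
            yield self.small[start:start + self.small_batch_size]
        # Large images: one image per batch.
        for i in self.large:
            yield [i]

    def __len__(self):
        n_small_batches = -(-len(self.small) // self.small_batch_size)
        return n_small_batches + len(self.large)
```

Collecting the (H, W) pairs requires one pass over the image headers before training/inference, but keeps the sampler itself trivial.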
Hi, Abhishek. Thanks for your interest and for writing up your problems in such detail. I'm always glad to help with these things, but I've been quite tired recently. I actually read your message last night but am only taking the time to reply now.
About the effectiveness of different sizes for training (1024x1024) and inference: I also tried this kind of thing to see if I could obtain better results on benchmarks (see `keep_size` in `dataset.py`). In my experiments, using the images at their original sizes for inference does not improve the results.
About dividing a large image into multiple patches for inference on more GPUs: in the existing pipeline, I think it's not possible, since there is usually more than one object in the image -- salient object detection (SOD) is the first step to be conducted, and cropping the image into patches would destroy the semantics needed for target localization. However, I think it's very possible to achieve this with a box prompt, which doesn't rely on SOD over the whole image: divide the box prompt and the image simultaneously, then run separate inference on multiple GPUs.
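The box-prompt idea above can be sketched in pure Python (all names are hypothetical, for illustration only): cut the image into a grid of patches, intersect the full-image box with each patch, and express the surviving portion in patch-local coordinates so each patch + local box can be sent to a different GPU.

```python
def split_box_across_patches(box, patch_grid, patch_w, patch_h):
    """For each patch (row, col) in a patch_grid = (rows, cols) tiling,
    return the portion of `box` = (x0, y0, x1, y1) that falls inside it,
    in patch-local coordinates, or None if the box misses the patch."""
    x0, y0, x1, y1 = box
    local_boxes = {}
    for row in range(patch_grid[0]):
        for col in range(patch_grid[1]):
            # Patch bounds in full-image coordinates.
            px0, py0 = col * patch_w, row * patch_h
            px1, py1 = px0 + patch_w, py0 + patch_h
            # Intersect the prompt box with this patch.
            ix0, iy0 = max(x0, px0), max(y0, py0)
            ix1, iy1 = min(x1, px1), min(y1, py1)
            if ix0 < ix1 and iy0 < iy1:
                # Shift into patch-local coordinates.
                local_boxes[(row, col)] = (ix0 - px0, iy0 - py0,
                                           ix1 - px0, iy1 - py0)
            else:
                local_boxes[(row, col)] = None
    return local_boxes
```

Patches whose entry is None contain no part of the target and can be skipped; the rest can be dispatched to separate devices and the predicted masks stitched back at the patch offsets.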
If possible, I will try to train a BiRefNet with 2048x2048 training data or mixed-resolution data.
I might not have answered every question well, so feel free to reply if you still have questions :)
@ZhengPeng7, thanks for replying. I understand that you may have many commitments, and I really appreciate you taking the time to actively maintain this repository and reply to people's issues. My issue is now resolved: I'm able to handle batching with variable batch sizes/padding, and for larger images, simply resizing to the maximum possible in my GPU setup seems to be the easiest solution right now. Thank you for the help. I'm marking this issue as closed.
Okay, sorry for not replying to you in more detail earlier. But I'm really happy to answer the questions you've provided here, where things are listed very clearly.
Hi @ZhengPeng7, first of all, many thanks for this great work and for open-sourcing it.
Currently I'm running into an issue, and it would be very helpful if you could help me with it. I have a big dataset of images that vary in size (some are high-resolution, like 4000x4000). I have noticed that, for my case, the results are better if I process an image at 1024x1024 only when both of its dimensions are less than 1024; otherwise, processing at the original resolution gives better results. So for most of the images I have to process at the original dimensions, which are large and varying. To implement processing in batches, I have implemented my own custom dataloader that returns the transformed image tensor. Everything works fine if I resize everything to 1024x1024; however, I face problems if I decide not to resize. Since PyTorch doesn't allow returning variable-length tensors in a batch, I'm adding extra padding to make every image in the batch match the largest dimensions present in that batch, and I can later remove the added padding to get back the original-shaped transformed image. Now there are two problems at this stage. Here is the code snippet for reference:
And later I'm calling this dataloader something like this:
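The call site also wasn't captured. A minimal, self-contained stand-in (the dataset class here is a toy, not the author's), showing why a custom `collate_fn` is needed at all once sizes vary: the default collate calls `torch.stack` and raises on mismatched shapes, so the simplest fallback is a collate that returns the batch as a plain list.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class VariableSizeImages(Dataset):
    """Toy stand-in for the custom dataset: yields (C, H, W) tensors
    of varying spatial size."""
    def __init__(self, sizes):
        self.sizes = sizes

    def __len__(self):
        return len(self.sizes)

    def __getitem__(self, i):
        h, w = self.sizes[i]
        return torch.zeros(3, h, w)

# With variable sizes, the default collate would fail on torch.stack,
# so return each batch as a plain list of tensors instead.
loader = DataLoader(VariableSizeImages([(512, 512), (700, 400)]),
                    batch_size=2, collate_fn=lambda b: b)
```

Each element of a yielded batch is then processed (or padded) individually downstream.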