ZhengPeng7 / BiRefNet

[CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation
https://www.birefnet.top
MIT License

Remove background for comic drawings #17

Closed ynie closed 5 months ago

ynie commented 5 months ago

Hello, thank you for the great model. It works well most of the time, but it seems to have trouble with some "easy" images like the one below. The tie and pants are getting erased. Is that a known issue? Thank you!

[Images: ComfyUI_temp_hslot_00001_, Output_temp_kiyby_00001_]

ynie commented 5 months ago

Here are the weights I'm using:

BiRefNet-ep480.pth

From ComfyUI: https://github.com/viperyl/ComfyUI-BiRefNet

But it seems there's a newer version. Should I always upgrade? Thanks!

ZhengPeng7 commented 5 months ago

The weights there were copied from my Google Drive. You can download the latest weights from my Google Drive via the links in my README (it should be DIS-ep580). They may work better.

But this problem still exists, as I found when I tested this image in my online demo. One idea I tried was to get a higher recall by using a relatively lower-resolution input, and it brought some improvement (it's still not perfect, but ensembling is indeed one way to alleviate this problem). I've attached the results for your image at both 1024x1024 and 768x768.

[Images: result_1024x1024, result_768x768]

Also glad to hear more from you about it.

ynie commented 5 months ago

Thank you for the response. Is the idea that I should always resize to 768x768 before sending the image to BiRefNet? If so, is there something like a constant I can tune in the code so that I don't have to resize the image myself?

ZhengPeng7 commented 5 months ago

I mean you can ensemble the 768x768 result (R1) and the 1024x1024 result (R2) if you want a higher recall: use R1 ∪ R2 as the final result.
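For anyone wondering what R1 ∪ R2 could look like in code, here is a minimal sketch, not part of the BiRefNet codebase. It assumes both predictions have already been resized back to the same size (for example, the original image size) and are stored as arrays with values in [0, 1]. The function name `union_masks` and the `threshold` parameter are made up for illustration. For soft masks, the pixel-wise maximum plays the role of the union; for binary masks it is exactly a logical OR.

```python
import numpy as np


def union_masks(mask_a, mask_b, threshold=None):
    """Pixel-wise union of two foreground masks of identical shape.

    Hypothetical helper: soft masks (floats in [0, 1]) are merged with a
    pixel-wise maximum. If `threshold` is given, the merged mask is
    binarized to 0.0 / 1.0 afterwards.
    """
    a = np.asarray(mask_a, dtype=np.float32)
    b = np.asarray(mask_b, dtype=np.float32)
    if a.shape != b.shape:
        raise ValueError(
            f"masks must have the same shape, got {a.shape} and {b.shape}"
        )
    # A pixel is foreground if either prediction marks it as foreground.
    merged = np.maximum(a, b)
    if threshold is not None:
        merged = (merged >= threshold).astype(np.float32)
    return merged


# Toy stand-ins for R1 (768x768 run) and R2 (1024x1024 run),
# assumed to be already resized to a common size.
r1 = np.array([[0.9, 0.8], [0.1, 0.0]], dtype=np.float32)
r2 = np.array([[0.9, 0.2], [0.7, 0.0]], dtype=np.float32)

final = union_masks(r1, r2, threshold=0.5)
```

In this toy case each run misses one pixel that the other one catches, so the union keeps both. The trade-off is that false positives from either run are kept as well, which is why this raises recall rather than precision.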

ynie commented 5 months ago

Sorry, I'm still confused about how to do this in code. I also have another question: what makes it difficult to remove the background from this image? The background is pretty greenish. I can try to modify the image so that it works better with BiRefNet. Any pointers? Thanks!

ZhengPeng7 commented 5 months ago

Hi, Ynie,

I guess the high contrast between the bright person and his clothes, together with the low contrast between the pants and the background, leads to the unsatisfactory result. Of course, models may learn to handle this better, though that is not guaranteed. For example, with my latest weights you can extract the tie perfectly (it also has low contrast with the background), which was not extracted well in the result you provided:

[Image: screenshot 2024-04-19 12 07 09]

ynie commented 5 months ago

Got it. Thank you!