ZhengPeng7 / BiRefNet

[CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation
https://www.birefnet.top
MIT License

After the two commits yesterday, the mask accuracy has dipped #4

Closed rishabh063 closed 6 months ago

rishabh063 commented 6 months ago

[image attached]

This was working fine until yesterday, but somehow it doesn't work with the new changes.

I also did a hard reset, but then it fails with: No such file or directory: '/root/autodl-tmp/weights/swin_large_patch4_window12_384_22kto1k.pth'

I am downloading the weights and trying it out, but I don't know what happened. I looked at the commits and can't see anything major.

rishabh063 commented 6 months ago

[image attached]

After reverting to commit 76ef1cdd31f1ede51664b0ec7d24a942c71f46d3, downloading swin_large_patch4_window12_384_22kto1k.pth, and putting it in /root/autodl-tmp/weights, it works perfectly.

Can you explain what happened here?

ZhengPeng7 commented 6 months ago

Thanks a lot for pointing it out! Yesterday I was trying to remove some unnecessary steps to make inference and evaluation more convenient. For example, loading the swin_large backbone weights should not be mandatory when you only want to test a given well-trained checkpoint. I've checked the inference: the change did cause a difference and made the results worse. I'll find the problem and correct it this afternoon (before 3 p.m., UTC-8), and I'll reply here once it's done. And thanks a lot for keeping an eye on this project and catching my silly mistakes! :)

ZhengPeng7 commented 6 months ago

Hi, Rishabh, you can check it again. I just fixed that bug, and another one as well: inconsistent use of resizing between cv2 and PIL when load_all is set differently for training and inference.
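
For reference, a minimal standalone illustration of this kind of mismatch (not BiRefNet's actual code): cv2 and PIL implement resizing differently, even for the nominally same bilinear filter, so a flag-dependent code path can feed the model slightly different inputs at training and inference time.

```python
import cv2
import numpy as np
from PIL import Image

img = (np.random.rand(512, 512, 3) * 255).astype(np.uint8)

# cv2's bilinear resize (its default interpolation, INTER_LINEAR) ...
a = cv2.resize(img, (1024, 1024), interpolation=cv2.INTER_LINEAR)

# ... and PIL's bilinear resize use differently implemented kernels
# (and the two libraries also have different *default* filters),
# so the outputs rarely match exactly.
b = np.array(Image.fromarray(img).resize((1024, 1024), Image.BILINEAR))

# Non-zero difference: if load_all switches between these two paths,
# training and inference see slightly different input distributions.
print(np.abs(a.astype(int) - b.astype(int)).max())
```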

ZhengPeng7 commented 6 months ago

Besides, please use the new well-trained weights BiRefNet_ep580.pth in the stuff folder.

rishabh063 commented 6 months ago

Sure, I will start using the new one. Can you quickly fill me in on why this was happening?

ZhengPeng7 commented 6 months ago

Certainly. I tried to remove some redundant operations and simplify the inference procedure, e.g., generating the Laplacian maps. But I removed some necessary steps along the way and introduced a bug. The current version is correct, and the results are 100% the same as the previous ones, as I've checked. So feel free to use it and enjoy the faster inference (the demo on Hugging Face became about 10x faster).
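
A hypothetical sketch of the idea (names are illustrative, not the repo's actual API): auxiliary targets such as Laplacian maps are only needed as training supervision, so they can be gated behind a flag and skipped entirely at inference time.

```python
import cv2
import numpy as np

def preprocess(image: np.ndarray, training: bool = False) -> dict:
    sample = {"image": image}
    if training:
        # Laplacian edge map, used only as an auxiliary training target.
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        sample["laplacian"] = cv2.Laplacian(gray, cv2.CV_32F)
    return sample

# Inference path: the auxiliary map is never computed, so it runs faster.
dummy = np.zeros((256, 256, 3), dtype=np.uint8)
assert "laplacian" not in preprocess(dummy, training=False)
```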

rishabh063 commented 6 months ago

Oh nice, I will look into it.

Also, how is the new model doing? Is there a significant improvement?

ZhengPeng7 commented 6 months ago

No improvement in the predicted results -- they are exactly the same as before with the same model. I just removed some processes that are needed only for training. If you run inference on powerful devices with GPUs, it won't be much faster; but on weak devices, like the free tier of Hugging Face Spaces, it's much faster.

rishabh063 commented 6 months ago

I am asking about the new model (epoch 580) compared to the previous 400+ epoch one.

ZhengPeng7 commented 6 months ago

Not very significant, but also not small. The epoch-580 model was trained with 20 epochs of IoU-only fine-tuning at the end, as stated in the paper. Fine-tuning with only the IoU loss for the last epochs improves metrics like HCE and wF a lot, while decreasing structural metrics like Sm and xF a little. Let me attach the specific results here:

+---------+-----------+-------+-----------+------+----------+--------+------+-------+--------+-------+-------+
| Dataset | Method    | maxFm | wFmeasure | MAE  | Smeasure | meanEm | HCE  | maxEm | meanFm | adpEm | adpFm |
+---------+-----------+-------+-----------+------+----------+--------+------+-------+--------+-------+-------+
| DIS-TE1 | tm--ep580 | .865  | .826      | .036 | .888     | .912   | 112  | .919  | .852   | .902  | .825  |
| DIS-TE2 | tm--ep580 | .899  | .867      | .032 | .907     | .936   | 281  | .943  | .887   | .925  | .866  |
| DIS-TE3 | tm--ep580 | .921  | .889      | .030 | .918     | .950   | 606  | .960  | .907   | .949  | .897  |
| DIS-TE4 | tm--ep580 | .900  | .860      | .041 | .898     | .937   | 2864 | .951  | .880   | .940  | .873  |
| DIS-VD  | tm--ep580 | .891  | .855      | .037 | .899     | .930   | 1061 | .940  | .877   | .930  | .863  |
+---------+-----------+-------+-----------+------+----------+--------+------+-------+--------+-------+-------+

+---------+-----------+-------+-----------+------+----------+--------+------+-------+--------+-------+-------+
| Dataset | Method    | maxFm | wFmeasure | MAE  | Smeasure | meanEm | HCE  | maxEm | meanFm | adpEm | adpFm |
+---------+-----------+-------+-----------+------+----------+--------+------+-------+--------+-------+-------+
| DIS-TE1 | tm--ep480 | .851  | .811      | .039 | .884     | .906   | 116  | .915  | .835   | .896  | .804  |
| DIS-TE2 | tm--ep480 | .894  | .858      | .034 | .907     | .932   | 294  | .941  | .878   | .922  | .850  |
| DIS-TE3 | tm--ep480 | .916  | .879      | .032 | .918     | .947   | 652  | .959  | .896   | .945  | .881  |
| DIS-TE4 | tm--ep480 | .900  | .852      | .042 | .903     | .932   | 3053 | .954  | .872   | .937  | .859  |
| DIS-VD  | tm--ep480 | .884  | .843      | .041 | .900     | .923   | 1135 | .937  | .864   | .920  | .843  |
+---------+-----------+-------+-----------+------+----------+--------+------+-------+--------+-------+-------+

Also, the performance of all the checkpoints I evaluated is saved in the performances_all_ckpts folder in stuff.
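
For readers unfamiliar with the trick, the IoU-only fine-tuning described above could look roughly like this in PyTorch (a sketch under my own naming, not the repo's actual training loop):

```python
import torch

def iou_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft IoU loss; logits/target have shape (B, 1, H, W), target in [0, 1]."""
    pred = torch.sigmoid(logits)
    inter = (pred * target).sum(dim=(2, 3))
    union = (pred + target - pred * target).sum(dim=(2, 3))
    return (1.0 - (inter + eps) / (union + eps)).mean()

# For the last ~20 epochs, drop the other loss terms and optimize IoU alone:
#   loss = iou_loss(logits, gt_masks)   # instead of, e.g., bce + ssim + iou
```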

rishabh063 commented 6 months ago

Thanks. Also, in your paper I see no comparison with https://github.com/plemeri/InSPyReNet. It was the SOTA before you beat it. Is there a reason for that, like a similar architecture to some other technique?

ZhengPeng7 commented 6 months ago

Yeah, their work is very nice! However, their experiments on DIS were not in the official ACCV version of their paper. Besides, recent papers in the same area (RMFormer, HQ-SAM, ...) also didn't compare with their results, or even cite them. And to be honest, their performance was so high that it would have decreased the improvement ratio of our results shown in the paper (frankly, not a very good reason). Therefore, we cited their work but did not compare against their results in the tables. I do recommend their work, whose results can be reproduced with their code. But their code is quite engineering-heavy, so I didn't use any of it. If you have strong coding skills, you can give it a try.

rishabh063 commented 6 months ago

You mean their output is good, and your improvement ratio would have been impacted if you had compared with them?

Also, I was able to replicate their results, but they were not very useful in real-life scenarios.

ZhengPeng7 commented 6 months ago

Yeah, for the reasons I mentioned above. But unfortunately, 2 of 3 reviewers also mentioned this 😂... So I suggest not doing it anymore unless the performance cannot exceed theirs.

You mean that the models you re-trained with their code achieve the reported benchmark performance, but are not good on real-world images? If so, there might be overfitting to the dataset. I guess larger models or more training data could alleviate this problem to some extent.

rishabh063 commented 6 months ago

I am a little confused here. Your approach is better and gives higher benchmark scores, right?

ZhengPeng7 commented 6 months ago

Sure, I'm talking about this work. I mean that, in most cases, people may leave out one or two methods that they cannot beat in a submission.

rishabh063 commented 6 months ago

Sorry to ask this again.

Of the two, you have the better result?

I am trying to make an open-source alternative to remove.bg for the community; which of the two methods should I consider?

ZhengPeng7 commented 6 months ago

On the existing benchmarks, BiRefNet is better. However, I highly recommend you collect some example images and assess the results manually.
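
One way to do that manual check is a tiny side-by-side harness like the sketch below; `predict_a` and `predict_b` are hypothetical stand-ins for however you load the two models, each returning a grayscale PIL mask at the input size.

```python
from pathlib import Path

from PIL import Image

def compare(folder: str, predict_a, predict_b, out_dir: str = "comparison") -> None:
    """Save each image's two predicted masks side by side for eyeballing."""
    Path(out_dir).mkdir(exist_ok=True)
    for path in sorted(Path(folder).glob("*.jpg")):
        img = Image.open(path).convert("RGB")
        mask_a = predict_a(img)  # PIL 'L' mask, same size as img
        mask_b = predict_b(img)
        strip = Image.new("L", (img.width * 2, img.height))
        strip.paste(mask_a, (0, 0))
        strip.paste(mask_b, (img.width, 0))
        strip.save(Path(out_dir) / f"{path.stem}_a_vs_b.png")
```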

rishabh063 commented 6 months ago

Closing this issue; it became our chat thread lol

mikebilly commented 5 months ago

@rishabh063 I've been using transparent-background (InSPyReNet) for a long time now, and after trying some test images with BiRefNet to compare against InSPyReNet, InSPyReNet is clearly better and more accurate.
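
For anyone who wants to reproduce such a comparison, a minimal usage sketch of the transparent-background package (API as shown in its README; check the project page for the current interface):

```python
from PIL import Image
from transparent_background import Remover

remover = Remover()                       # loads an InSPyReNet checkpoint
img = Image.open("input.jpg").convert("RGB")
out = remover.process(img, type="rgba")   # foreground with transparent background
out.save("output.png")
```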

rishabh063 commented 5 months ago

Can you share some examples here where InSPyReNet worked better?

ZhengPeng7 commented 5 months ago

@mikebilly Hi Hoàng Đức Mạnh, I actually saw you say on Twitter that InSPyReNet is better, and now here again 😂, as in this link and the screenshot attached below. I support your right to say any method is better. I just don't know why you came back to this closed issue to say it again... Please calm down.

[Screenshot 2024-04-13 18:55:51]

To be frank, InSPyReNet is really good work, and I've read their paper many times. However, the trained models differ in many factors. For example, the model weights I provide were trained on DIS5K-TR, while the InSPyReNet model you use might have been trained on other datasets. Besides, the images you tested might be an insufficient sample, which is exactly why we make fair comparisons on existing open benchmarks. Finally, I'm glad to see better results from InSPyReNet that help the community; that's also what I've tried to do here. So I'm happy for people to use both InSPyReNet and our BiRefNet. I really don't understand why...

rishabh063 commented 5 months ago

@ZhengPeng7 The general-use InSPyReNet is trained on a very big dataset: DIS, UHRSD, and 5-7 more. For a true comparison, someone has to do the same for this model. You could request a Hugging Face grant and do that, maybe.

ZhengPeng7 commented 5 months ago

Thanks for reminding me, Rishabh, I'll try to do that. Some companies have contacted me to help with better training on more A100s; many thanks for that :) BTW, now that I have the GPU resources, I'll also try to train a better model for edge devices in my free time, which was requested by you and many others. Good luck.

plemeri commented 5 months ago

Hello all, this is a closed issue and I don't usually chime in, but I would like to leave a comment on it. I'm the first author of InSPyReNet and the maintainer of transparent-background as well.

To all of those users who appreciate our work: I cannot tell you how grateful I am. However, comparing the performance of different works is very difficult to design, and sometimes a truly fair comparison never exists. To be honest, I do not claim that our paper has made a 100% fair comparison either, because some works do not release their code or results, and the metrics aren't perfect measures of true performance. They are for reference only.

Also, comparing our transparent-background package with this repository is 100% nonsense. We released our package with a model trained on a massive composite dataset, including testing benchmark datasets that would normally be excluded from training. That is why we separated the package from our paper, InSPyReNet: to avoid exactly this kind of confusion. If the author of BiRefNet did the same thing, training on a massive composite dataset, they might exceed our performance, and I mean it.

Lastly, please appreciate their awesome work; the fact that they released their source code publicly deserves more respect. Their work isn't fake, nor is their comparison unfair. I think they have done wonderful work, and their comparison is fair enough to say that they have achieved SOTA performance. I always love to see other researchers trying to achieve better performance and finally achieving it.

Thank you @ZhengPeng7 for your amazing work again, and thank you again @rishabh063 and @mikebilly for enjoying our work.

ZhengPeng7 commented 5 months ago

Hi, Taehun @plemeri, I really appreciate you writing so much! I have read and run your code before; it is easy to run and reproduces the high performance. And the transparent-background package is amazing too (in my personal view, the best one right now). I hope more people join in building a better community with more open-source projects :)

rishabh063 commented 5 months ago

Guys, please collaborate; there's great potential here.