ZHKKKe / MODNet

A Trimap-Free Portrait Matting Solution in Real Time [AAAI 2022]
Apache License 2.0

Higher quality like THIS AMAZING demo? 😮 + Comparison Examples #193

Closed · AlonDan closed this 2 years ago

AlonDan commented 2 years ago

Dear @ZHKKKe I hope you can help me and others to come,

I'm pretty new to this (and I'm not a programmer), but I'm very fascinated by your great research, and I'm trying to get higher-quality, more accurate results like the ones in your personal website's DEMO.

I tried the Colab version and the local version, as I'm using Windows 10 + Anaconda.

At first I wasn't sure, but then I noticed that there is a HUGE difference between the current pretrained model and YOUR website demo.

So I'm not sure if there is a way to download and experiment with the same pretrained model you use on your website demo; it would be very interesting to try it and do comparisons on my local machine.

I don't really know how to TRAIN on my own dataset. I tried to look it up, but it is not very clear to me; I wish it were as simple as running the demo in Anaconda. (Maybe it isn't hard, but I didn't find a step-by-step guide or tutorial, which would be VERY helpful.)
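
For reference, running the released ~25 MB checkpoint locally boils down to something like the sketch below. This is only a rough outline of how I understand the repo's Colab demo works; the module path, the `(image, inference)` forward signature, and the file names are assumptions to double-check against your own checkout.

```python
# Rough sketch: single-image matting with the released checkpoint.
# Assumptions: the MODNet class lives in src/models/modnet.py, its forward
# pass takes (image, inference) and returns (semantic, detail, matte),
# and inputs are normalized to [-1, 1]. Verify against the repo's demo code.
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from PIL import Image

from src.models.modnet import MODNet

ckpt_path = './pretrained/modnet_photographic_portrait_matting.ckpt'

modnet = nn.DataParallel(MODNet(backbone_pretrained=False))
modnet.load_state_dict(torch.load(ckpt_path, map_location='cpu'))
modnet.eval()

to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

im = Image.open('portrait.jpg').convert('RGB')
im_tensor = to_tensor(im)[None]  # 1 x 3 x H x W

# Resize so both sides are multiples of 32 (the encoder downsamples by 32).
_, _, h, w = im_tensor.shape
im_tensor = nn.functional.interpolate(
    im_tensor, size=(h - h % 32, w - w % 32), mode='area')

with torch.no_grad():
    _, _, matte = modnet(im_tensor, True)  # inference mode: only the fused matte is used

matte = nn.functional.interpolate(matte, size=(h, w), mode='area')
Image.fromarray((matte[0, 0].numpy() * 255).astype('uint8')).save('portrait_matte.png')
```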


Here are just a couple of examples comparing the MODNet pretrained model and your online website demo, which is much more accurate:

[Image: MODNet pretrained model (Colab) – example 1]

[Image: ZHKKKe's personal website demo – example 1]

[Image: MODNet pretrained model (Colab) – example 2]

[Image: ZHKKKe's personal website demo – example 2]

I'm enjoying doing these comparisons, I must mention. So far, your @ZHKKKe website demo is much more accurate and cleaner (not in all areas, of course). I would love to keep training it even further to make it more accurate!

Some rough numbers from my tests: 16 out of 20 (human portrait) images were very accurate with ZHKKKe's demo website, compared to messy results from the pretrained model (about 25 MB) used locally and in the MODNet Colab. The other 4 images pretty much failed on hair/background and added some unneeded objects in the background.

I'm now very curious about the pretrained model you're using on your demo website, and also about training on my own dataset (if I can figure out how to do it, of course...).


Please consider sharing the pretrained model you used on your website demo. Also, if there is a guide or tutorial explaining how we can train on our own dataset, that would be VERY helpful as well.

I would like to experiment and help your research by showing comparisons, or by improving training once I know how to do it on my local machine.

I hope you can help with this. Thanks in advance, and please keep up the wonderful work, YOU ROCK! 😎

ZHKKKe commented 2 years ago

Hi, thanks for your interest in our work. The model used in our demo:

  1. has almost the same architecture as the model released in this repo,
  2. but is trained on a larger dataset (including more than 10K annotated images).

The model used in the web demo is not open-source currently. If you want to improve the performance, the first step might be to collect more annotated data.
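
For anyone who does collect their own annotated data: the training entry point in this repo is `src/trainer.py`. A supervised fine-tuning loop would look roughly like the sketch below. Note that `MyMattingDataset` is a hypothetical placeholder you have to write yourself, and the optimizer settings and epoch count are only indicative; check them against the `supervised_training_iter` docstring.

```python
# Rough sketch of supervised training with this repo's trainer helper.
# MyMattingDataset is a hypothetical placeholder: it must yield
# (image, trimap, gt_matte) batches prepared the way the docstring describes.
import torch
from torch.utils.data import DataLoader

from src.models.modnet import MODNet
from src.trainer import supervised_training_iter

modnet = torch.nn.DataParallel(MODNet()).cuda()  # assumes a CUDA GPU is available
optimizer = torch.optim.SGD(modnet.parameters(), lr=0.01, momentum=0.9)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

dataloader = DataLoader(
    MyMattingDataset(),  # hypothetical dataset of (image, trimap, gt_matte) triplets
    batch_size=16, shuffle=True, num_workers=4)

for epoch in range(40):
    for image, trimap, gt_matte in dataloader:
        image, trimap, gt_matte = image.cuda(), trimap.cuda(), gt_matte.cuda()
        # One optimization step over MODNet's three sub-objectives
        # (semantic, detail, matte); the losses are computed inside the helper.
        semantic_loss, detail_loss, matte_loss = supervised_training_iter(
            modnet, optimizer, image, trimap, gt_matte)
    lr_scheduler.step()

torch.save(modnet.state_dict(), 'my_finetuned_modnet.ckpt')
```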

AlonDan commented 2 years ago

Thank you for the kind reply, I truly appreciate your work!

> The model used in our demo:
>
> 1. has almost the same architecture as the model released in this repo,
> 2. but is trained on a larger dataset (including more than 10K annotated images).

I actually mentioned it in the Harmonizer post, same direction: I would LOVE it if you could either release the demo model trained on the 10K data so we can try it locally, or even just share it with me for the sake of experiments and comparisons on different sources and cases. I would also love to try it on videos (PNG sequences); it would be interesting to experiment with this high-quality model trained on 10K annotated images.

I wouldn't mind training my own model, but as I mentioned, I have no idea how to do it, and I'm not a programmer. Running the demo with the pre-trained model was friendly enough for me to understand and run under Windows and Anaconda, but training is probably a much more complex thing to accomplish (for newbies such as myself), especially since there is no step-by-step guide or video tutorial to follow.

Still, I would love to experiment and show comparisons. It's so interesting, and it might even help others in the community and, mostly, you: while you're working so hard on the project, some user feedback and visual comparisons could give you ideas on how to improve as you keep working on MODNet.

Combining MODNet + Harmonizer + Enhancer could be a very nice idea! I would definitely love to help with the design if you ever want to combine them. As a professional animator for over two decades, I know a thing or two, both as an actual user and as a typical user, about image editing, post-production, and video editing. I believe I can be helpful.

Please feel free to contact me in private if you'd consider sharing the same model you used in your online DEMO, since it's definitely much better than the released pre-trained model in so many ways, which makes sense... 10K annotated images sounds GREAT!

Thanks in advance and please keep up the good work! ❤

AlonDan commented 2 years ago

@flagshipbowtie Hmm... so that's probably the reason for the HUGE difference between the online demo's quality, which is AMAZING, and the pre-trained one. Hopefully the developers will share it?

Also, I tried others, but... it seems like the online demos (not Google Colab) always use MUCH BETTER-TRAINED MODELS than any released pre-trained one, and I never get decent results...

I had hoped to get better results with RVM (Robust Video Matting), U-2-Net, and others... but none of the pre-trained models gave decent results (unlike their AMAZING VIDEOS), which probably use better-trained models rather than the pre-trained ones they share...


Is there another project you recommend that gives GOOD results on a local machine (not an online Colab or demo)? I'm using Windows + Anaconda.

I don't mind TRAINING MY OWN MODELS for whatever good project it is. My hope is on MODNet, and I just created a detailed post about it, but... if there is another project you can recommend that is also not super complicated (for newbies like me) for training my own models, PLEASE share!

A good example of something EASY to train is DeepFaceLab, but it's for deepfake faces. I'm looking for background removal of whatever TARGET is the goal: full-body humans, portraits, animals, specific objects, etc. That's why I would LOVE to understand how to train, but I couldn't find any easy-to-follow guide in these projects yet 😢

Thanks in advance :)

AlonDan commented 2 years ago

Thanks for sharing your information @flagshipbowtie, I appreciate it!

I've been messing with the same tools you've mentioned for a long time, with so many variations and post-processing tools, in order to get the most accurate / cleanest result per image.

The most impressive results usually come from commercial online services that trained a VERY good model, which is why their results are usually very accurate (I can share some examples if you'd like, but you probably know some of them, for both videos and images).

I've never heard of "MODNet-V". It sounds like it was the next evolution of the current MODNet and probably came with a better pre-trained model. Too bad it died...

So what you're saying is, there is no hope for MODNet? Is it going to stay behind the others?

I had a good feeling about it because of the dev's personal online demo, which he mentioned was trained on 10K images (you can test it for yourself, it is VERY GOOD and accurate), compared to the pre-trained model recommended when installing MODNet from the main GitHub page.


Hopefully, two things can happen, but it's up to the developers to decide:

1 - Release the pre-trained model (trained on 10K images) used on the personal website so we can play with it.

2 - Explain how to train our own specific / target models with a simple, easy-to-follow guide or tutorial.

I really hope the developers will consider these, as MODNet could be really good if they allow the community to push it to the extreme with more tests, experiments, and comparisons.

ZHKKKe commented 2 years ago

Hi @flagshipbowtie, let me answer some of your questions.

> I do use Modnet web demo and that's good. I'm not gonna say otherwise. But the code here is shit.

The code in this repo is for research purposes only, not for commercial use. The released performance is consistent with the results we report in the paper, so it is expected that the model in this repository does not perform as well as the demo on the web page. I have never promised that the model used in the web demo would be released.

> Then there was this MODnet V which did produce the better result but the page was removed and the code was supposed to be added to this project but it never happened.

The MODNet-V project has been terminated, so I can't get support for releasing its source code and models. That is why I deleted the repo.

> That's pretty much why modnet sucks and doesn't deliver on the demo gifs quality at all.

If you read the paper, you will see that the results in the GIFs are generated by refining the model on each specific sample via the SOC technique, rather than by using the released model directly.
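
For context, SOC (sub-objectives consistency) adapts the already-trained model to the target footage itself using only unlabeled frames, which is why the GIF results look better than what the released checkpoint gives out of the box. The repo's `src/trainer.py` exposes a `soc_adaptation_iter` helper for this step; a per-clip adaptation loop would look roughly like the sketch below, where `MyUnlabeledFrames` is a hypothetical placeholder and the learning rate and epoch count are only indicative.

```python
# Rough sketch of SOC self-adaptation on the unlabeled frames of one clip.
# MyUnlabeledFrames is a hypothetical placeholder dataset that yields raw
# frames only; hyper-parameters are indicative, not the authors' settings.
import copy
import torch
from torch.utils.data import DataLoader

from src.models.modnet import MODNet
from src.trainer import soc_adaptation_iter

modnet = torch.nn.DataParallel(MODNet()).cuda()
modnet.load_state_dict(
    torch.load('./pretrained/modnet_photographic_portrait_matting.ckpt',
               map_location='cpu'))

optimizer = torch.optim.Adam(modnet.parameters(), lr=1e-5, betas=(0.9, 0.99))
dataloader = DataLoader(MyUnlabeledFrames(), batch_size=1, shuffle=False)

for epoch in range(10):
    # Freeze a copy of the current weights; SOC regularizes the adapted
    # model against this backup so it does not drift too far.
    backup_modnet = copy.deepcopy(modnet)
    for image in dataloader:
        soc_semantic_loss, soc_detail_loss = soc_adaptation_iter(
            modnet, backup_modnet, optimizer, image.cuda())

torch.save(modnet.state_dict(), 'modnet_soc_adapted.ckpt')
```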

Lastly, when you use an open-source project, please pay attention to its license. For example, RVM is under the GNU General Public License v3.0: if you use it in your project/product, you need to open-source your code under the same license, unless you choose to ignore the rules of the open-source community.

AlonDan commented 2 years ago

First of all, @flagshipbowtie, you need to take a long breath and chill... As I mentioned, my English isn't very good, but you probably understood most of my text.

Reading your latest replies makes me think you are clueless about video in general; your expectations are WAAAY more unrealistic than what literally goes on today. C'mon... you're making MEMES on YouTube 🤣

When you mentioned that Runway is too slow, or not great, that just showed me you have NO CLUE how to even work with it, considering it's as simple as drag-and-drop. Either you're working with a sh*tty internet connection or your source materials are low quality, because this project is INSANE and I've worked with it for a while, alongside others. I will agree that it is NOT perfect, but the way you're trashing it... you just have no experience at all. You should go back to your cute "TOPAZ LAB" tutorials or whatever toy you're using and thinking is amazing; it's not even on the same level at the moment.

> I assume you're a murican or some other clueless westerner hahaha Cut out your sucking up and that annoying non stop western courtesy. That's not gonna get you what you want for free especially from a Chinese Hongkonger.

In my world that's called being disrespectful and racist at the same time. Listen, kiddo, you're obviously not smart enough to know that. Let me guess... in real life you don't have many friends, or clients, or anything related to humans, and it probably makes you very bitter, racist, and angry about... some projects that won't GIVE YOU THINGS for free. Good luck with that attitude. 👍

Unlike you, I'm a professional animator with over two decades of experience in the post-production industry. I don't play with "toys" while you're playing around with your cute cut-and-paste video editing "skills"; I'd rather shut up instead of complaining and spreading hate toward the developers and their projects. As I mentioned before, I would love to experiment with AI and help the project with comparisons and data that I can share with the community (it's the opposite of selfishness, just in case you can't recognize that), and that's why I would like to learn more about TRAINING rather than just "using GitHub pre-trained models" like you're doing, which is fine, but honestly, you saw the results... they're meh... if not even less impressive.

Another thing: you obviously have a lot to learn about all the projects you're messing with, because in most cases the same GitHub projects just TRAIN on a BIGGER + BETTER dataset and end up using MUCH MORE accurate models, which they never promised to release (if you ever train a model, even on Colab, you'll get an idea of how much time it takes). You may learn that once you also know how to train ANYTHING on a neural network.

To be honest, I can't really stand your attitude, or people like you who are racist and rude young kids (and if you're a grown-up, that would be a true shame; if I were you, I would disappear out of shame). Luckily, I'm not :)

So I'm pretty much done discussing this with you, and I'll focus on talking with THE AMAZING DEVELOPERS ❤ @ZHKKKe, who are doing such an awesome job that I respect and encourage to make stronger and better, while you (@flagshipbowtie) keep complaining with some racist sh*t like a silly kid who has no clue about real production values.

From this point on, I'd rather keep my conversation with @ZHKKKe and other members of the community, as I'm curious about training and the future of MODNet.

@ZHKKKe, I would like to apologize for the rudeness and disrespect of this youngster (or just not-very-bright grown-up), and I hope you understand that most people here are not "sucking up" but encouraging you to keep up the good work! I'm NOT expecting anything for free; I respect anything I can learn from or experiment with, and I donate my own spare time to the community and, if possible, to the project.

Again, being a NICE person doesn't mean "sucking up", unless you're probably 9 years old and can't tell the difference, so give me a break.

I'm now going to read the other replies I missed; I just had to respond to this out of respect. Sorry about my bad English, MUCH LOVE YO! 💙

ZHKKKe commented 2 years ago

@flagshipbowtie In our tests, the online demo is better than the released RVM at image matting, but worse than RVM at video matting. So RVM may be better for you.