Elsaam2y / DINet_optimized

An optimized pipeline for DINet that reduces inference latency by up to 60% 🚀. Kudos to the authors of the original repo for this amazing work.

Cropped faces #3

Closed by Inferencer 10 months ago

Inferencer commented 10 months ago

Glad to see some more activity around DINet. Since a few of us are doing person-specific training, I wonder what you think about the current quality of the cropped faces used for training, and whether their resolution could be increased?

Elsaam2y commented 10 months ago

TBH, I think the current resolution is good enough considering the typical size of faces in the videos. However, it is not always enough to get good results, and fine-tuning or even retraining might be necessary. If the identity in the input video is far from the distribution of identities in the HDTF dataset, the results can be far off, so adding the new identity to the dataset may be necessary to get better results. That is not always practical, since training takes too much time, and I believe a lighter person-specific stage could work better. I am working on something at the moment to get more reliable results, but it is still at an early stage.

Inferencer commented 10 months ago

To be clear, I do mean the cropped faces created for training purposes. I am also experimenting with dataset sizes for person-specific training: 3 hours of footage was enough, so now I'm going to cut it in half. But the crop resolution is terrible for me, and now that I look at it, it's not even the full face, as a quarter of the right cheek is missing. DINet did say training at a higher resolution is possible, but without a decent cropped-face resolution it wouldn't be worth it.
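
For anyone wanting to check the two problems described here (a crop that cuts off part of the face, and a crop too small for the training resolution), here is a minimal sketch. It is not DINet's actual cropping code: the function names, the bounding-box format, the 25% margin, and the 512x512 target are all assumptions made for illustration.

```python
def expand_crop_box(box, image_size, margin=0.25):
    """Expand a face bounding box by a relative margin, clamped to the frame.

    box: (left, top, right, bottom) in pixels; assumed to come from
         whatever face detector is in use (hypothetical input format).
    image_size: (width, height) of the source frame.
    margin: fraction of the box width/height added on each side.
    """
    left, top, right, bottom = box
    width, height = image_size
    pad_x = (right - left) * margin
    pad_y = (bottom - top) * margin
    new_left = max(0, int(round(left - pad_x)))
    new_top = max(0, int(round(top - pad_y)))
    new_right = min(width, int(round(right + pad_x)))
    new_bottom = min(height, int(round(bottom + pad_y)))
    return new_left, new_top, new_right, new_bottom


def upscale_factor(box, target_size):
    """Return how much a crop must be stretched to reach target_size (w, h).

    A value above 1 means the crop has fewer pixels than the target,
    so resizing it will upsample and likely look soft.
    """
    left, top, right, bottom = box
    crop_w = right - left
    crop_h = bottom - top
    return max(target_size[0] / crop_w, target_size[1] / crop_h)


# Example: a 200x200 detection in a 1920x1080 frame.
padded = expand_crop_box((100, 100, 300, 300), (1920, 1080), margin=0.25)
factor = upscale_factor(padded, (512, 512))  # hypothetical target size
```

If the factor comes out well above 1 for most frames, training at a higher resolution would mostly learn from upsampled pixels, which is the concern raised above.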

Elsaam2y commented 10 months ago

Interesting. Actually, I haven't tried increasing the resolution. I was even thinking of using the lower resolution, 128, and training a person-specific model to fix any issues with the inpainting step. The goal, in the best case, is to speed up inference significantly further while using this person-specific model to fix the glitches and blurriness in the final output.

Inferencer commented 10 months ago

Of course, that makes sense. Obviously you have a different goal from mine; I'm just glad we have some more approaches since the original author went AFK. Before I close this, is there any chance of a Windows branch? I train on Colab with a Linux build and want to try your DeepSpeech replacement, but for inference I use Windows.

No matter if not, as I can probably figure it out from your current code.

Elsaam2y commented 10 months ago

I can have a look into that soon and keep you posted.

Elsaam2y commented 10 months ago

Also, if you manage to get it running on Windows, please feel free to open a PR.