Closed by Inferencer 1 year ago
TBH I think the current resolution is good enough considering the normal size of the faces in the videos. However, it is not always enough to get nice results, and fine-tuning or even retraining might be necessary. If the identity in the input video is far from the identity distribution of the HDTF dataset, the results can be noticeably off, so adding that new identity to the dataset could be necessary to get better results. But this is not always practical since the training would take too much time, and I believe a lighter person-specific stage could be better. I am working on something at the moment to get more reliable results, but it is still at an early stage.
To be clear, I do mean the cropped faces created for training purposes. I am also playing with dataset sizes since I'm doing person-specific training: 3 hours was enough, so now I'm going to cut it in half. But the crop resolution for me is terrible, and now that I look at it, it's not even the full face, as about 1/4 of the right cheek is missing. DINet did say training at a higher resolution is possible, but without a decent cropped-face resolution it wouldn't be worth it.
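For anyone who wants to check their own crops for the same two problems (part of the face cut off, and too few source pixels for a higher training resolution), here is a minimal sketch. It is not DINet's actual cropping code: the function names, the `(left, top, right, bottom)` box format and the 25% margin are all assumptions made for illustration.

```python
def expand_crop_box(box, frame_w, frame_h, margin=0.25):
    """Grow a (left, top, right, bottom) face box by `margin` of its size
    on every side, clamped so the crop never leaves the frame.

    Hypothetical helper: the margin value is an assumption, not a DINet setting.
    """
    left, top, right, bottom = box
    pad_x = int((right - left) * margin)
    pad_y = int((bottom - top) * margin)
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(frame_w, right + pad_x),
        min(frame_h, bottom + pad_y),
    )


def crop_is_large_enough(box, target_res):
    """True if the shorter side of the crop has at least `target_res` source
    pixels, i.e. resizing to target_res would not upscale the face."""
    left, top, right, bottom = box
    return min(right - left, bottom - top) >= target_res
```

If the padded box is still smaller than the target training resolution, the crop would have to be upscaled, which is the case where training at a higher resolution is unlikely to pay off.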
Interesting. Actually, I didn't try increasing the resolution. I was even thinking of using the lower resolution, 128, and training a person-specific model to fix any issues with the inpainting step. So the goal, in the best-case scenario, is to speed up inference significantly, besides using this person-specific model to fix the glitches and blurriness in the final output.
Of course, that makes sense. Obviously you have a different goal from mine; I'm just glad we have some more approaches since the original author went AFK. Before I close this, is there any chance of a Windows branch? I train on Colab with a Linux build and want to try your DeepSpeech replacement, but for inference I use Windows.
No matter if not, as I can probably figure it out from your current code.
I can look into that soon and keep you posted.
Also, if you manage to get it running on Windows, please feel free to open a PR.
Glad to see some more activity regarding DINet. As a few of us are doing person-specific training, I wonder what you think about the current quality of the cropped faces used for training, and whether their resolution could be increased?