NVIDIA / vid2vid

PyTorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic video-to-video translation.

Understanding the flow for training faces #24

Closed · fxfactorial closed 6 years ago

fxfactorial commented 6 years ago

Please correct me if I am wrong. (I am focusing just on faces.)

As I understand it, vid2vid lets you provide a video in which each frame serves as labeled training data. So once you have a trained model, given input consisting only of edge maps, vid2vid will try to create a face (based on the training data) from those edge maps.

I am not clear, though, on how to do this with train.py. Do I need to generate edge maps myself for each frame of my video?

Ideally I want to just provide vid2vid a single video file (say, an .avi) and have vid2vid generate edge maps itself for each frame and output a trained model.

Thank you @tcwang0509 @junyanz

When answering, please include CLI commands that I can copy-paste, directions that I can immediately follow, or changes to the Python code that might be needed.

nrgsy commented 6 years ago

Even just getting the code that was used to detect/connect the face landmarks and generate the background edges for the edge-map sketch of a single video frame would be great. dlib (http://dlib.net/ml.html) was cited in the paper, but it would be nice to have the code that merges the connected facial-landmark sketch with the background edges from the Canny edge detector.
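
In case it helps while waiting for the official release, here is a rough sketch of how that merge might work, using dlib's standard 68-point shape predictor plus OpenCV's Canny detector. This is my own guess at the pipeline, not the authors' code; the landmark groupings and Canny thresholds are assumptions, and the shape_predictor_68_face_landmarks.dat model file has to be downloaded separately from dlib.net:

    # Hypothetical sketch, not the authors' code: dlib 68-point landmarks
    # drawn as connected lines, merged with Canny background edges.
    import cv2
    import dlib
    import numpy as np

    detector = dlib.get_frontal_face_detector()
    # Model file must be downloaded separately from dlib.net
    predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

    # (start, end, closed) index ranges in the standard 68-point layout
    FACE_PARTS = [(0, 17, False),   # jaw line
                  (17, 22, False),  # right eyebrow
                  (22, 27, False),  # left eyebrow
                  (27, 31, False),  # nose bridge
                  (31, 36, False),  # lower nose
                  (36, 42, True),   # right eye
                  (42, 48, True),   # left eye
                  (48, 60, True),   # outer lip
                  (60, 68, True)]   # inner lip

    def edge_map(frame_bgr):
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        # Background edges from Canny; thresholds are a guess, tune as needed
        edges = cv2.Canny(gray, 100, 200)
        for rect in detector(gray, 1):
            shape = predictor(gray, rect)
            pts = np.array([[p.x, p.y] for p in shape.parts()],
                           dtype=np.int32)
            for start, end, closed in FACE_PARTS:
                cv2.polylines(edges, [pts[start:end]], closed, 255, 1)
        return edges  # single-channel B&W sketch

Whether the landmark lines and Canny edges should go into one merged channel (as this sketch assumes) or separate channels is exactly the question raised below.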

dustinfreeman commented 6 years ago

I do wonder: is the training input just the binarized facial features and edge map? Or are the different facial features mapped to different channels? Even the paper is pretty unclear about it.

nrgsy commented 6 years ago

I believe the network input is just the raw black & white edge image and the output is the photorealistic face image matching those edges.

fxfactorial commented 6 years ago

It would be great to get confirmation (and copy-pastable commands/Python code) from the authors @tcwang0509 @junyanz, or anyone else, please.

fxfactorial commented 6 years ago

Actually, digging into the source code, this repo is not complete: it is missing face_dataset.py.

As of bdb7ec6b60cd0d5f1c122aeebd4283e478d2d664, this will fail on line 11 of custom_dataset_data_loader.py:

        from data.face_dataset import FaceDataset

I'm wondering why this code was not included, as it is quite valuable and (at least to me and some other folks) more interesting than the segmentation-map-based, landscape-changing code and ideas.

Opened issue: https://github.com/NVIDIA/vid2vid/issues/26
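
For anyone hitting that ImportError in the meantime, a placeholder module can at least make the failure explicit. This is a hypothetical stub following the initialize(opt) pattern of the other *_dataset.py files in this repo, not the real implementation:

    # data/face_dataset.py -- hypothetical stub, not the authors' code.
    # Mirrors the initialize(opt) pattern of the other *_dataset.py files.
    from data.base_dataset import BaseDataset

    class FaceDataset(BaseDataset):
        def initialize(self, opt):
            self.opt = opt
            raise NotImplementedError(
                'face_dataset.py is not yet included in the repo; '
                'see https://github.com/NVIDIA/vid2vid/issues/26')

        def __getitem__(self, index):
            raise NotImplementedError

        def __len__(self):
            return 0

        def name(self):
            return 'FaceDataset'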

tcwang0509 commented 6 years ago

Yes, we'll update the face dataset part shortly. As pointed out, we run landmark detection (using the dlib library) and edge detection on the face images, and use the B&W image as input. I'm currently up against an urgent deadline and probably won't be able to update it until next week or the week after, though...
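
Given that confirmation, an end-to-end preprocessing pass over a video might look like the sketch below. It reuses the hypothetical edge_map() helper from earlier in the thread; the train_A/train_B pairing is borrowed from pix2pixHD and is only a guess at what face_dataset.py will actually expect:

    import os
    import cv2

    def preprocess_video(video_path, out_root):
        # train_A = edge maps (model input), train_B = real frames (target);
        # this layout is an assumption borrowed from pix2pixHD
        dir_a = os.path.join(out_root, 'train_A')
        dir_b = os.path.join(out_root, 'train_B')
        os.makedirs(dir_a, exist_ok=True)
        os.makedirs(dir_b, exist_ok=True)

        cap = cv2.VideoCapture(video_path)
        i = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            name = 'frame_%05d.png' % i
            # edge_map() is the hypothetical helper sketched above
            cv2.imwrite(os.path.join(dir_a, name), edge_map(frame))
            cv2.imwrite(os.path.join(dir_b, name), frame)
            i += 1
        cap.release()

    preprocess_video('my_face_video.avi', 'datasets/face')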

fxfactorial commented 6 years ago

@tcwang0509 I appreciate your answer and your time. Could you please outline, in as much detail as possible, what would be needed to at least stub it out or reimplement it? Or is it too complicated, and better for you to just dump the code? I very much want to have this working by this coming Monday/Tuesday, as training also takes time.

tcwang0509 commented 6 years ago

If you're willing to use the un-refactored code, I can provide it here... I haven't tested whether it works with the latest version, but the idea should be the same. face_dataset.zip

fxfactorial commented 6 years ago

@tcwang0509 Love it, thank you very much; that's all I need right now!