@AdarshMJ Thanks for the comment. I am a little confused by your last comment. As far as I understand, if each frame image is clear, then the encoded video output should be clear as well. Are you saying that each stylized frame is clear before encoding, but the output video comes out blurry? If so, there is one thing your approach does differently from ours, and I suspect it is the source of the blurriness. For video stylization, we used a single style image (sunset) for the entire set of content video frames; stylizing nearby frames with different style frames may be what produces the blurred output. Is it necessary to stylize each of your content video frames with its corresponding style video frame? If not, please try a single representative style frame and let me know whether this trick works.
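For reference, here is a minimal sketch of that single-style loop, assuming the frames are stored as PNGs; `stylize_fn` is a hypothetical placeholder for whatever transfer entry point you actually use, not an API from this repo.

```python
import glob
import os

from PIL import Image

def stylize_all_frames(frame_dir, style_path, out_dir, stylize_fn):
    """Apply ONE fixed style image to every content frame."""
    style = Image.open(style_path).convert("RGB")
    os.makedirs(out_dir, exist_ok=True)
    for path in sorted(glob.glob(os.path.join(frame_dir, "*.png"))):
        content = Image.open(path).convert("RGB")
        # stylize_fn(content, style) -> PIL.Image; plug in your transfer code.
        result = stylize_fn(content, style)
        result.save(os.path.join(out_dir, os.path.basename(path)))
```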
Hello! The transfer works perfectly. The problem was with converting the frames to a video: when I convert the stylised frames to a video, the frames in the resulting video appear blurred. I will try the method you suggested (and the encoding sketch below). Thank you!
Also, I wanted to know whether you are planning to release the training code as well?
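On the encoding side, a common source of blur is the encoder's fairly lossy default settings. A minimal sketch, assuming ffmpeg is on the PATH and the stylized frames are named frame_0001.png, frame_0002.png, ... (adjust the pattern and frame rate to your data):

```python
import subprocess

subprocess.run([
    "ffmpeg",
    "-framerate", "30",               # match the source video's frame rate
    "-i", "stylized/frame_%04d.png",  # input frame name pattern
    "-c:v", "libx264",
    "-crf", "18",                     # near-lossless; the default (23) is visibly softer
    "-pix_fmt", "yuv420p",            # widely compatible pixel format
    "output.mp4",
], check=True)
```

Note that libx264 with yuv420p requires even frame dimensions, so crop or pad the frames if needed.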
@AdarshMJ Hi. I hope that works. About the training code: as you know, preparing a neat release version of the code takes quite a lot of time and effort. I am thinking of releasing the training code for the encoder-decoder networks in the near future, but I cannot guarantee that the release will come soon.
@JaejunYoo Yes, I completely understand the effort it takes to release code. Also, I had one more query: would segmentation maps make much of a difference for stylisation? I tried stylising with the segmentation maps from the code released with PhotoWCT, and the results were bad.
@AdarshMJ Yes, it does, and how much depends on the content and style. Generally, including a segmentation map gives much better stylization results. This is easy to see when you try a nightscape of a city as the style: without a segmentation map, the stylized output comes out very dark overall, because the model cannot map sky to sky and building to building. For now, photorealistic stylization inevitably depends on the semantic map. Still, I would say ours is better at maintaining the structure.
I'm using the segmentation code from [CSAILVision/semantic-segmentation-pytorch](https://github.com/mingyuliutw/semantic-segmentation-pytorch). I was able to generate this segmentation mask from that code.
Should I give this whole image as the content-segment input, or just crop out the segmentation map and give that?
When I crop the segmentation mask and give it as input, I get this error:
```
Traceback (most recent call last):
  File "transfer.py", line 205, in
```
@AdarshMJ First of all, please check the tutorial given by the PhotoWCT authors: https://github.com/NVIDIA/FastPhotoStyle/blob/master/TUTORIAL.md#prepare-label-maps By following it, you can reproduce the exact procedure used to prepare the label maps for the training dataset.
Secondly, please check how the example images I provided look. A content/style pair should share the same filename as well as image size, and for the semantic maps, the set of unique labels in your style map should match that of your content map. I suspect this is the reason you are getting that error; a quick sanity check is sketched below.
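A small sketch of such a check (file names are placeholders), verifying that each image matches its map's resolution and comparing the unique label sets of the two maps:

```python
import numpy as np
from PIL import Image

content_img = Image.open("content.png")
style_img   = Image.open("style.png")
content_seg = Image.open("content_seg.png")
style_seg   = Image.open("style_seg.png")

# Each image and its label map must share the same resolution.
assert content_img.size == content_seg.size, "content/map resolution mismatch"
assert style_img.size   == style_seg.size,   "style/map resolution mismatch"

# Every label used in the content map should also appear in the style map;
# otherwise there is no style region to draw from for that label.
c_labels = set(np.unique(np.array(content_seg)).tolist())
s_labels = set(np.unique(np.array(style_seg)).tolist())
if not c_labels <= s_labels:
    print("labels missing from the style map:", c_labels - s_labels)
```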
I hope this helps to clarify your issue.
@jaejun-yoo Thank you for the clarifications! The problem was that the segmentation maps and their corresponding content/style images did not have the same resolution, which is most likely what caused the error. I will retry with matching image sizes. Thank you!
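In case it helps anyone else hitting this: one simple fix is to resize the label map to its image's resolution with nearest-neighbor interpolation, which avoids blending label values into new ones at region boundaries (a sketch, with placeholder file names):

```python
from PIL import Image

img = Image.open("content.png")
seg = Image.open("content_seg.png")

# NEAREST keeps label values intact; bilinear would invent new labels.
seg = seg.resize(img.size, resample=Image.NEAREST)
seg.save("content_seg_resized.png")
```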
Good. Since the original question has been clarified and this thread is already quite long, I will close this issue. Please open another one if you need to ask about the semantic label map. Thx!
Is there a recommended way to perform video stylisation? For now, I'm converting the original video file and the style video file into frames, performing photorealistic style transfer, and then making a video out of the output frames. However, there seems to be a problem when encoding the frames back into a video, which leads to blurry output. Any suggestions are welcome!
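For completeness, a sketch of the extraction step in that pipeline, assuming ffmpeg is installed (the re-encoding step, where the blur actually appears, is sketched earlier in the thread):

```python
import os
import subprocess

os.makedirs("content_frames", exist_ok=True)
subprocess.run([
    "ffmpeg",
    "-i", "input.mp4",
    "content_frames/frame_%04d.png",  # PNG keeps each extracted frame lossless
], check=True)
```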