Open ZZfive opened 7 months ago
Without data preprocessing, I used a random picture as ref_image and the provided motion_6 for inference. The result is shown below. The consistency of the character's movements is very good, but the character's face is badly distorted. This is most likely because, without preprocessing, the body in ref_image is not aligned with the figure in the guidance motion.
grid_wguidance.mp4 (https://github.com/fudan-generative-vision/champ/assets/57706634/c1e0a4f4-e7df-4147-9a0e-6e751ed97399)

Because the paper mentions that Champ was tested on the UBC fashion dataset, I selected the following video from the UBC fashion dataset as the guidance motion, in order to test the data preprocessing pipeline.
91D23ZVV6NS.mp4 (https://github.com/fudan-generative-vision/champ/assets/57706634/40b1f05c-53df-4be1-9b73-1d00f8faca1f)

Following the data preprocessing doc, after completing the environment setup I was able to obtain the required depth, normal, semantic_map, and dwpose features from the motion guidance video. But I hit one problem: the resulting semantic_map was missing two frames for some reason. Have you encountered this during data preprocessing? Since the 14 s guidance video has 422 frames in total, the difference between adjacent frames is small, so I filled each of the two missing semantic_map frames by simply copying the previous frame.
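As a side note, a minimal sketch of that gap-filling step, assuming the semantic maps are written out as zero-padded per-frame PNGs (the directory layout and naming below are assumptions, not necessarily the preprocessing script's actual output):

```python
import shutil
from pathlib import Path

def fill_missing_frames(semantic_dir: str, num_frames: int) -> None:
    """Fill gaps in a per-frame PNG sequence by copying the previous frame.

    Assumes frames are named 0000.png, 0001.png, ... (zero-padded index);
    adjust the pattern to match the real preprocessing output.
    """
    frame_dir = Path(semantic_dir)
    for i in range(num_frames):
        frame = frame_dir / f"{i:04d}.png"
        if not frame.exists():
            if i == 0:
                raise FileNotFoundError("First frame is missing; nothing to copy from.")
            # Gaps are filled in order, so frame i-1 is guaranteed to exist here.
            shutil.copy(frame_dir / f"{i - 1:04d}.png", frame)

# Hypothetical output path; substitute your own transferd_result location.
fill_missing_frames("transferd_result/semantic_map", 422)
```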
In the figure below, the left side is the first frame of the guidance motion video (960×1254) and the right side is the reference image (451×677). The middle is the depth map for that first frame after data preprocessing; you can see that it has been resized to 451×677 and the body parts are much better aligned.
However, running inference with the data preprocessed from the above reference image and guidance motion video gives a very bad result, shown below: the video jitters heavily, and the characters' faces and bodies are severely distorted.
animation.mp4 (https://github.com/fudan-generative-vision/champ/assets/57706634/ff679937-b90c-4a5f-b24e-17f59ce04f37)

Can somebody tell me the reason for the poor performance or offer some suggestions for improvement? Thanks.
Can you show your grid_wguidance.mp4 results? I think it's your condition maps flickering that causes results like this.
I had not noticed that the reference image was an RGBA image, so an error occurred due to a size mismatch when saving grid_wguidance.mp4; that is why I could not provide grid_wguidance.mp4 above. I only just discovered this problem. After converting the reference image to RGB, I got the grid_wguidance.mp4 shown below. As you guessed, it flickers badly. I followed the doc for all of the data preprocessing steps. What problems could cause the flickering in the condition maps?
grid_wguidance.mp4
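For anyone who hits the same RGBA mismatch, a minimal sketch of the conversion with Pillow (the file names are placeholders):

```python
from PIL import Image

# An RGBA (4-channel) reference image can cause a channel/size mismatch
# when frames are stacked into the output grid video, so flatten it to
# 3-channel RGB first. File names here are placeholders.
ref = Image.open("ref_image.png")
if ref.mode != "RGB":
    ref = ref.convert("RGB")
ref.save("ref_image_rgb.png")
```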
My results also flicker particularly badly. This is my grid..mp4
I followed each step of the data_process pipeline, and my generated video shows both background flicker and facial distortion. I also ran video generation using the transferd_result produced by data_process together with the reference image provided in the source code, and the same problems occurred. I suspect the alignment between the video and the image may be causing the problem.
I would like to know whether there is any way to fix the facial distortion and the background flicker. Also, what images are stored under the 'champ/transferd_result/visualized_imgs' path? What I see there now is a superposition of the normal_image and the reference image, but I don't know what that means. Please let me know if I did something wrong that caused the visualized_imgs output to look like this.
grid_wguidance_anyone.mp4
Here is my result; you can apply some deflicker methods to your condition maps.
What deflicker methods can I try? Can you tell me?
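One simple candidate, sketched below purely as an illustration (not a method endorsed anywhere in this thread), is a temporal exponential moving average over the condition-map frames, assuming they can be loaded as a (T, H, W, C) uint8 array:

```python
import numpy as np

def ema_deflicker(frames: np.ndarray, alpha: float = 0.6) -> np.ndarray:
    """Temporally smooth a (T, H, W, C) uint8 frame stack with an EMA.

    alpha close to 1.0 keeps more of the current frame (less smoothing);
    smaller alpha suppresses more flicker but lags fast motion. Blending
    only makes sense for continuous maps such as depth/normal; for discrete
    semantic maps, a temporal median or mode filter is a safer choice.
    """
    out = frames.astype(np.float32)
    for t in range(1, len(out)):
        # out[t-1] is already smoothed, so this is a running average.
        out[t] = alpha * out[t] + (1.0 - alpha) * out[t - 1]
    return out.clip(0, 255).astype(np.uint8)
```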
The first video uses ref-07.png and motion-02; the second video uses ref-07.png and the processed video.
And the face distortion looks like this:
We will release a SMPL smoothing feature soon, maybe this week, to solve the flicker problem.
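Until that lands, the rough idea behind such smoothing can be sketched as a low-pass filter over the per-frame SMPL parameters before the condition maps are rendered. The shapes and approach below are assumptions for illustration, not the released feature:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_smpl_params(poses: np.ndarray, transl: np.ndarray, sigma: float = 2.0):
    """Gaussian-smooth SMPL pose (T, 72) and translation (T, 3) sequences
    along the time axis.

    Larger sigma removes more jitter but can wash out fast motion. Body
    shape (betas) should stay fixed per subject, so it is not smoothed.
    Caveat: naively filtering axis-angle rotations can misbehave near the
    +/-pi boundary; converting to quaternions or a 6D rotation
    representation before filtering is more robust.
    """
    poses_s = gaussian_filter1d(poses, sigma=sigma, axis=0)
    transl_s = gaussian_filter1d(transl, sigma=sigma, axis=0)
    return poses_s, transl_s
```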