SangHunHan92 / 2K2K

Official Code and Dataset for "High-fidelity 3D Human Digitization from Single 2K Resolution Images" (CVPR 2023 Highlight)
https://sanghunhan92.github.io/conference/2K2K/

Retrained model cannot achieve very good quality #27

Open ZhenhuiL1n opened 2 weeks ago

ZhenhuiL1n commented 2 weeks ago

Hi,

I retrained the model on the 2K2K dataset and finished stage 1 (30 epochs) and stage 2 (10 epochs) with the same settings described in the paper. The results are not as good as the ones you provide, and I have a few questions about the retraining.

  1. First, I want to ask whether you used background augmentation in both phase 1 and phase 2 of training. When I add background augmentation, the model cannot predict good results (as seen in TensorBoard). Below is a result with background augmentation enabled in phase 2; a minimal sketch of what I mean by background augmentation follows after this list. [screenshot]

  2. I trained the model without background augmentation; some of the test results are good, but some are very bad. Have you seen this kind of artifact before?

     I found that the results for most of the captured photos are not very good. There are artifacts, shown in the pictures below, mostly located along the edges of the human and protruding out of the image plane toward the camera when viewed from the front. [screenshots]

     The RenderPeople and THuman test results are good, and Hanni is good for some reason.

[screenshots]

  3. I only trained the model with unshaded renders; does training the model with shaded renders help?
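
For reference, a minimal sketch of the background augmentation I am referring to: the masked subject is composited over a randomly chosen background image instead of the plain black background. The helper and file handling below are my own illustration, not code from the 2K2K repository.

```python
import random
import numpy as np
from PIL import Image

def augment_background(person_rgb, alpha, background_paths):
    """Composite a masked subject over a random background image.

    person_rgb: (H, W, 3) array of the subject.
    alpha:      (H, W) matte in [0, 1]; 1 = subject, 0 = background.
    """
    bg_path = random.choice(background_paths)
    h, w = person_rgb.shape[:2]
    bg = np.asarray(Image.open(bg_path).convert('RGB').resize((w, h)),
                    dtype=np.float32)
    a = alpha[..., None].astype(np.float32)
    return person_rgb.astype(np.float32) * a + bg * (1.0 - a)
```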
ZhenhuiL1n commented 2 weeks ago

Also, in the paper you mention that the method is tested at 1024×1024 and 512×512 resolution with different inference speeds. Did you retrain the model at those resolutions with a modified codebase, or is there a way to reproduce that with the provided checkpoint?

SangHunHan92 commented 2 weeks ago
  1. I used background augmentation to train both phases. If you changed it partway through training, that would cause the previously trained model to stop producing correct results.
  2. This is strange. I do see some artifacts at the edges, but I have never seen a problem like this where the front/back depth is pushed forward at the same time. (The model is trained so that the background of the front depth map faces the back; see the sketch after this list.) https://github.com/SangHunHan92/2K2K/blob/18b2038fc683386855dd3021fb2cc7fcdd2a06b1/models/loss_builder.py#L219-L222 If the same phenomenon keeps occurring, I recommend removing the background with a tool like remove.bg and feeding the masked image at test time.
  3. Rendering with shading reveals clothing wrinkles through shadows, and in-the-wild images look much closer to shaded renders, so shading the human during rendering is very important.
  4. I retrained the models with a modified codebase (only the resolution changed) to test the different resolutions.
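
To illustrate point 2: background pixels of the front depth map are supervised toward the back plane. The actual implementation is in models/loss_builder.py (lines linked above); the function below is only a sketch of the idea, and the back-plane value is an assumption, not the repository's code.

```python
import torch

def front_depth_loss(pred_front, gt_front, mask, back_plane=-1.0):
    """Illustrative only: foreground pixels follow the ground-truth front
    depth, while background pixels are pulled toward a back-plane value so
    the background cannot bulge toward the camera."""
    bg = 1.0 - mask
    fg_loss = (torch.abs(pred_front - gt_front) * mask).sum() / mask.sum().clamp(min=1.0)
    bg_loss = (torch.abs(pred_front - back_plane) * bg).sum() / bg.sum().clamp(min=1.0)
    return fg_loss + bg_loss
```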
SangHunHan92 commented 2 weeks ago

By the way, is the normal detail well reconstructed in non-edge areas? I want to check if the model is fully trained.

ZhenhuiL1n commented 1 week ago

Hi, sorry for the late reply, I was traveling and did not have access to my PC.

[screenshots]

Here are the predicted normals and the normals from depth. I trained the first stage for 30 epochs and the second stage for 10 epochs.

ZhenhuiL1n commented 1 week ago

Also, when I trained the first stage with background image augmentation, the model could not predict a good down-scaled normal map; it ended up predicting a black image, as I showed in the previous issue. I managed to train the down-scaled normal predictor without background augmentation.

Are there other fixes I could make to train the first stage with background augmentation? Do you have any advice? Thanks a lot!

SangHunHan92 commented 1 week ago

If so, remove all the models and losses used for learning the img2normal_face, upper, arm, leg, and shoe models ('part_normal' in self.loss_phase), and train only "img2normal_down" and "ImgNorm_to_Dep".

After these two are fully trained, try training img2normal_face, upper, arm, leg, and shoe.

https://github.com/SangHunHan92/2K2K/blob/18b2038fc683386855dd3021fb2cc7fcdd2a06b1/models/deep_human_models.py#L394-L398
https://github.com/SangHunHan92/2K2K/blob/18b2038fc683386855dd3021fb2cc7fcdd2a06b1/models/deep_human_models.py#L557-L561
https://github.com/SangHunHan92/2K2K/blob/18b2038fc683386855dd3021fb2cc7fcdd2a06b1/models/loss_builder.py#L73
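
A minimal sketch of that staged schedule, assuming the loss terms are collected in a dict keyed by names like those above ('img2normal_down', 'ImgNorm_to_Dep', 'part_normal'); the gating helper is my own illustration, not the repository's API:

```python
# Stage A: learn only the down-scaled normal and normal-to-depth networks.
FIRST_PHASE = ['img2normal_down', 'ImgNorm_to_Dep']
# Stage B: afterwards, re-enable the part-normal heads (face/upper/arm/leg/shoe).
SECOND_PHASE = FIRST_PHASE + ['part_normal']

def total_loss(loss_dict, loss_phase):
    """Sum only the loss terms whose keys are active in the current phase."""
    return sum(v for k, v in loss_dict.items() if k in loss_phase)

# Usage sketch:
#   loss = total_loss(losses, FIRST_PHASE)   # until img2normal_down / ImgNorm_to_Dep converge
#   loss = total_loss(losses, SECOND_PHASE)  # then train the part-normal branches
```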

ZhenhuiL1n commented 1 week ago

Hi,

However, in this issue you mentioned that the prediction model should be trained on inputs with a black background. What does the model gain from a black background compared to an augmented background?

https://github.com/SangHunHan92/2K2K/issues/11

SangHunHan92 commented 1 week ago

First, the model does not need to determine what the background is, which prevents incorrect depth predictions for the background.

In addition, even in cases where it is difficult to distinguish between the foreground and the background, the edge of the foreground is determined in advance, allowing for more accurate estimation.
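
As a practical illustration of that convention (file names are placeholders; any matting tool such as remove.bg can provide the alpha matte), the foreground can be composited onto a black background before inference:

```python
import numpy as np
from PIL import Image

# Composite the subject onto a black background using its alpha matte,
# matching the black-background convention used during training.
img   = np.asarray(Image.open('person.png').convert('RGB'), dtype=np.float32)
alpha = np.asarray(Image.open('person_matte.png').convert('L'), dtype=np.float32) / 255.0
black_bg = img * alpha[..., None]            # background pixels become zero
Image.fromarray(black_bg.astype(np.uint8)).save('person_blackbg.png')
```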