ZhenhuiL1n opened 2 weeks ago
Also, in the paper you mention that the method is tested at 1024×1024 and 512×512 resolution with different inference speeds. Did you retrain the model at each resolution with a modified codebase, or is there a way to reproduce this with the provided checkpoint?
By the way, is the normal detail well reconstructed in non-edge areas? I want to check if the model is fully trained.
Hi, sorry for the late reply, I was traveling and did not have access to my PC.
Here are the predicted normal and the normal from depth. I have trained the first stage for 30 epochs and the second stage for 10 epochs.
Also, when I trained the first stage with background image augmentation, the model could not predict a good down-scaled normal; it ended up predicting a black image, as I showed in the last issue. I managed to train the down-scaled normal predictor without background augmentation.
There might be some other fix that would let me train the first stage with background augmentation. Do you have any advice? Thanks a lot!
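One generic trick (not from the 2K2K paper; the helper and its parameters below are hypothetical) is a curriculum: keep the background black during early epochs, then ramp up the probability of compositing a random background, so the down-scaled normal predictor stabilizes before it ever sees clutter. A minimal sketch, assuming images, masks, and backgrounds are values in [0, 1]:

```python
import random

def maybe_augment_background(image, mask, backgrounds, epoch, warmup_epochs=10):
    """Curriculum background augmentation (hypothetical helper, not from the
    2K2K codebase): black background early on, then composite a random
    background with a probability that ramps from 0 to 1 over warmup."""
    p = min(1.0, epoch / float(warmup_epochs))   # augmentation probability
    if random.random() < p:
        bg = random.choice(backgrounds)
        return image * mask + bg * (1.0 - mask)  # alpha-composite new background
    return image * mask                          # plain black background
```

With NumPy or PyTorch tensors the same arithmetic broadcasts elementwise, so the helper works unchanged on an (H, W, 3) image with an (H, W, 1) mask.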
If so, remove all models and losses needed for learning the img2normal_face, upper, arm, leg, and shoe models ('part_normal' in self.loss_phase), and learn only "img2normal_down" and "ImgNorm_to_Dep".
After these two are completely learned, try learning img2normal_face, upper, arm, leg, and shoe.
https://github.com/SangHunHan92/2K2K/blob/18b2038fc683386855dd3021fb2cc7fcdd2a06b1/models/deep_human_models.py#L394-L398
https://github.com/SangHunHan92/2K2K/blob/18b2038fc683386855dd3021fb2cc7fcdd2a06b1/models/deep_human_models.py#L557-L561
https://github.com/SangHunHan92/2K2K/blob/18b2038fc683386855dd3021fb2cc7fcdd2a06b1/models/loss_builder.py#L73
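The staging above can be sketched as a filter over the loss registry. This is only an illustration: the phase names mirror the linked 2K2K code ('part_normal' in self.loss_phase), but the helper itself is hypothetical, not part of the repository.

```python
# Phases enabled in each training stage (names follow the linked 2K2K code;
# the staging helper below is a hypothetical sketch).
STAGE1_PHASES = ["img2normal_down", "ImgNorm_to_Dep"]
STAGE2_PHASES = STAGE1_PHASES + ["part_normal"]  # face/upper/arm/leg/shoe losses

def active_losses(all_losses, stage):
    """Keep only the loss terms that belong to the given training stage."""
    phases = STAGE1_PHASES if stage == 1 else STAGE2_PHASES
    return {name: fn for name, fn in all_losses.items() if name in phases}

# Example registry with one entry per loss phase.
registry = {"img2normal_down": "l1", "ImgNorm_to_Dep": "l1", "part_normal": "l1"}
```

Training stage 1 with `active_losses(registry, 1)` drops the part-normal terms entirely, which is the suggestion above: learn the down-scaled normal and depth first, then re-enable the part losses for stage 2.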
Hi,
However, in this issue you mentioned that the prediction model should be trained with a black background. What does the model gain from a black background compared with an augmented background?
First, the model does not need to determine what the background is, which prevents incorrect depth predictions for the background.
In addition, even in cases where it is difficult to distinguish between the foreground and the background, the edge of the foreground is determined in advance, allowing for more accurate estimation.
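Concretely, both points come down to masking the input to black with the foreground matte before it reaches the network: the background is constant, and the silhouette edge comes from the mask rather than being estimated. A minimal sketch (the helper name is mine, not from the codebase):

```python
import numpy as np

def mask_to_black(image, mask):
    """Zero out everything outside the person mask, so the network sees a
    fixed (black) background and the foreground edge is known in advance."""
    return image * mask

# Toy 2x2 image: top row is foreground, bottom row is background.
img = np.full((2, 2, 3), 0.5, dtype=np.float32)
mask = np.zeros((2, 2, 1), dtype=np.float32)
mask[0] = 1.0
out = mask_to_black(img, mask)  # top row keeps 0.5, bottom row becomes 0
```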
Hi,
I retrained the model on the 2K2K dataset and completed stage 1 (30 epochs) and stage 2 (10 epochs) with the same settings described in the paper. The results are not as good as the provided checkpoint, and I have a few questions about the retraining.
First, I want to ask whether you used background augmentation in both phase 1 and phase 2 during training. If I add background augmentation, the model cannot predict good results. Here is what I get when I add background augmentation in phase 2: ![image](https://github.com/SangHunHan92/2K2K/assets/56009715/bc89d383-19a5-450e-8fb8-3707088e8e17)
I trained the model without background augmentation; some of the test results are good, but some are very bad. Did you experience these kinds of artifacts before?
I found that the results for most captured photos are not very good; there are artifacts, shown in the picture below, mostly located on the edge of the human and pointing out of the image plane when viewed from the front.
![image](https://github.com/SangHunHan92/2K2K/assets/56009715/a7e5636b-5b3e-4abc-8778-0bc14b2107d7)
The RenderPeople and THuman test results are good, and Hanni is good for some reason.