gengshan-y / viser

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction. NeurIPS 2021.
https://viser-shape.github.io/
Apache License 2.0
73 stars · 6 forks

About symmetric option #7

Closed: phamtrongthang123 closed this issue 2 years ago

phamtrongthang123 commented 2 years ago

Hi, thanks for publishing this wonderful work.

I want to ask whether any symmetry-related code is missing. I checked the code but don't see the symmetric initialization that appears in the LASR source code. Also, the results are different from what I expected.

(attached: two screenshots of the reconstruction results)

gengshan-y commented 2 years ago

Hi, we didn't assume a symmetric rest shape in ViSER. The results look strange. Was any hyper-parameter modified?

phamtrongthang123 commented 2 years ago

Thanks for the quick reply!

No, I didn't modify any hyperparameters in the breakdance-flare sample. I followed the default instructions (prepare the data, run VCN, then run the two scripts: breakdance.sh and the render script). Since I used the script given in the repo, should I increase the number of epochs?

For the camel example, I wrote the config files following the breakdance config example and adapted the breakdance script into a camel script (I commented out the start and end indices in the script). I also followed the standard instructions (prepare, optimize, render). I thought the missing half of the shape was because of the symmetric option, since I saw LASR using it, but maybe that's not the case, so I'm asking here just in case. camel.sh:

#!/bin/bash
eval "$(command conda 'shell.bash' 'hook' 2> /dev/null)"
conda activate viser/
dev=0
ngpu=1
batch_size=4

#dev=0,1,2,3
#ngpu=4
#batch_size=1

seed=1003
address=1111
logname=camel-$seed
checkpoint_dir=log

# optimize viser on a subset of video frames
# for camel we use start: 22, end: 42
dataname=camel-init
CUDA_VISIBLE_DEVICES=$dev python -m torch.distributed.launch \
    --master_port $address --nproc_per_node=$ngpu optimize.py \
    --name=$logname-0 --checkpoint_dir $checkpoint_dir --n_bones 21 --only_mean_sym \
    --num_epochs 20 --dataname $dataname --ngpu $ngpu --batch_size $batch_size --seed $seed
CUDA_VISIBLE_DEVICES=$dev python -m torch.distributed.launch \
    --master_port $address --nproc_per_node=$ngpu optimize.py \
    --name=$logname-1 --checkpoint_dir $checkpoint_dir --n_bones 36 --nosymmetric \
    --num_epochs 10 --dataname $dataname  --ngpu $ngpu --batch_size $batch_size \
    --model_path $checkpoint_dir/$logname-0/pred_net_latest.pth --finetune --n_faces 1601
CUDA_VISIBLE_DEVICES=$dev python -m torch.distributed.launch \
    --master_port $address --nproc_per_node=$ngpu optimize.py \
    --name=$logname-2 --checkpoint_dir $checkpoint_dir --n_bones 36 --nosymmetric \
    --num_epochs 10 --dataname $dataname  --ngpu $ngpu --batch_size $batch_size \
    --model_path $checkpoint_dir/$logname-1/pred_net_latest.pth  --finetune --n_faces 1602

# start-idx and end-idx are determined by the initialization subset of frames
# delta-max-cap is computed as max(number-of-frames - end-idx, start-idx - 0)
dataname=camel
CUDA_VISIBLE_DEVICES=$dev python -m torch.distributed.launch \
    --master_port $address --nproc_per_node=$ngpu optimize.py \
    --name=$logname-ft1 --checkpoint_dir $checkpoint_dir --n_bones 36 --nosymmetric \
    --num_epochs 60 --dataname $dataname --ngpu $ngpu --batch_size $batch_size \
    --model_path $checkpoint_dir/$logname-2/pred_net_latest.pth --finetune --n_faces 1601
    # --start_idx 22 --end_idx 42 --use_inc --delta_max_cap 30
CUDA_VISIBLE_DEVICES=$dev python -m torch.distributed.launch \
    --master_port $address --nproc_per_node=$ngpu optimize.py \
    --name=$logname-ft2 --checkpoint_dir $checkpoint_dir --n_bones 36 --nosymmetric \
    --num_epochs 20 --dataname $dataname  --ngpu $ngpu --batch_size $batch_size \
    --model_path $checkpoint_dir/$logname-ft1/pred_net_latest.pth --finetune --n_faces 8000

camel-init.config

[data]
datapath = database/DAVIS/JPEGImages/Full-Resolution/rcamel/
dframe = 1 
init_frame  = 0
end_frame = -1
can_frame = -1

[data_0]
datapath = database/DAVIS/JPEGImages/Full-Resolution/rcamel/
dframe = 1
init_frame  = 0
end_frame = -1
can_frame = -1
ppx = 960
ppy = 540

[meta]
numvid = 1

camel.config

[data]
datapath = database/DAVIS/JPEGImages/Full-Resolution/rcamel/
dframe = 1 
init_frame  = 0
end_frame = -1
can_frame = -1

[data_0]
datapath = database/DAVIS/JPEGImages/Full-Resolution/rcamel/
dframe = 1
init_frame  = 0
end_frame = -1
can_frame = -1
ppx = 960
ppy = 540

[meta]
numvid = 1
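The config above is plain INI syntax, so it can be sanity-checked with Python's `configparser` before launching optimization. A hypothetical helper, not part of the repo:

```python
import configparser

def check_config(path):
    """Parse a ViSER-style config and print each video's data path and frame range."""
    cfg = configparser.ConfigParser()
    cfg.read(path)
    numvid = cfg.getint("meta", "numvid")
    for i in range(numvid):
        sec = f"data_{i}"
        print(sec, cfg[sec]["datapath"],
              cfg.getint(sec, "init_frame"), cfg.getint(sec, "end_frame"))
    return numvid
```

Running it on camel-init.config should report one video spanning frames 0 to -1 (i.e., the full sequence).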
gengshan-y commented 2 years ago

For the breakdance example, I wasn't able to reproduce your failure case. There might be something critically wrong. As a sanity check, can you run

bash scripts/render_result.sh breakdance-flare log/breakdance-flare-1003-1/pred_net_10.pth 36

just to check whether the initialization stage succeeded? You should get something like this: (screenshot: breakdance-flare-all rendering)

If it didn't, could you also send me the tensorboard file log/breakdance-flare-1003-0/events.out.t* as well as log/breakdance-flare-1003-0/pred_net_20.pth so that I can take a look?

phamtrongthang123 commented 2 years ago

Hi, here is the init rendering; it looks like it failed somehow. (screenshot: breakdance-flare-all) Here is the Drive link to the checkpoint and event logs for breakdance-flare-1003-0: https://drive.google.com/drive/folders/1PgkyGC_R2xwXwUrJp4Ayj0VOSI4p_PSd?usp=sharing

In case you want full result folder: https://drive.google.com/drive/folders/1R-uvI5lC4Tjg44TCdsPznUEQOo_yRQXw?usp=sharing

gengshan-y commented 2 years ago

From the tensorboard log, the pre-computed flow is wrong. A warped-image comparison is attached (left: correct, right: wrong).

Can you check whether your pre-computed flow is the same as the ones provided? An easy way to check is to compare the flow-warped images at database/DAVIS/FlowFW/Full-Resolution/rbreakdance-flare/warp*.jpg.
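Such a comparison can also be done numerically instead of by eye. A minimal sketch, not part of the repo; the `mean_abs_diff` helper and the example paths are hypothetical:

```python
import numpy as np
from PIL import Image

def mean_abs_diff(path_a, path_b):
    """Mean absolute per-pixel difference between two warped images."""
    a = np.asarray(Image.open(path_a).convert("RGB"), dtype=np.float32)
    b = np.asarray(Image.open(path_b).convert("RGB"), dtype=np.float32)
    return float(np.abs(a - b).mean())

# Values near 0 mean the locally computed flow matches the provided one
# (JPEG compression adds a small residual even for identical flow):
# mean_abs_diff("my/FlowFW/.../warp-00027.jpg", "given/FlowFW/.../warp-00027.jpg")
```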

phamtrongthang123 commented 2 years ago

The preprocessed files are correct. Below I show warp-00027.jpg from my folder next to the one from the given preprocessed data. (comparison screenshot)

The files themselves are correct, but I guess the part of the code that reads them somehow gets the wrong one (hence the wrong warp shown in tensorboard). I didn't change anything in the code, though. This is hard to debug.

gengshan-y commented 2 years ago

Hi, I just realized there was a bug in the flow pre-processing script, where the flow was stored upside-down. This should fix the issue: https://github.com/gengshan-y/viser-release/commit/aceefb58fb705cc86be6a83f0297e9d054804619
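For reference, correcting an upside-down flow field involves more than reversing the row order: the vertical displacement must also be negated. A minimal sketch of the idea (an illustration only, not the repo's actual patch):

```python
import numpy as np

def flip_flow_vertically(flow):
    """Convert a flow field stored upside-down to the correct orientation.

    flow: (H, W, 2) array with channels (u, v) = (horizontal, vertical)
    displacement. Reversing the rows alone is not enough, because a
    displacement pointing 'down' in the flipped image points 'up' in the
    original, so the v channel must be negated as well.
    """
    flipped = flow[::-1].copy()   # reverse the row (vertical) axis
    flipped[..., 1] *= -1         # negate the vertical displacement
    return flipped
```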

phamtrongthang123 commented 2 years ago

Oh okay, I'm running the sample again after applying the fix. I'll report the result once it's done.

phamtrongthang123 commented 2 years ago

Hi. The breakdance sample after the patch https://github.com/gengshan-y/viser-release/commit/aceefb58fb705cc86be6a83f0297e9d054804619 is super good. (rendering: breakdance-flare-all)

But the camel sample is still quite bad. I used the same camel.config as mentioned in my previous reply. I wonder if I wrote the config wrong somehow. (gif of the camel result)

This is the camel result from camel-1003-1. (rendering: camel-all)

And this is the Drive link to the event file and checkpoint in camel-1003-0: https://drive.google.com/drive/folders/1FAXuqwcI6eBp1IFM3GGidg7LywKKbIMn?usp=sharing Looking at the event file, I don't see anything suspicious; the 2D rendering output and the losses look legitimate, so I don't know where to start debugging.

gengshan-y commented 2 years ago

The camel sequence has 90 frames, which makes gradient-based optimization from scratch difficult. What we did in LASR/ViSER is to first optimize on a subsampled set of frames (subsampled over time by a factor of 5), and then optimize on all the frames.

This repo doesn't contain an example for camel, but you may check this to generate flow for the subsampled frames, and modify the config following configs/elephant-walk-init.config.
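The temporal subsampling step can be sketched as follows. This is a hypothetical helper, not part of the repo: the directory layout and filenames are assumptions, and the flow for the subsampled frames still needs to be regenerated with the repo's preprocessing scripts:

```python
import shutil
from pathlib import Path

def subsample_frames(src_dir, dst_dir, factor=5):
    """Copy every `factor`-th frame into a new directory for the init stage.

    src_dir: directory of sequentially named frames (e.g. 00000.jpg, ...).
    dst_dir: output directory for the subsampled subset.
    Returns the number of frames copied.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    frames = sorted(Path(src_dir).glob("*.jpg"))
    kept = frames[::factor]
    for f in kept:
        shutil.copy(f, dst / f.name)
    return len(kept)
```

After copying, point a new `*-init.config` at the subsampled directory, optimize on it, then fine-tune on the full 90-frame sequence as in the script above.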

phamtrongthang123 commented 2 years ago

Thanks, I'll do that. There are no other issues with the code, so I'm closing this issue. Thanks for the great support!