16lemoing / waldo

WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction
MIT License

Query on running demo #5

Closed yunhhp12 closed 11 months ago

yunhhp12 commented 11 months ago

Hi, thank you for the impressive work.

When I tried to run the demo with the demo datasets and checkpoint files you mentioned, I got the following error and am now stuck. Do you have a solution for this?

Thanks in advance!

============================================================
helpers/synthesizer_evaluator.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-10-15_18:09:46
  host      : workspace-lc8fxuxlu2wx-0
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 5474)
  error_file: /tmp/torchelastic_y89v27vm/none_oy5iba_2/attempt_0/0/error.json
  traceback : Traceback (most recent call last):
    File "/root/anaconda3/envs/waldo/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
      return f(*args, **kwargs)
    File "/root/waldo/helpers/synthesizer_evaluator.py", line 83, in main
      SynthesizerEvaluator(opt).run()
    File "/root/waldo/helpers/synthesizer_evaluator.py", line 36, in run
      with Engine(self.opt) as engine:
    File "/root/waldo/tools/engine.py", line 23, in __init__
      node_id = int(os.environ['SLURM_NODEID'])
    File "/root/anaconda3/envs/waldo/lib/python3.9/os.py", line 679, in __getitem__
      raise KeyError(key) from None
  KeyError: 'SLURM_NODEID'
============================================================

16lemoing commented 11 months ago

Hi, thanks for your interest.

This error is raised because the training/inference code was written to be executed with the Slurm job scheduler. I updated the code so that it is no longer a requirement. Let me know if it solves your issue.
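
For readers hitting the same KeyError, here is a minimal sketch of a Slurm-optional lookup. The helper name `get_node_id` and the fallback value of 0 are assumptions for illustration, not the actual change made in tools/engine.py.

```python
import os


def get_node_id(environ=os.environ, default=0):
    """Return the Slurm node index, or `default` when not running under Slurm.

    Hypothetical helper: SLURM_NODEID is only set inside a Slurm job,
    so a plain os.environ['SLURM_NODEID'] raises KeyError elsewhere.
    """
    return int(environ.get("SLURM_NODEID", default))
```

Outside Slurm this returns 0, which matches a single-node run.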

yunhhp12 commented 11 months ago

Thank you for the quick response.

I checked, and the same error no longer occurs. However, the code now can't find 'self.opt.local_rank' at line 27 of engine.py (L27 self.global_rank = self.opt.local_rank)

16lemoing commented 11 months ago

You are right, LOCAL_RANK should be read from the environment variable in this case. I have updated engine.py
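
As a rough illustration of reading the rank from the environment, the sketch below assumes the script is started with a PyTorch distributed launcher such as torchrun, which exports LOCAL_RANK for each worker. The helper name `get_local_rank` and the default of 0 are hypothetical and not taken from the updated engine.py.

```python
import os


def get_local_rank(environ=os.environ, default=0):
    """Return this process's local rank on its node.

    Hypothetical helper: LOCAL_RANK is exported by the launcher for every
    worker; the default covers a plain `python script.py` run.
    """
    return int(environ.get("LOCAL_RANK", default))
```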

yunhhp12 commented 11 months ago

Thanks again for the prompt response.

The code seems to be working, thank you very much. However, when I run either demo.sh or test.sh on the Cityscapes dataset, the scripts run but produce the following results, which differ from yours: the video appears to freeze after about 1 second.

Cityscapes demo:

https://github.com/16lemoing/waldo/assets/85543623/29b725d7-c943-49f1-9643-ac147272d1cf

inp_pred_vid (vid_00000):

https://github.com/16lemoing/waldo/assets/85543623/6d5bdfc7-e7a9-4cb7-81d4-fbf63f66f0fb

inp_pred_vid (vid_00001):

https://github.com/16lemoing/waldo/assets/85543623/2022a363-0344-43a7-8d40-1d475627590b

I ran the code on a single Tesla V100.

As for the KITTI dataset, the scripts give the following error:

============================================================
helpers/synthesizer_evaluator.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-10-18_12:53:31
  host      : workspace-lc8fxuxlu2wx-0
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 27025)
  error_file: /tmp/torchelastic_xxqr5qlg/none_x_310xlz/attempt_0/0/error.json
  traceback : Traceback (most recent call last):
    File "/root/anaconda3/envs/waldo/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
      return f(*args, **kwargs)
    File "/root/waldo/helpers/synthesizer_evaluator.py", line 83, in main
      SynthesizerEvaluator(opt).run()
    File "/root/waldo/helpers/synthesizer_evaluator.py", line 66, in run
      for tmp_iter, vid_data in enumerate(self.eval_vid_data_info["loader_iter"]):
    File "/root/anaconda3/envs/waldo/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
      data = self._next_data()
    File "/root/anaconda3/envs/waldo/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
      return self._process_data(data)
    File "/root/anaconda3/envs/waldo/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
      data.reraise()
    File "/root/anaconda3/envs/waldo/lib/python3.9/site-packages/torch/_utils.py", line 457, in reraise
      raise exception
  AssertionError: Caught AssertionError in DataLoader worker process 0.
  Original Traceback (most recent call last):
    File "/root/anaconda3/envs/waldo/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
      data = fetcher.fetch(index)
    File "/root/anaconda3/envs/waldo/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
      data = [self.dataset[idx] for idx in possibly_batched_index]
    File "/root/anaconda3/envs/waldo/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
      data = [self.dataset[idx] for idx in possibly_batched_index]
    File "/root/waldo/data/base_dataset.py", line 256, in __getitem__
      assert len(frame_paths) >= frames_per_clip, f"{frame_paths}, {frames_per_clip}"
  AssertionError: ['datasets/demo_kitti/all_vid_256/test/2011_09_26_drive_0060_sync/image_02/data/0000000003.png', 'datasets/demo_kitti/all_vid_256/test/2011_09_26_drive_0060_sync/image_02/data/0000000004.png', 'datasets/demo_kitti/all_vid_256/test/2011_09_26_drive_0060_sync/image_02/data/0000000005.png', 'datasets/demo_kitti/all_vid_256/test/2011_09_26_drive_0060_sync/image_02/data/0000000006.png', 'datasets/demo_kitti/all_vid_256/test/2011_09_26_drive_0060_sync/image_02/data/0000000007.png', 'datasets/demo_kitti/all_vid_256/test/2011_09_26_drive_0060_sync/image_02/data/0000000008.png', 'datasets/demo_kitti/all_vid_256/test/2011_09_26_drive_0060_sync/image_02/data/0000000009.png', 'datasets/demo_kitti/all_vid_256/test/2011_09_26_drive_0060_sync/image_02/data/0000000010.png', 'datasets/demo_kitti/all_vid_256/test/2011_09_26_drive_0060_sync/image_02/data/0000000011.png'], 10
============================================================

I am guessing this has to do with the --vid_len option, but I am not sure how I should change it to fix this.

16lemoing commented 11 months ago

The demo should run fine on a V100 GPU.

For the Cityscapes demo, it is hard to help without any error message. The cause could be anything from a failure to load the checkpoints to a failure to load the data. If you send me the log, I may get a better idea of what went wrong. I have never seen a video freeze like that.

For the KITTI demo, the error says that fewer than 10 frames were found in the video folder (it should actually contain ~30, if I remember correctly). You shouldn't need to change the --vid_len option.
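
One way to check the extracted demo data is to count the frames in each folder. The sketch below is an assumption-laden illustration: the helper name `short_clips`, the `*.png` pattern, and the threshold of 10 (taken from the assertion message above) are not part of the repository.

```python
from pathlib import Path


def short_clips(root, frames_per_clip=10, pattern="*.png"):
    """Return {folder: frame_count} for folders holding too few frames.

    Hypothetical helper: walks `root` recursively, counts image files per
    parent folder, and keeps folders with fewer than `frames_per_clip`.
    Folders containing no matching files at all are not reported.
    """
    counts = {}
    for path in Path(root).rglob(pattern):
        counts[path.parent] = counts.get(path.parent, 0) + 1
    return {str(d): n for d, n in counts.items() if n < frames_per_clip}
```

For example, `short_clips("datasets/demo_kitti")` would list any sequence folder that was only partially downloaded or extracted.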

16lemoing commented 11 months ago

Sorry for the misunderstanding. I meant the text output printed to screen when you run the demo.

yunhhp12 commented 11 months ago

Ahh got it. Here is the full text output printed on terminal.

-------------------------------------------- Base Options --------------------------------------------
aspect_ratio: 2
aug_alpha: 2.0
aug_max_mask: 0.0
aug_max_rotate: 0.3
aug_max_translate: 0.3
aug_max_zoom: 1.3
aug_min_mask: 0.0
aug_min_rotate: -0.3
aug_min_translate: -0.3
aug_padding_mode: zeros
aug_rd_fill: False
aug_sigma: 0.2
batch_size_img: 1
batch_size_vid: 1
bg_idx: [1, 2, 3, 10, 11]
categories: None
centered_crop: False
colorjitter: 0.5 [default: None]
colorjitter_no_contrast: True [default: False]
compute_fid: False
compute_fvd: False
cont_train: False
data_specs: None
dataroot: datasets/demo_cityscapes [default: datasets/cityscapes]
dataset: cityscapes [default: bair]
datetime: 2023-10-18-14:22:00 [default: None]
dim: 128 [default: 512]
eval_phase: test [default: valid]
fg_idx: [0, 4, 5, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18, 19]
fixed_crop: None
fixed_top_centered_zoom: None
flow_dim: 128 [default: 0]
flow_model: raft
force_compute_metadata: False
fps: 10
from_animation: False
from_vid: False
gpu_ids: 0
horizontal_centered_crop: False
imagenet_norm: False
img_acc_nums: [1]
img_metric:
img_modes: ['None']
img_skip_nums: [1]
init_fold_test: None
init_fold_train: None
init_fold_valid: None
input_ratio: 1.0
is_tar: False
is_vid: False
load_100: False
load_2_apart: False
load_all: False
load_data: False
load_dim: 512 [default: 0]
load_flow: True [default: False]
load_from_opt_file: False
load_lyt: True [default: False]
load_n_from_tar: 1
load_n_plus_1: False
load_n_rd: False
load_signature:
load_vid_len: None
log_fps: 4
log_freq: 10000 [default: None]
lyt_model: deeplabv3
max_batch_eval_img: None
max_batch_eval_vid: None
max_rd_len: 1
max_vid_step: 1000
max_zoom: 1.3 [default: 1.0]
min_rd_len: 1
min_zoom: 1.0
name: demo_cityscapes [default: None]
no_h_flip: True
no_v_flip: False [default: True]
num_folds_test: None
num_folds_train: None
num_folds_valid: None
num_iter: 1000000 [default: 1000]
num_iter_eval: 10000 [default: None]
num_lyt: 20
num_workers: 8
num_workers_eval: 1 [default: None]
one_every_n: 1
original_size: None
other_idx: [9]
palette: [0, 0, 0, 128, 64, 128, 244, 35, 232, 70, 70, 70, 102, 102, 156, 190, 153, 153, 153, 153, 153, 250, 170, 30, 220, 220, 0, 107, 142, 35, 152, 251, 152, 70, 130, 180, 220, 20, 60, 255, 0, 0, 0, 0, 142, 0, 0, 70, 0, 60, 100, 0, 80, 100, 0, 0, 230, 119, 11, 32]
random_fold_train: False
rd_len: False
remap_lyt: [13, 19, 18, 19, 7, 6, 8, 6] [default: []]
resize_center_crop_img: None
resize_img: None
rotate: 0
save_data: False
save_freq: -1
save_latest_freq: 1000 [default: 5000]
save_path: ./
shuffle_valid: False
single_digit: False
skip_first: True [default: False]
tps_aug_circle_alpha: 0.1
tps_aug_circle_kernel: 5
tps_aug_mask_kernel: 13
tps_aug_max_delta: 0.0
tps_aug_max_delta_bg: 0.0
tps_aug_max_mask: 0.0
tps_aug_max_scale: 0.5
tps_aug_max_scale_bg: 1.0
tps_aug_max_translate: 0.0
tps_aug_min_mask: 0.0
tps_aug_min_scale: 0.5
tps_aug_min_scale_bg: 1.0
tps_aug_min_translate: 0.0
tps_aug_simulate_obj: False
true_dim: 512 [default: 1024]
true_ratio: 2
update_tar_every_n: 1
use_amp: False
use_aug_img: False
use_tps_aug_img: False
vid_acc_nums: [1]
vid_len: 14 [default: 16]
vid_metric:
vid_modes: ['vid_prediction'] [default: ['None']]
vid_skip: 1
vid_skip_nums: [1]
vid_step_every: 1
---------------------------------------------- Base End ----------------------------------------------

----------------------------------------- Synthesizer Options -----------------------------------------
ada_pts_rest: False
ada_pts_rest_detach: False
allow_ghost: False
alpha_norm: 0
aug_policy: ['None']
beta1: 0.0
beta2: 0.99
bg_color: black
bg_mul: 1.2 [default: 1.0]
bg_mul_pose_decoder: 1.2 [default: 1.0]
blur_alpha: False
blur_delta: False
blur_edge: True [default: False]
blur_in: False
blur_pxl: True [default: False]
blur_sigma: 2.0 [default: 3.0]
bound_alpha: True [default: False]
bound_rest: True [default: False]
bound_scale: False
cap_dim: 768
cat_z: True
cell_dis_eps: 0.0
circle_translate_bias: True [default: False]
circle_translate_radius: 0.2 [default: 0.25]
clip_value: 0
codebook_dim: 0
codebook_size: 256
com_depth: 1
commit_latent: False
cosine_warmup_pxl_vid: False
ctx_len: 4 [default: 10]
ctx_mode: prev [default: full]
debug: False
dec_depth: 4
decompose_embed_oe: False
dis: temporal
dis_cls_depth: 1
dis_depth: 7
dis_flow_mode: identity
dis_latent_mode: identity
dis_on_rec: False
dis_spatial_stddev_group: 4
dis_spectral_norm_layer: None
dis_stddev_group: 4
dis_use_spatial_stddev: False
dis_use_stddev: False
drop_input_p: 0.0
drop_quant: 0
dropout: 0
edge_size: 15 [default: 7]
ema_beta: 0.995
ema_freq: 1
ema_networks: []
embed_dim: 512
enc_depth: 4
fill_mask: False
filter_alpha: False
fix_bg: False
fix_bg1: False
fix_mask: False
fix_thresh: False
flow_thresh: 0.02 [default: 0.01]
freeze_obj: False
from_multi: False
gan: vivit
gan_loss: hinge
gen_attn_mode: dp
gen_depth: 8
gen_mapping: False
gen_noise: False
gen_noise_modulation: False
gen_random_obj_embed: False
has_bg: True [default: False]
head_scale: 1.0
hr_ratio: 1
ii_ab: True [default: False]
ii_depth: 6 [default: 4]
ii_embed_dim: 512
ii_ft_hd: False
ii_iter: latest [default: None]
ii_last_only: False
ii_load_path: checkpoints/2022-05-18-18:39:55-train_wif_cityscapes [default: None]
ii_score: True [default: False]
ii_upmode: bilinear
img_autoencoder_losses: ['pxl', 'qnt']
img_generator_losses: ['pxl_ctx', 'pxl_gen']
img_mul_act_reg: 1.0
img_object_extractor_losses: ['pxl_vid']
img_pose_extractor_losses: ['pxl_ctx']
img_reg_every: 16
include_self: False
init_scale_obj: 0.25 [default: 1.0]
inpaint_obj: True [default: False]
inpainter_path: checkpoints/mat/Places_512_FullData.pkl [default: None]
input_flow: True [default: False]
input_lyt: True [default: False]
input_rgb: False [default: True]
interpolate_grid: False
l1_pxl: True [default: False]
lambda0_pxl_obj_alpha: 0.1
lambda_abs_mov: 1
lambda_activity: 1
lambda_adv: 1
lambda_bound_spread: 1
lambda_ce_lyt: 1
lambda_ce_lyt_obj: 1
lambda_ce_mean_lyt: 1
lambda_ce_mean_lyt_obj: 1
lambda_cell_dis: 10.0 [default: 1]
lambda_center_dis: 1
lambda_cls_obj_ctx: 1
lambda_cls_obj_gen: 1
lambda_cluster_dis: 1
lambda_collapse: 1
lambda_color_diversity: 1
lambda_conf: 1
lambda_delta: 1
lambda_delta_bg_pose: 1
lambda_delta_pose: 1
lambda_dis: 1
lambda_dis_pred: 1
lambda_ent: 1
lambda_ent_flt: 1
lambda_ent_flt_edge: 1
lambda_entropy: 1
lambda_expansion: 1
lambda_extent: 1
lambda_fg_tube_ent: 1
lambda_flow: 1
lambda_inter_obj: 1
lambda_iou_obj: 1
lambda_l1_flow: 100.0 [default: 1]
lambda_l1_flow_mov_obj: 1
lambda_l1_flow_other: 1
lambda_lpips_vid: 1
lambda_match_bg_alpha_post: 1
lambda_match_bg_alpha_pre: 1
lambda_obj_flow: 1
lambda_pred_latent_ctx: 1
lambda_pts_dis: 1
lambda_pts_reg: 1
lambda_pts_rest: 20.0 [default: 1]
lambda_push_obj_pose: 1
lambda_push_raw_vid: 1
lambda_pxl: 1
lambda_pxl_aug_ctx: 1
lambda_pxl_bg: 1
lambda_pxl_comp: 1
lambda_pxl_conf_ctx: 1
lambda_pxl_ctx: 1
lambda_pxl_gen: 1
lambda_pxl_inter_bg: 1
lambda_pxl_inter_obj: 1
lambda_pxl_mean_vid: 1
lambda_pxl_obj: 1
lambda_pxl_obj_alpha: 1
lambda_pxl_raw_bg: 1
lambda_pxl_raw_fg: 1
lambda_pxl_raw_obj: 1
lambda_pxl_reg_bg: 1
lambda_pxl_reg_fg: 1
lambda_pxl_swap: 1
lambda_pxl_vid: 1
lambda_pxl_warp_ctx: 1
lambda_qnt: 1
lambda_r1: 10
lambda_rec_alpha_obj: 1
lambda_rec_bg_pose: 1
lambda_rec_edge: 1
lambda_rec_edge_obj: 1
lambda_rec_edge_vid: 1
lambda_rec_obj_pose: 1
lambda_rec_occ_score: 0.01 [default: 1]
lambda_rec_pred: 1
lambda_rec_swap_pose: 1
lambda_reg_alpha: 1
lambda_reg_color: 1
lambda_reg_edge: 1
lambda_reg_fg: 1
lambda_reg_kcenters: 1
lambda_reg_map_ctx: 1
lambda_reg_map_gen: 1
lambda_reg_mov: 10.0 [default: 1]
lambda_reg_obj_ctx: 1
lambda_reg_obj_gen: 1
lambda_reg_obj_pose: 1
lambda_reg_raw_bg: 1
lambda_reg_raw_vid: 1
lambda_reg_raw_vid_plus: 1
lambda_reg_raw_vid_plus2: 1
lambda_reg_raw_vid_plus_mean: 1
lambda_reg_surface: 1
lambda_scale: 1
lambda_sharp_vid: 1
lambda_shift_penalty: 1
lambda_soft_ce_lyt: 1
lambda_soft_ce_mean_lyt: 1
lambda_spread: 1
lambda_triv_flow: 1
lambda_tube_ent: 1
lambda_vgg: 10
lambda_vgg_ctx: 10
lambda_vgg_gen: 10
lambda_warped_latent_ctx: 1
last_n_ctx: 0
latent_shape: [8, 16] [default: [4, 8]]
load_path: checkpoints/2022-05-16-12:10:29-train_lvd_cityscapes [default: None]
loop_ii: True [default: False]
lr: 0.0001
mask_input: False
max_bg_alpha: 1
max_ctx_length_img: 16
max_ctx_length_vid: 16
max_delta_match: 0.15
max_obj_shift: 0.04
max_scale: 2
max_scale_bound: 0.5
max_spread: 0.5
max_translate_bound: 0.5
mean_obj_alpha: 0
mean_vid_bg_mode: ['norm']
min_bg_alpha: 0.0001
min_cls: 0.1 [default: 0.001]
min_conf: 0
min_ctx_length_img: 0
min_ctx_length_vid: 0
min_delta_match: 0.1
min_obj_shift: 0.0
min_scale: 0
min_scale_bound: -0.5
min_spread: -0.5
mov_obj_thresh: 0.005 [default: 0.02]
mul_delta_obj: 0.2 [default: 1.0]
mul_end_iter: 0
mul_scale_obj: 0.25 [default: 1.0]
mul_start_iter: 0
no_ctx_fake: False
no_filter: False
no_future: False
nobg_edge_mul: 0
norm_layer: ln
norm_layer_patch: ln2d
norm_scale: False
normalize_alpha: False
not_strict: True [default: False]
num_captions: 0
num_expansion: 2
num_heads: 8
num_obj: 16 [default: 1]
num_perm_grid: 1
num_timesteps: 5 [default: 16]
obj_discovery_mode: []
obj_shape: [4, 4]
occ_mode:
oe_depth: 2 [default: 8]
oe_freeze_iter: 0
oe_init_mode:
oe_num_timesteps: 5
oe_pts_mode: prior
oe_use_decoder: False [default: True]
optimizer: adam
pad_bg_alpha: 3 [default: 0]
pad_obj_alpha: 3 [default: 0]
patch_size: 16 [default: 8]
pd_com_depth: 2
pd_enc_depth: 4
pe_decoder_init_mode: five [default: ]
pe_decoder_use_prior: False
pe_depth: 2 [default: 8]
pe_estimator_init_mode: zero
pe_filter_blur: False
pe_filter_order: 1
pe_post_refiner_depth: 2
pe_pts_mode: prior
pe_refiner_blend_mode_bg:
pe_refiner_blend_mode_obj:
pe_refiner_depth: 2
pe_refiner_init_mode: mfive [default: ]
pe_refiner_mode: ['comp']
pe_repeat_border: False
pe_use_edge_filter: False
pe_use_post_refiner: False
pe_use_refiner: False
pe_use_scorer: True [default: False]
pg_batch_size_mul: 1
pg_com_depth: 2
pg_dec_depth: 4
pg_depth: 6
pg_embed_noise: False
pg_enc_depth: 4
pg_inject_noise: False
pg_iter: latest [default: None]
pg_load_path: checkpoints/2022-05-17-10:46:43-train_flp_cityscapes [default: None]
pg_modulate_noise: False
pg_num_timesteps: 14 [default: 5]
pg_pts_mode: prior
pg_simple: False
pg_simple_head: False
post_refine_mean_vid: False
pred_cls: True [default: False]
progressive_scale: False
propagate_obj: True [default: False]
propagate_unique: True [default: False]
rd_ctx_num: 1
rd_translate_bias: False
reg_bg_mul: 0.25
reg_raw_vid_subidx: [0, 2, 6]
remove_obj: False
restrict_to_ctx: True [default: False]
scale_factor: 1
shuffle_bg: False
soft_bound_rest: True [default: False]
soft_shadow: True [default: False]
split_pred_ts: False
style_embed_dim: 128
style_embed_mul: [1, 1, 2, 2, 2]
style_latent_shape: [4, 4]
swap: False
swap2: False
swap_flt: True [default: False]
swap_p: 0
tgt_scale: 1
time_dropout: False
translate_bias_mul: 1
unc_arch: mlp
unc_drop_obj: 0
unc_init: False
unc_mode_img: obj_to_1
unc_mode_vid: obj_plus_n_to_1
unc_n: 2
unc_non_trivial: False
unc_obj_mode: code
unc_temporal_dropout: 0
unconstrained_pose_decoder: True [default: False]
use_adaptive_lambda: False
use_b: False
use_d: False
use_delta: True
use_disocc: False
use_dominant_flow_other: True [default: False]
use_expansion: True [default: False]
use_fg: True [default: False]
use_flow_nobg: False
use_hr: False
use_id: False
use_ii: True [default: False]
use_inpainter: True [default: False]
use_last_pose_decoder: True [default: False]
use_latent_norm: False
use_layout: False
use_lyt_filtering: True [default: False]
use_lyt_opacity: True [default: False]
use_mat_inpainter: True [default: False]
use_nobg: False
use_nobg_edge: False
use_od: False
use_oe: False
use_og: False
use_pd: False
use_pe: True [default: False]
use_pg: True [default: False]
use_shadows: True [default: False]
use_soft_bg: False
use_te: False
vid_autoencoder_losses: ['pxl_ctx', 'pxl_gen']
vid_generator_losses: ['pxl_ctx', 'pxl_gen']
vid_inpainting_losses: ['sharp_vid']
vid_object_extractor_losses: ['pxl_vid']
vid_pose_extractor_losses: ['pxl_ctx']
vid_pose_generator_losses: ['rec_obj_pose']
viz: False
warmup_bg_iter: 0
warmup_bg_score_iter: 0
warmup_l1_flow_iter: 0
warmup_l1_flow_mul: 100
warmup_obj_score_iter: 0
warmup_pxl_bg_iter: 0
warmup_pxl_obj_iter: 0
warmup_pxl_vid_iter: 0
warmup_reg_mov_iter: 0
warmup_reg_mov_mul: 100
warmup_reg_raw_vid_iter: 0
warmup_reg_raw_vid_mul: 100
warmup_reg_raw_vid_plus_iter: 0
warmup_sharp_vid_iter: 0
wd: 0.0
weight_cls: True [default: False]
which_iter: latest [default: 0]
zero_init_dec: True
------------------------------------------- Synthesizer End -------------------------------------------

Error file: /tmp/torchelastic_thloledh/none_maboaqke/attempt_0/0/error.json
Initializing node 1 / 1, rank 1 (local) 1 (global) / 1
Creation of dataset [CityscapesDataset-test] of size 1
loading vid data from img data: rgb [y]
No checkpoint for pe net with name latest and path checkpoints/2022-05-16-12:10:29-train_lvd_cityscapes
Loading untrained pe net
No checkpoint for pg net with name latest and path checkpoints/2022-05-17-10:46:43-train_flp_cityscapes
Loading untrained pg net
No checkpoint for ii net with name latest and path checkpoints/2022-05-18-18:39:55-train_wif_cityscapes
Loading untrained ii net
DistributedDataParallel( (module): LVD( (encoder): ImageEncoder( (from_img): ConvPatchProj( (layers): ModuleList( (0): Sequential( (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(128, 128, eps=1e-05, affine=True) ) (2): GELU() ) (1): Sequential( (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(256, 256, eps=1e-05, affine=True) ) (2): GELU() ) (2): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) ) (proj): Conv2d(22, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) ) ) (layer_estimator): LayerEstimator( (norm): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (blocks): MultiBlocks( (multi_blocks): ModuleList( (0): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): ObjAttention( (q): Linear(in_features=512, out_features=512, bias=False) (kv): Linear(in_features=512, out_features=1024, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512,
bias=True) ) ) (1): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): ObjAttention( (q): Linear(in_features=512, out_features=512, bias=False) (kv): Linear(in_features=512, out_features=1024, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) ) ) (cls_norm): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (cls_head): Linear(in_features=512, out_features=20, bias=True) ) (pose_estimator): PoseEstimator( (blocks): MultiBlocks( (multi_blocks): ModuleList( (0): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): FullAttention( (qkv): Linear(in_features=512, out_features=1536, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) (1): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): FullAttention( (qkv): Linear(in_features=512, out_features=1536, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) ) ) (norm): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (head): Linear(in_features=512, out_features=9, bias=True) ) (warper): 
Warper( (tps_obj): TPSWarp() (invert_obj): InverseWarp() (tps_bg): TPSWarp() (invert_bg): InverseWarp() ) (decoder): ImageDecoder( (norm): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (to_img): ConvPatchProj( (layers): ModuleList( (0): Sequential( (0): ConvTranspose2d(512, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(256, 256, eps=1e-05, affine=True) ) (2): GELU() ) (1): Sequential( (0): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(128, 128, eps=1e-05, affine=True) ) (2): GELU() ) (2): Sequential( (0): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(64, 64, eps=1e-05, affine=True) ) (2): GELU() ) ) (proj): ConvTranspose2d(64, 1, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1), bias=False) ) ) ) ) Total number of parameters: 16006749 DistributedDataParallel( (module): FLP( (compress): LatentCompressor( (norm): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (blocks): MultiBlocks( (multi_blocks): ModuleList( (0): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): ClsAttention( (q): Linear(in_features=512, out_features=512, bias=False) (kv): Linear(in_features=512, out_features=1024, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) (1): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): ClsAttention( (q): 
Linear(in_features=512, out_features=512, bias=False) (kv): Linear(in_features=512, out_features=1024, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) ) ) ) (encode): PoseEncoder( (to_obj_emb): Linear(in_features=33, out_features=512, bias=True) (to_bg_emb): Linear(in_features=256, out_features=512, bias=True) (blocks): MultiBlocks( (multi_blocks): ModuleList( (0): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): FullAttention( (qkv): Linear(in_features=512, out_features=1536, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) (1): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): FullAttention( (qkv): Linear(in_features=512, out_features=1536, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) (2): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): FullAttention( (qkv): Linear(in_features=512, out_features=1536, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, 
elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) (3): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): FullAttention( (qkv): Linear(in_features=512, out_features=1536, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) ) ) (norm): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) ) (decode): PoseDecoder( (self_blocks): ModuleList( (0): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): FullAttention( (qkv): Linear(in_features=512, out_features=1536, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) (1): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): FullAttention( (qkv): Linear(in_features=512, out_features=1536, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) (2): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): FullAttention( (qkv): 
Linear(in_features=512, out_features=1536, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) (3): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): FullAttention( (qkv): Linear(in_features=512, out_features=1536, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) ) (cross_blocks): ModuleList( (0): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): CrossAttention( (q): Linear(in_features=512, out_features=512, bias=False) (kv): Linear(in_features=512, out_features=1024, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) (1): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): CrossAttention( (q): Linear(in_features=512, out_features=512, bias=False) (kv): Linear(in_features=512, out_features=1024, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, 
out_features=512, bias=True) ) ) (2): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): CrossAttention( (q): Linear(in_features=512, out_features=512, bias=False) (kv): Linear(in_features=512, out_features=1024, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) (3): Block( (norm1): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (attn): CustomAttention( (attn): CrossAttention( (q): Linear(in_features=512, out_features=512, bias=False) (kv): Linear(in_features=512, out_features=1024, bias=False) (proj): Linear(in_features=512, out_features=512, bias=True) ) ) (norm2): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (mlp): Mlp( (fc1): Linear(in_features=512, out_features=2048, bias=True) (act): GELU() (fc2): Linear(in_features=2048, out_features=512, bias=True) ) ) ) (norm): CustomNorm( (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (obj_head): Linear(in_features=512, out_features=39, bias=True) (bg_head): Linear(in_features=512, out_features=262, bias=True) ) ) ) Total number of parameters: 44435245 DistributedDataParallel( (module): WIF( (unet): UNet( (to_emb): Conv2d(40, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (from_emb): Conv2d(32, 5, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (conv_layers): ModuleList( (0): Sequential( (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(32, 32, eps=1e-05, affine=True) ) (2): GELU() ) (1): Sequential( (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (1): CustomNorm( (norm): 
GroupNorm(64, 64, eps=1e-05, affine=True) ) (2): GELU() ) (2): Sequential( (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(128, 128, eps=1e-05, affine=True) ) (2): GELU() ) (3): Sequential( (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(256, 256, eps=1e-05, affine=True) ) (2): GELU() ) (4): Sequential( (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(512, 512, eps=1e-05, affine=True) ) (2): GELU() ) (5): Sequential( (0): Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(1024, 1024, eps=1e-05, affine=True) ) (2): GELU() ) ) (deconv_layers): ModuleList( (0): Sequential( (0): ConvTranspose2d(64, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(16, 16, eps=1e-05, affine=True) ) (2): GELU() ) (1): Sequential( (0): ConvTranspose2d(128, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(32, 32, eps=1e-05, affine=True) ) (2): GELU() ) (2): Sequential( (0): ConvTranspose2d(256, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(64, 64, eps=1e-05, affine=True) ) (2): GELU() ) (3): Sequential( (0): ConvTranspose2d(512, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(128, 128, eps=1e-05, affine=True) ) (2): GELU() ) (4): Sequential( (0): ConvTranspose2d(1024, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(256, 256, eps=1e-05, affine=True) ) (2): GELU() ) (5): Sequential( (0): ConvTranspose2d(1024, 512, kernel_size=(3, 3), 
stride=(2, 2), padding=(1, 1), output_padding=(1, 1), bias=False) (1): CustomNorm( (norm): GroupNorm(512, 512, eps=1e-05, affine=True) ) (2): GELU() ) ) ) ) )
Total number of parameters: 14164416
Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
Evaluation was successfully finished.

Thanks in advance!

16lemoing commented 11 months ago

Ok, I have found the error.

No checkpoint for pe net with name latest and path checkpoints/2022-05-16-12:10:29-train_lvd_cityscapes
Loading untrained pe net
No checkpoint for pg net with name latest and path checkpoints/2022-05-17-10:46:43-train_flp_cityscapes
Loading untrained pg net
No checkpoint for ii net with name latest and path checkpoints/2022-05-18-18:39:55-train_wif_cityscapes
Loading untrained ii net

The model failed to load the checkpoints. Please make sure you have a checkpoints folder in the root directory of the project and that it contains the folder for each of the models mentioned above.
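
If it helps, here is a minimal sketch (not part of the repository) for checking the folders before launching the demo. It assumes the script is run from the project root and that the folder names are exactly the ones printed in the log above. The function name `check_checkpoints` and the similarity cutoff are my own choices for illustration.

```python
import difflib
from pathlib import Path

# Folder names copied from the "No checkpoint for ..." messages above.
EXPECTED = [
    "2022-05-16-12:10:29-train_lvd_cityscapes",
    "2022-05-17-10:46:43-train_flp_cityscapes",
    "2022-05-18-18:39:55-train_wif_cityscapes",
]


def check_checkpoints(root="checkpoints", expected=EXPECTED):
    """Map each expected folder name to "ok" or to a list of near-miss names.

    An empty list means nothing similar was found under `root`.
    """
    root = Path(root)
    # Only sub-directories count; a missing root yields an empty listing.
    present = [p.name for p in root.iterdir() if p.is_dir()] if root.is_dir() else []
    report = {}
    for name in expected:
        if name in present:
            report[name] = "ok"
        else:
            # Hypothetical heuristic: a high cutoff surfaces names that
            # differ from the expected one by only a character or two.
            report[name] = difflib.get_close_matches(name, present, n=3, cutoff=0.9)
    return report


if __name__ == "__main__":
    for name, status in check_checkpoints().items():
        print(name, "->", status)
```

Any entry that is not "ok" points to a folder that is missing or named slightly differently from what the scripts expect.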

yunhhp12 commented 11 months ago

Got it. I compared the folder names with the paths written in demo.sh and test.sh and noticed two differences in punctuation. After correcting those, I got a predicted result similar to yours.

Thank you very much for your help and I will close this issue 👍