Problem of training settings

lai0198848881 commented 1 year ago

Hi, I have some questions about your work. The paper states that training runs for about 1000 epochs, but your config uses 7000. I changed max_epoch to 1000, but the model then generates blurry images. Do we really need to train for 7000 epochs to get promising results?

I also changed batch_size and images_per_seq_options to fit my GPU's memory. However, when I tried to generate some samples, I got the error shown below.

File "/home/lch/holo_diffusion/holo_diffusion/utils/render_utils/flyaround.py", line 373, in _get_dummy_test_batch_for_sampling
    R=torch.eye(3, device=device)[None].repeat(batch_size, 1, 1),
RuntimeError: Trying to create tensor with negative dimension -1: [-1, 3, 3]

Thank you for your help!!
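The traceback above shows repeat() receiving a batch_size of -1, which cannot be a tensor dimension. The thread does not say where that value comes from, so the following is only a minimal sketch of a guard that fails early with a clearer message. The helper names checked_batch_size and identity_batch_shape are hypothetical and are not part of holo_diffusion.

```python
from typing import Tuple


def checked_batch_size(batch_size: int) -> int:
    """Return batch_size unchanged if it can be used as a tensor dimension.

    Hypothetical helper: raises a descriptive error instead of letting a
    non-positive value reach the tensor constructor.
    """
    if not isinstance(batch_size, int) or batch_size < 1:
        raise ValueError(
            f"batch_size must be a positive integer, got {batch_size!r}; "
            "check the batch_size and images_per_seq_options settings"
        )
    return batch_size


def identity_batch_shape(batch_size: int) -> Tuple[int, int, int]:
    """Shape that a batch of 3x3 identity rotations would have."""
    return (checked_batch_size(batch_size), 3, 3)
```

Calling identity_batch_shape(-1) raises a ValueError that names the config options to check, rather than the lower-level RuntimeError.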

akanimax commented 1 year ago

Hey, please check the updated configs.

lai0198848881 commented 1 year ago

@akanimax Thank you for your help. I ran the training code with your object-specific configs, but I now hit a different error:

File "/home/lch/anaconda3/envs/holo_diffusion_release/lib/python3.9/site-packages/pytorch3d/implicitron/models/view_pooler/feature_aggregator.py", line 634, in _avgmaxstd_reduction_function
    x_aggr = torch.cat(pooled_features, dim=-1)
RuntimeError: torch.cat(): expected a non-empty list of Tensors

akanimax commented 1 year ago

Hey, is this still an issue or have you figured out a fix? If the former, could you please provide the exact command which you ran to get this error? I'll try to reproduce it.

lai0198848881 commented 1 year ago

@akanimax Hi, I ran the following command with your object-specific config and got the error below. I don't have any issue when running with the base.yaml config:

CUDA_VISIBLE_DEVICES=1 accelerate launch experiment.py --config-name apple.yaml

The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 1
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
/home/lch/holo_diffusion/experiment.py:319: UserWarning: The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_path="./configs/", config_name="default_config")
/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Experiment!
[2023-08-19 18:09:41,258][pytorch3d.implicitron.dataset.json_index_dataset][INFO] - Loading Co3D frames from /home/lch/co3dv2/apple/frame_annotations.jgz.
[2023-08-19 18:09:56,369][pytorch3d.implicitron.dataset.json_index_dataset][INFO] - Loading Co3D sequences from /home/lch/co3dv2/apple/sequence_annotations.jgz.
[2023-08-19 18:09:56,378][pytorch3d.implicitron.dataset.json_index_dataset][INFO] - Loading Co3D subset lists from .
[2023-08-19 18:09:56,379][pytorch3d.implicitron.dataset.json_index_dataset][INFO] - Removing images with empty masks.
[2023-08-19 18:09:56,454][pytorch3d.implicitron.dataset.json_index_dataset][INFO] - ... filtered 163262 -> 161974
[2023-08-19 18:09:56,804][pytorch3d.implicitron.dataset.json_index_dataset][INFO] - JsonIndexDataset #frames=92312
[2023-08-19 18:09:56,805][pytorch3d.implicitron.dataset.json_index_dataset_map_provider_v2][INFO] - Loading frame index json from /home/lch/co3dv2/apple/set_lists/set_lists_fewview_dev.json.
[2023-08-19 18:09:57,339][pytorch3d.implicitron.dataset.json_index_dataset_map_provider_v2][INFO] - Loading frame index json from /home/lch/co3dv2/apple/eval_batches/eval_batches_fewview_dev.json.
[2023-08-19 18:10:03,045][pytorch3d.implicitron.dataset.json_index_dataset_map_provider_v2][INFO] - Train dataset: JsonIndexDataset #frames=42365
/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=ResNet34_Weights.IMAGENET1K_V1. You can also use weights=ResNet34_Weights.DEFAULT to get the most up-to-date weights.
  warnings.warn(msg)
[2023-08-19 18:10:04,193][pytorch3d.implicitron.models.feature_extractor.resnet_feature_extractor][INFO] - Feat extractor total dim = 68
[2023-08-19 18:10:06,663][pytorch3d.implicitron.models.generic_model][INFO] - -------
loss_weights:
    loss_rgb_mse : 1.00e+00
    loss_prev_stage_rgb_mse : 1.00e+00
    loss_prev_stage_prev_stage_rgb_mse : 1.00e+00
    loss_prev_stage_prev_stage_prev_stage_rgb_mse: 1.00e+00
    loss_prev_stage_prev_stage_prev_stage_prev_stage_rgb_mse: 1.00e+00
    loss_mask_bce : 0.00e+00
    loss_prev_stage_mask_bce : 0.00e+00
    loss_prev_stage_prev_stage_mask_bce : 0.00e+00
-------
/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/torch/nn/modules/lazy.py:180: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment.
  warnings.warn('Lazy modules are a new feature under heavy development '
/home/lch/holo_diffusion/holo_diffusion/holo_diffusion_model.py:114: UserWarning: Setting target view exclusion to False by hard!
  warnings.warn("Setting target view exclusion to False by hard!")
[2023-08-19 18:10:06,772][__main__][INFO] - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: no

[2023-08-19 18:10:06,842][__main__][INFO] - Seed = 42
[2023-08-19 18:10:06,843][__main__][INFO] - Running experiment on device: cuda:0
[2023-08-19 18:10:09,648][trainer.optimizer_factory][INFO] - Solver type = Adam
/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/pytorch3d/implicitron/models/utils.py:72: UserWarning: Thresholding masks!
  warnings.warn("Thresholding masks!")
/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/pytorch3d/implicitron/models/utils.py:77: UserWarning: Masking images!
  warnings.warn("Masking images!")
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/lch/holo_diffusion/experiment.py", line 339, in <module>
    experiment()
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/lch/holo_diffusion/experiment.py", line 335, in experiment
    experiment.run()
  File "/home/lch/holo_diffusion/experiment.py", line 241, in run
    model(
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lch/holo_diffusion/holo_diffusion/holo_diffusion_model.py", line 358, in forward
    voxel_features = self.view_pooler(
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/pytorch3d/implicitron/models/view_pooler/view_pooler.py", line 120, in forward
    feats_aggregated = self.feature_aggregator(  # noqa: E731
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/pytorch3d/implicitron/models/view_pooler/feature_aggregator.py", line 333, in forward
    feats_aggregated = {
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/pytorch3d/implicitron/models/view_pooler/feature_aggregator.py", line 334, in <dictcomp>
    k: _avgmaxstd_reduction_function(
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/pytorch3d/implicitron/models/view_pooler/feature_aggregator.py", line 634, in _avgmaxstd_reduction_function
    x_aggr = torch.cat(pooled_features, dim=-1)
RuntimeError: torch.cat(): expected a non-empty list of Tensors
Traceback (most recent call last):
  File "/home/lch/anaconda3/envs/holo_diffusion/bin/accelerate", line 10, in <module>
    sys.exit(main())
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/accelerate/commands/launch.py", line 918, in launch_command
    simple_launcher(args)
  File "/home/lch/anaconda3/envs/holo_diffusion/lib/python3.9/site-packages/accelerate/commands/launch.py", line 580, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/lch/anaconda3/envs/holo_diffusion/bin/python3.9', 'experiment.py', '--config-name', 'apple.yaml']' returned non-zero exit status 1.
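Since base.yaml runs fine and apple.yaml fails, comparing the two composed configs may narrow down which setting differs. The following is a minimal sketch of a recursive diff over nested dictionaries. The function name diff_configs and the example keys (model, feature_aggregator, reduction_functions) are hypothetical placeholders and are not taken from the repository's actual config files.

```python
from typing import Any, Dict, Iterator, Tuple

MISSING = "<missing>"  # marker for a key that exists in only one config


def diff_configs(
    a: Dict[str, Any], b: Dict[str, Any], prefix: str = ""
) -> Iterator[Tuple[str, Any, Any]]:
    """Yield (dotted_key, value_in_a, value_in_b) for every leaf that differs."""
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}.{key}" if prefix else str(key)
        va, vb = a.get(key, MISSING), b.get(key, MISSING)
        if isinstance(va, dict) and isinstance(vb, dict):
            # Both sides are sections: descend and compare their contents.
            yield from diff_configs(va, vb, path)
        elif va != vb:
            yield path, va, vb


# Hypothetical stand-ins for the two composed configs.
base_cfg = {
    "max_epoch": 7000,
    "model": {"feature_aggregator": {"reduction_functions": ["AVG", "STD"]}},
}
object_cfg = {
    "max_epoch": 7000,
    "model": {"feature_aggregator": {}},
}

for dotted_key, base_value, object_value in diff_configs(base_cfg, object_cfg):
    print(f"{dotted_key}: base={base_value!r} object={object_value!r}")
```

To get real inputs for such a diff, Hydra's --cfg job flag prints the composed config for a given --config-name without running the job.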

akanimax commented 1 year ago

Okay, I see. The feature_aggregator config is not being parsed correctly. I need to fix this.
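As an illustration of how a mis-parsed config could leave the list passed to torch.cat empty, here is a minimal sketch. It assumes the aggregator picks its reductions by checking enum membership, which this thread does not confirm. The Reduction enum and both functions are hypothetical stand-ins, not the pytorch3d API.

```python
from enum import Enum
from typing import List, Sequence


class Reduction(Enum):
    """Hypothetical stand-in for a library's reduction-function enum."""

    AVG = "avg"
    MAX = "max"
    STD = "std"


def pooled_feature_names(reductions: Sequence[object]) -> List[str]:
    """Collect one pooled feature per requested reduction.

    Membership is tested against enum members, so raw strings never match
    and the result is silently empty.
    """
    pooled: List[str] = []
    if Reduction.AVG in reductions:
        pooled.append("avg")
    if Reduction.MAX in reductions:
        pooled.append("max")
    if Reduction.STD in reductions:
        pooled.append("std")
    return pooled


def parse_reductions(raw: Sequence[str]) -> List[Reduction]:
    """Convert config strings such as 'AVG' into enum members by name."""
    return [Reduction[name.upper()] for name in raw]


raw_from_config = ["AVG", "STD"]  # strings as they might arrive from YAML
print(pooled_feature_names(raw_from_config))                    # nothing matches
print(pooled_feature_names(parse_reductions(raw_from_config)))  # both match
```

In this sketch, an unparsed list of strings produces no pooled features at all, which is the kind of empty input that a later concatenation step would reject.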

facebookresearch / holo_diffusion

Problem of training settings #4