JonasSchult / Mask3D

Mask3D predicts accurate 3D semantic instances achieving state-of-the-art on ScanNet, ScanNet200, S3DIS and STPLS3D.
MIT License

Issue when training on S3DIS #83

Closed jerry3chen closed 1 year ago

jerry3chen commented 1 year ago

Hi, I tried to train on the S3DIS dataset using the provided config and .sh script, changing only the path where the dataset is stored. However, I ran into the following error:

/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/pytorch_lightning/utilities/seed.py:55: UserWarning: No seed found, seed set to 767304311
  rank_zero_warn(f"No seed found, seed set to {seed}")
Global seed set to 767304311
{'_target_': 'pytorch_lightning.loggers.WandbLogger', 'project': '${general.project_name}', 'name': '${general.experiment_name}', 'save_dir': '${general.save_dir}', 'entity': 'schult', 'resume': 'allow', 'id': '${general.experiment_name}'}
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: WARNING `resume` will be ignored since W&B syncing is set to `offline`. Starting a new run with run id area1_from_scratch.
wandb: Tracking run with wandb version 0.15.2
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
[2023-05-16 18:47:22,893][__main__][INFO] - {'general_train_mode': True, 'general_task': 'instance_segmentation', 'general_seed': None, 'general_checkpoint': None, 'general_backbone_checkpoint': None, 'general_freeze_backbone': False, 'general_linear_probing_backbone': False, 'general_train_on_segments': False, 'general_eval_on_segments': False, 'general_filter_out_instances': False, 'general_save_visualizations': False, 'general_visualization_point_size': 20, 'general_decoder_id': -1, 'general_export': False, 'general_use_dbscan': False, 'general_ignore_class_threshold': 100, 'general_project_name': 's3dis', 'general_workspace': 'jonasschult', 'general_experiment_name': 'area1_from_scratch', 'general_num_targets': 14, 'general_add_instance': True, 'general_dbscan_eps': 0.95, 'general_dbscan_min_points': 1, 'general_export_threshold': 0.0001, 'general_reps_per_epoch': 1, 'general_on_crops': False, 'general_scores_threshold': 0.0, 'general_iou_threshold': 1.0, 'general_area': 1, 'general_eval_inner_core': -1, 'general_topk_per_image': 100, 'general_ignore_mask_idx': [], 'general_max_batch_size': 99999999, 'general_save_dir': 'saved/area1_from_scratch', 'general_gpus': 1, 'data_train_mode': 'train', 'data_validation_mode': 'validation', 'data_test_mode': 'validation', 'data_ignore_label': 255, 'data_add_raw_coordinates': True, 'data_add_colors': True, 'data_add_normals': False, 'data_in_channels': 3, 'data_num_labels': 13, 'data_add_instance': True, 'data_task': 'instance_segmentation', 'data_pin_memory': False, 'data_num_workers': 4, 'data_batch_size': 4, 'data_test_batch_size': 1, 'data_cache_data': False, 'data_voxel_size': 0.02, 'data_reps_per_epoch': 1, 'data_cropping': False, 'data_cropping_args_min_points': 30000, 'data_cropping_args_aspect': 0.8, 'data_cropping_args_min_crop': 0.5, 'data_cropping_args_max_crop': 1.0, 'data_crop_min_size': 20000, 'data_crop_length': 6.0, 'data_cropping_v1': True, 'data_train_dataloader__target_': 'torch.utils.data.DataLoader', 
'data_train_dataloader_shuffle': True, 'data_train_dataloader_pin_memory': False, 'data_train_dataloader_num_workers': 4, 'data_train_dataloader_batch_size': 4, 'data_validation_dataloader__target_': 'torch.utils.data.DataLoader', 'data_validation_dataloader_shuffle': False, 'data_validation_dataloader_pin_memory': False, 'data_validation_dataloader_num_workers': 4, 'data_validation_dataloader_batch_size': 1, 'data_test_dataloader__target_': 'torch.utils.data.DataLoader', 'data_test_dataloader_shuffle': False, 'data_test_dataloader_pin_memory': False, 'data_test_dataloader_num_workers': 4, 'data_test_dataloader_batch_size': 1, 'data_train_dataset__target_': 'datasets.semseg.SemanticSegmentationDataset', 'data_train_dataset_dataset_name': 's3dis', 'data_train_dataset_data_dir': '/data/share/3D/s3disMask3d', 'data_train_dataset_image_augmentations_path': 'conf/augmentation/albumentations_aug.yaml', 'data_train_dataset_volume_augmentations_path': 'conf/augmentation/volumentations_aug.yaml', 'data_train_dataset_label_db_filepath': '/data/share/3D/s3disMask3d/label_database.yaml', 'data_train_dataset_color_mean_std': '/data/share/3D/s3disMask3d/color_mean_std.yaml', 'data_train_dataset_data_percent': 1.0, 'data_train_dataset_mode': 'train', 'data_train_dataset_ignore_label': 255, 'data_train_dataset_num_labels': 13, 'data_train_dataset_add_raw_coordinates': True, 'data_train_dataset_add_colors': True, 'data_train_dataset_add_normals': False, 'data_train_dataset_add_instance': True, 'data_train_dataset_cache_data': False, 'data_train_dataset_instance_oversampling': 0.0, 'data_train_dataset_place_around_existing': False, 'data_train_dataset_point_per_cut': 0, 'data_train_dataset_max_cut_region': 0, 'data_train_dataset_flip_in_center': False, 'data_train_dataset_noise_rate': 0, 'data_train_dataset_resample_points': 0, 'data_train_dataset_cropping': False, 'data_train_dataset_cropping_args_min_points': 30000, 'data_train_dataset_cropping_args_aspect': 0.8, 
'data_train_dataset_cropping_args_min_crop': 0.5, 'data_train_dataset_cropping_args_max_crop': 1.0, 'data_train_dataset_is_tta': False, 'data_train_dataset_crop_min_size': 20000, 'data_train_dataset_crop_length': 6.0, 'data_train_dataset_cropping_v1': True, 'data_train_dataset_area': 1, 'data_train_dataset_filter_out_classes': [], 'data_train_dataset_label_offset': 0, 'data_validation_dataset__target_': 'datasets.semseg.SemanticSegmentationDataset', 'data_validation_dataset_dataset_name': 's3dis', 'data_validation_dataset_data_dir': '/data/share/3D/s3disMask3d', 'data_validation_dataset_image_augmentations_path': None, 'data_validation_dataset_volume_augmentations_path': None, 'data_validation_dataset_label_db_filepath': '/data/share/3D/s3disMask3d/label_database.yaml', 'data_validation_dataset_color_mean_std': '/data/share/3D/s3disMask3d/color_mean_std.yaml', 'data_validation_dataset_data_percent': 1.0, 'data_validation_dataset_mode': 'validation', 'data_validation_dataset_ignore_label': 255, 'data_validation_dataset_num_labels': 13, 'data_validation_dataset_add_raw_coordinates': True, 'data_validation_dataset_add_colors': True, 'data_validation_dataset_add_normals': False, 'data_validation_dataset_add_instance': True, 'data_validation_dataset_cache_data': False, 'data_validation_dataset_cropping': False, 'data_validation_dataset_is_tta': False, 'data_validation_dataset_crop_min_size': 20000, 'data_validation_dataset_crop_length': 6.0, 'data_validation_dataset_cropping_v1': True, 'data_validation_dataset_area': 1, 'data_validation_dataset_filter_out_classes': [], 'data_validation_dataset_label_offset': 0, 'data_test_dataset__target_': 'datasets.semseg.SemanticSegmentationDataset', 'data_test_dataset_dataset_name': 's3dis', 'data_test_dataset_data_dir': '/data/share/3D/s3disMask3d', 'data_test_dataset_image_augmentations_path': None, 'data_test_dataset_volume_augmentations_path': None, 'data_test_dataset_label_db_filepath': 
'/data/share/3D/s3disMask3d/label_database.yaml', 'data_test_dataset_color_mean_std': '/data/share/3D/s3disMask3d/color_mean_std.yaml', 'data_test_dataset_data_percent': 1.0, 'data_test_dataset_mode': 'validation', 'data_test_dataset_ignore_label': 255, 'data_test_dataset_num_labels': 13, 'data_test_dataset_add_raw_coordinates': True, 'data_test_dataset_add_colors': True, 'data_test_dataset_add_normals': False, 'data_test_dataset_add_instance': True, 'data_test_dataset_cache_data': False, 'data_test_dataset_cropping': False, 'data_test_dataset_is_tta': False, 'data_test_dataset_crop_min_size': 20000, 'data_test_dataset_crop_length': 6.0, 'data_test_dataset_cropping_v1': True, 'data_test_dataset_area': 1, 'data_test_dataset_filter_out_classes': [], 'data_test_dataset_label_offset': 0, 'data_train_collation__target_': 'datasets.utils.VoxelizeCollate', 'data_train_collation_ignore_label': 255, 'data_train_collation_voxel_size': 0.02, 'data_train_collation_mode': 'train', 'data_train_collation_small_crops': False, 'data_train_collation_very_small_crops': False, 'data_train_collation_batch_instance': False, 'data_train_collation_probing': False, 'data_train_collation_task': 'instance_segmentation', 'data_train_collation_ignore_class_threshold': 100, 'data_train_collation_filter_out_classes': [], 'data_train_collation_label_offset': 0, 'data_train_collation_num_queries': 100, 'data_validation_collation__target_': 'datasets.utils.VoxelizeCollate', 'data_validation_collation_ignore_label': 255, 'data_validation_collation_voxel_size': 0.02, 'data_validation_collation_mode': 'validation', 'data_validation_collation_batch_instance': False, 'data_validation_collation_probing': False, 'data_validation_collation_task': 'instance_segmentation', 'data_validation_collation_ignore_class_threshold': 100, 'data_validation_collation_filter_out_classes': [], 'data_validation_collation_label_offset': 0, 'data_validation_collation_num_queries': 100, 'data_test_collation__target_': 
'datasets.utils.VoxelizeCollate', 'data_test_collation_ignore_label': 255, 'data_test_collation_voxel_size': 0.02, 'data_test_collation_mode': 'validation', 'data_test_collation_batch_instance': False, 'data_test_collation_probing': False, 'data_test_collation_task': 'instance_segmentation', 'data_test_collation_ignore_class_threshold': 100, 'data_test_collation_filter_out_classes': [], 'data_test_collation_label_offset': 0, 'data_test_collation_num_queries': 100, 'logging': [{'_target_': 'pytorch_lightning.loggers.WandbLogger', 'project': 's3dis', 'name': 'area1_from_scratch', 'save_dir': 'saved/area1_from_scratch', 'entity': 'schult', 'resume': 'allow', 'id': 'area1_from_scratch'}], 'model__target_': 'models.Mask3D', 'model_hidden_dim': 128, 'model_dim_feedforward': 1024, 'model_num_queries': 100, 'model_num_heads': 8, 'model_num_decoders': 3, 'model_dropout': 0.0, 'model_pre_norm': False, 'model_use_level_embed': False, 'model_normalize_pos_enc': True, 'model_positional_encoding_type': 'fourier', 'model_gauss_scale': 1.0, 'model_hlevels': [0, 1, 2, 3], 'model_non_parametric_queries': True, 'model_random_query_both': False, 'model_random_normal': False, 'model_random_queries': False, 'model_use_np_features': False, 'model_sample_sizes': [200, 800, 3200, 12800, 51200], 'model_max_sample_size': False, 'model_shared_decoder': True, 'model_num_classes': 14, 'model_train_on_segments': False, 'model_scatter_type': 'mean', 'model_voxel_size': 0.02, 'model_config_backbone__target_': 'models.Res16UNet34C', 'model_config_backbone_config_dialations': [1, 1, 1, 1], 'model_config_backbone_config_conv1_kernel_size': 5, 'model_config_backbone_config_bn_momentum': 0.02, 'model_config_backbone_in_channels': 3, 'model_config_backbone_out_channels': 13, 'model_config_backbone_out_fpn': True, 'metrics__target_': 'models.metrics.ConfusionMatrix', 'metrics_num_classes': 13, 'metrics_ignore_label': 255, 'optimizer__target_': 'torch.optim.AdamW', 'optimizer_lr': 0.0001, 
'scheduler_scheduler__target_': 'torch.optim.lr_scheduler.OneCycleLR', 'scheduler_scheduler_max_lr': 0.0001, 'scheduler_scheduler_epochs': 1001, 'scheduler_scheduler_steps_per_epoch': -1, 'scheduler_pytorch_lightning_params_interval': 'step', 'trainer_deterministic': False, 'trainer_max_epochs': 1001, 'trainer_min_epochs': 1, 'trainer_resume_from_checkpoint': None, 'trainer_check_val_every_n_epoch': 10, 'trainer_num_sanity_val_steps': 2, 'callbacks': [{'_target_': 'pytorch_lightning.callbacks.ModelCheckpoint', 'monitor': 'val_mean_ap_50', 'save_last': True, 'save_top_k': 1, 'mode': 'max', 'dirpath': 'saved/area1_from_scratch', 'filename': '{epoch}-{val_mean_ap_50:.3f}', 'every_n_epochs': 1}, {'_target_': 'pytorch_lightning.callbacks.LearningRateMonitor'}], 'matcher__target_': 'models.matcher.HungarianMatcher', 'matcher_cost_class': 2.0, 'matcher_cost_mask': 5.0, 'matcher_cost_dice': 2.0, 'matcher_num_points': -1, 'loss__target_': 'models.criterion.SetCriterion', 'loss_num_classes': 14, 'loss_eos_coef': 0.1, 'loss_losses': ['labels', 'masks'], 'loss_num_points': -1, 'loss_oversample_ratio': 3.0, 'loss_importance_sample_ratio': 0.75, 'loss_class_weights': -1}
/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:446: LightningDeprecationWarning: Setting `Trainer(gpus=1)` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=1)` instead.
  rank_zero_deprecation(
/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:57: LightningDeprecationWarning: Setting `Trainer(weights_save_path=)` has been deprecated in v1.6 and will be removed in v1.8. Please pass ``dirpath`` directly to the `ModelCheckpoint` callback
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/home/jerry/3D/3d_seg/mask3d/Mask3D/datasets/semseg.py:573: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  file = yaml.load(f)
/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:616: UserWarning: Checkpoint directory /home/jerry/3D/3d_seg/mask3d/Mask3D/saved/area1_from_scratch exists and is not empty.
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
Traceback (most recent call last):
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 107, in run
    return run_job(
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/hydra/core/utils.py", line 128, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/jerry/3D/3d_seg/mask3d/Mask3D/main_instance_segmentation.py", line 98, in main
    train(cfg)
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/hydra/main.py", line 27, in decorated_main
    return task_function(cfg_passthrough)
  File "/home/jerry/3D/3d_seg/mask3d/Mask3D/main_instance_segmentation.py", line 78, in train
    runner.fit(model)
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 737, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1149, in _run
    self.strategy.setup(self)
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/pytorch_lightning/strategies/single_device.py", line 74, in setup
    super().setup(trainer)
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 153, in setup
    self.setup_optimizers(trainer)
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 141, in setup_optimizers
    self.optimizers, self.lr_scheduler_configs, self.optimizer_frequencies = _init_optimizers_and_lr_schedulers(
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 194, in _init_optimizers_and_lr_schedulers
    _validate_scheduler_api(lr_scheduler_configs, model)
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 351, in _validate_scheduler_api
    raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: The provided lr scheduler `OneCycleLR` doesn't follow PyTorch's LRScheduler API. You should override the `LightningModule.lr_scheduler_step` hook with your own logic if you are using a custom LR scheduler.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jerry/3D/3d_seg/mask3d/Mask3D/main_instance_segmentation.py", line 104, in <module>
    main()
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
    run_and_report(
  File "/home/jerry/miniconda3/envs/mask3d/lib/python3.10/site-packages/hydra/_internal/utils.py", line 267, in run_and_report
    print_exception(etype=None, value=ex, tb=final_tb)  # type: ignore
TypeError: print_exception() got an unexpected keyword argument 'etype'
wandb: Waiting for W&B process to finish... (failed 1).
wandb: You can sync this run to the cloud by running:
wandb: wandb sync saved/area1_from_scratch/wandb/offline-run-20230516_184719-area1_from_scratch
wandb: Find logs at: saved/area1_from_scratch/wandb/offline-run-20230516_184719-area1_from_scratch/logs

The pytorch-lightning version I have is 1.7.2. Would appreciate your help!
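A note on the second traceback in the log: it appears to be a side effect rather than the root cause. The error reporter in the installed Hydra calls `traceback.print_exception` with the keyword `etype`, and in Python 3.10 that parameter was renamed to `exc` and made positional-only, so printing the original `MisconfigurationException` fails with a `TypeError`. The sketch below only illustrates a call style that works on both older and newer Python versions; `format_exception_compat` is a hypothetical helper and is not part of Hydra or Mask3D.

```python
import io
import traceback


def format_exception_compat(ex):
    """Render an exception's traceback using positional arguments only.

    Passing (type, value, tb) positionally avoids the `etype=` keyword,
    which Python 3.10 and later no longer accept.
    """
    buffer = io.StringIO()
    traceback.print_exception(type(ex), ex, ex.__traceback__, file=buffer)
    return buffer.getvalue()


try:
    # Stand-in for the real failure; the message is illustrative only.
    raise ValueError("scheduler misconfigured")
except ValueError as err:
    report = format_exception_compat(err)

print(report)
```

The first exception (the scheduler check in pytorch-lightning) is the one that actually needs to be addressed.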

mst136 commented 1 year ago

I also encountered this problem. Have you solved it? Thank you for your reply!

jerry3chen commented 1 year ago

I think I figured it out. The problem is with pytorch-lightning: the authors did not specify the version they used, and the newest pytorch-lightning has many changes. I eventually got the program running with pytorch-lightning==1.8.5. I also had to change the line "weights_save_path=str(cfg.general.save_dir)," in the file "Mask3D/main_instance_segmentation.py" to "default_root_dir=str(cfg.general.save_dir),".
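The code change described above amounts to renaming one keyword argument passed to the Lightning `Trainer`. A minimal sketch of that rename on a plain dictionary of arguments follows; `migrate_trainer_kwargs` is a hypothetical helper (not part of Mask3D or pytorch-lightning), and the example values are taken from the config dump in the log.

```python
def migrate_trainer_kwargs(kwargs):
    """Return a copy of Trainer keyword arguments in which the removed
    `weights_save_path` key is renamed to `default_root_dir`.

    An explicitly provided `default_root_dir` is left untouched.
    """
    migrated = dict(kwargs)
    if "weights_save_path" in migrated:
        path = migrated.pop("weights_save_path")
        migrated.setdefault("default_root_dir", path)
    return migrated


# Values below mirror the logged config (save_dir and max_epochs).
old_kwargs = {"weights_save_path": "saved/area1_from_scratch", "max_epochs": 1001}
new_kwargs = migrate_trainer_kwargs(old_kwargs)
print(new_kwargs)
```

The two arguments are not exact equivalents: `weights_save_path` only affected where checkpoints were written, while `default_root_dir` is the default location for both logs and checkpoints. Since the logged config already passes `dirpath` to `ModelCheckpoint`, the checkpoint location should presumably be unaffected.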

It would be great if the authors could tell us which version they used. Regards

mst136 commented 1 year ago

Thanks for your answer! My problem has also been resolved.

JonasSchult commented 1 year ago

Hi! :)

Thanks for your interest in our work!

We have just released a fix that will most likely solve your issue. Just pull the latest version of the project and recreate your virtual Python environment following the instructions here.

Previously, the issue occurred because we did not lock the packages in the Python environment to specific versions. We have now made sure that the package versions in the environment stay fixed.
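For anyone who wants to check whether an existing environment has drifted from a set of pinned versions, a small sketch using only the standard library is shown below. The helper `find_version_mismatches` is hypothetical, and the pin in the example (`pytorch-lightning==1.8.5`) is the version reported as working earlier in this thread, not necessarily the one locked in the repository's environment file.

```python
from importlib import metadata


def find_version_mismatches(pins, get_version=metadata.version):
    """Compare pinned versions against what is installed.

    `pins` maps distribution names to the exact version expected.
    Returns {name: (expected, installed)} for every mismatch, where
    `installed` is None if the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# Assumed pin, taken from the comment above rather than from the repo.
pins = {"pytorch-lightning": "1.8.5"}
for name, (expected, installed) in find_version_mismatches(pins).items():
    print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every listed package matches its pin; anything printed points to a package worth reinstalling before debugging further.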

I will close the issue now. Please feel free to reopen it again if you continue to experience issues.

Best, Jonas

Lizhinwafu commented 1 year ago

Thanks for your answer! My problem has also been resolved.

How do I train? What are the specific commands?