FeTS-AI / Challenge

The repo for the FeTS Challenge
https://www.synapse.org/#!Synapse:syn28546456

ignore_label_validation error #178

Open Gresliebear opened 2 years ago

Gresliebear commented 2 years ago
(venv) PS C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1> python .\FeTS_Challenge.py
Creating Workspace Directories
Creating Workspace Templates
Requirement already satisfied: torchvision in c:\coderepos\moanisandbox\fets-ai\challenge\task_1\venv\lib\site-packages (from -r C:\Users\15702\.local\workspace/requirements.txt (line 1)) (0.9.2+cu111)
Requirement already satisfied: torch in c:\coderepos\moanisandbox\fets-ai\challenge\task_1\venv\lib\site-packages (from -r C:\Users\15702\.local\workspace/requirements.txt (line 2)) (1.8.2+cu111)
Requirement already satisfied: numpy in c:\coderepos\moanisandbox\fets-ai\challenge\task_1\venv\lib\site-packages (from torchvision->-r C:\Users\15702\.local\workspace/requirements.txt (line 1)) (1.21.0)
Requirement already satisfied: pillow>=4.1.1 in c:\coderepos\moanisandbox\fets-ai\challenge\task_1\venv\lib\site-packages (from torchvision->-r C:\Users\15702\.local\workspace/requirements.txt (line 1)) (9.1.1)
Requirement already satisfied: typing-extensions in c:\coderepos\moanisandbox\fets-ai\challenge\task_1\venv\lib\site-packages (from torch->-r C:\Users\15702\.local\workspace/requirements.txt (line 2)) (4.2.0)
Successfully installed packages from C:\Users\15702\.local\workspace/requirements.txt.

New workspace directory structure:
workspace
├── .workspace
├── agg_to_col_one_signed_cert.zip
├── agg_to_col_two_signed_cert.zip
├── cert
├── checkpoint
├── data
├── gandlf_paths.csv
├── logs
├── output_validation
│   └── 0
├── partitioning_1.csv
├── partitioning_2.csv
├── plan
│   ├── cols.yaml
│   ├── data.yaml
│   ├── defaults
│   └── plan.yaml
├── raid
│   └── datasets
│       └── FeTS22
├── requirements.txt
├── save
│   └── fets_seg_test_init.pbuf
├── seg_test_train.csv
├── seg_test_val.csv
├── small_split.csv
├── src
│   ├── challenge_assigner.py
│   ├── fets_challenge_model.py
│   ├── __init__.py
│   └── __pycache__
│       ├── challenge_assigner.cpython-37.pyc
│       ├── fets_challenge_model.cpython-37.pyc
│       └── __init__.cpython-37.pyc
└── validation.csv

13 directories, 22 files
Setting Up Certificate Authority...

1.  Create Root CA
1.1 Create Directories
1.2 Create Database
1.3 Create CA Request and Certificate
2.  Create Signing Certificate
2.1 Create Directories
2.2 Create Database
2.3 Create Signing Certificate CSR
2.4 Sign Signing Certificate CSR
3   Create Certificate Chain

Done.
Creating AGGREGATOR certificate key pair with following settings: CN=openvessel.ptd.net, SAN=DNS:openvessel.ptd.net
  Writing AGGREGATOR certificate key pair to: C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert/server
The CSR Hash for file server/agg_openvessel.ptd.net.csr = f713b37863866bd5a82473efd30b8e494ef0243b4470fae2ae40e7d75f5415475f38c91986391d95436bce024df14bf1
 Signing AGGREGATOR certificate
Creating COLLABORATOR certificate key pair with following settings: CN=one, SAN=DNS:one
  Moving COLLABORATOR certificate to: C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert/col_one
The CSR Hash for file col_one.csr = 58fdc5a503366177f1556335d22295b6d598078341ad3b40ad7301c2cf3dac5252d8feea1f03bb7fa6077b2541562860
 Signing COLLABORATOR certificate

Registering odeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert\client\col_one in C:\Users\15702\.local\workspace\plan\cols.yaml
Creating COLLABORATOR certificate key pair with following settings: CN=two, SAN=DNS:two
  Moving COLLABORATOR certificate to: C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert/col_two
The CSR Hash for file col_two.csr = 374efb23a8b7af15d53eb824db7136e5996b418c38e9b65a12384788aff27fb0c5d59de2418784030bc3196d4342cf27
 Signing COLLABORATOR certificate

Registering odeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert\client\col_two in C:\Users\15702\.local\workspace\plan\cols.yaml
C:\Users\15702\.local\workspace\gandlf_paths.csv
No 'TrainOrVal' column found in split_subdirs csv, so performing automated split using percent_train of 0.8
[]
[]
[]
[20:09:15] INFO     Updating aggregator.settings.rounds_to_train to 5...                                                                           native.py:102
           INFO     Updating aggregator.settings.db_store_rounds to 5...                                                                           native.py:102
           WARNING  Did not find tasks.train.aggregation_type in config. Make sure it should exist. Creating...                                    native.py:105
           INFO     Updating task_runner.settings.device to cpu...                                                                                 native.py:102
           WARNING  Did not find task_runner.settings.fets_config_dict.data_preprocessing in config. Make sure it should exist. Creating...        native.py:105
           WARNING  Did not find task_runner.settings.fets_config_dict.ignore_label_validation in config. Make sure it should exist. Creating...   native.py:105
           INFO     FL-Plan hash is 601cd0b67629af4d8ea0527f65b8a6613cc7d60f28d1a035e5167db87264c20e2fc1f2844d0df0c45d72ae1b29dcff48                 plan.py:234
{
    "aggregator.settings.best_state_path": "save/fets_seg_test_best.pbuf",
    "aggregator.settings.db_store_rounds": 2,
    "aggregator.settings.init_state_path": "save/fets_seg_test_init.pbuf",
    "aggregator.settings.last_state_path": "save/fets_seg_test_last.pbuf",
    "aggregator.settings.rounds_to_train": 3,
    "aggregator.settings.write_logs": true,
    "aggregator.template": "openfl.component.Aggregator",
    "assigner.settings.training_tasks.0": "aggregated_model_validation",
    "assigner.settings.training_tasks.1": "train",
    "assigner.settings.training_tasks.2": "locally_tuned_model_validation",
    "assigner.settings.validation_tasks.0": "aggregated_model_validation",
    "assigner.template": "src.challenge_assigner.FeTSChallengeAssigner",
    "collaborator.settings.db_store_rounds": 1,
    "collaborator.settings.delta_updates": false,
    "collaborator.settings.opt_treatment": "RESET",
    "collaborator.template": "openfl.component.Collaborator",
    "compression_pipeline.settings": {},
    "compression_pipeline.template": "openfl.pipelines.NoCompressionPipeline",
    "data_loader.settings.feature_shape.0": 32,
    "data_loader.settings.feature_shape.1": 32,
    "data_loader.settings.feature_shape.2": 32,
    "data_loader.template": "openfl.federated.data.loader_fets_challenge.FeTSChallengeDataLoaderWrapper",
    "network.settings.agg_addr": "openvessel.ptd.net",
    "network.settings.agg_port": 54937,
    "network.settings.cert_folder": "cert",
    "network.settings.client_reconnect_interval": 5,
    "network.settings.disable_client_auth": false,
    "network.settings.hash_salt": "auto",
    "network.settings.tls": true,
    "network.template": "openfl.federation.Network",
    "task_runner.settings.device": "cpu",
    "task_runner.settings.fets_config_dict.batch_size": 1,
    "task_runner.settings.fets_config_dict.data_augmentation": {},
    "task_runner.settings.fets_config_dict.data_postprocessing": {},
    "task_runner.settings.fets_config_dict.enable_padding": false,
    "task_runner.settings.fets_config_dict.in_memory": true,
    "task_runner.settings.fets_config_dict.inference_mechanism.grid_aggregator_overlap": "crop",
    "task_runner.settings.fets_config_dict.inference_mechanism.patch_overlap": 0,
    "task_runner.settings.fets_config_dict.learning_rate": 0.001,
    "task_runner.settings.fets_config_dict.loss_function": "dc",
    "task_runner.settings.fets_config_dict.medcam_enabled": false,
    "task_runner.settings.fets_config_dict.metrics.0": "dice",
    "task_runner.settings.fets_config_dict.metrics.1": "dice_per_label",
    "task_runner.settings.fets_config_dict.metrics.2": "hd95_per_label",
    "task_runner.settings.fets_config_dict.model.amp": true,
    "task_runner.settings.fets_config_dict.model.architecture": "resunet",
    "task_runner.settings.fets_config_dict.model.base_filters": 32,
    "task_runner.settings.fets_config_dict.model.class_list.0": 0,
    "task_runner.settings.fets_config_dict.model.class_list.1": 1,
    "task_runner.settings.fets_config_dict.model.class_list.2": 2,
    "task_runner.settings.fets_config_dict.model.class_list.3": 4,
    "task_runner.settings.fets_config_dict.model.dimension": 3,
    "task_runner.settings.fets_config_dict.model.final_layer": "softmax",
    "task_runner.settings.fets_config_dict.model.norm_type": "instance",
    "task_runner.settings.fets_config_dict.nested_training.testing": 1,
    "task_runner.settings.fets_config_dict.nested_training.validation": -5,
    "task_runner.settings.fets_config_dict.num_epochs": 1,
    "task_runner.settings.fets_config_dict.optimizer.type": "sgd",
    "task_runner.settings.fets_config_dict.output_dir": ".",
    "task_runner.settings.fets_config_dict.parallel_compute_command": "",
    "task_runner.settings.fets_config_dict.patch_sampler": "label",
    "task_runner.settings.fets_config_dict.patch_size.0": 64,
    "task_runner.settings.fets_config_dict.patch_size.1": 64,
    "task_runner.settings.fets_config_dict.patch_size.2": 64,
    "task_runner.settings.fets_config_dict.patience": 100,
    "task_runner.settings.fets_config_dict.pin_memory_dataloader": false,
    "task_runner.settings.fets_config_dict.print_rgb_label_warning": true,
    "task_runner.settings.fets_config_dict.q_max_length": 100,
    "task_runner.settings.fets_config_dict.q_num_workers": 0,
    "task_runner.settings.fets_config_dict.q_samples_per_volume": 40,
    "task_runner.settings.fets_config_dict.q_verbose": false,
    "task_runner.settings.fets_config_dict.save_output": false,
    "task_runner.settings.fets_config_dict.save_training": false,
    "task_runner.settings.fets_config_dict.scaling_factor": 1,
    "task_runner.settings.fets_config_dict.scheduler.type": "triangle_modified",
    "task_runner.settings.fets_config_dict.track_memory_usage": false,
    "task_runner.settings.fets_config_dict.verbose": false,
    "task_runner.settings.fets_config_dict.version.maximum": "0.0.14",
    "task_runner.settings.fets_config_dict.version.minimum": "0.0.14",
    "task_runner.settings.fets_config_dict.weighted_loss": true,
    "task_runner.settings.train_csv": "seg_test_train.csv",
    "task_runner.settings.val_csv": "seg_test_val.csv",
    "task_runner.template": "src.fets_challenge_model.FeTSChallengeModel",
    "tasks.aggregated_model_validation.function": "validate",
    "tasks.aggregated_model_validation.kwargs.apply": "global",
    "tasks.aggregated_model_validation.kwargs.metrics.0": "valid_loss",
    "tasks.aggregated_model_validation.kwargs.metrics.1": "valid_dice",
    "tasks.locally_tuned_model_validation.function": "validate",
    "tasks.locally_tuned_model_validation.kwargs.apply": "local",
    "tasks.locally_tuned_model_validation.kwargs.metrics.0": "valid_loss",
    "tasks.locally_tuned_model_validation.kwargs.metrics.1": "valid_dice",
    "tasks.settings": {},
    "tasks.train.function": "train",
    "tasks.train.kwargs.epochs": 1,
    "tasks.train.kwargs.metrics.0": "loss",
    "tasks.train.kwargs.metrics.1": "train_dice"
}
           INFO     Building 🡆  Object FeTSChallengeDataLoaderWrapper from openfl.federated.data.loader_fets_challenge Module.                        plan.py:173
           INFO     Building 🡆  Object FeTSChallengeDataLoaderWrapper from openfl.federated.data.loader_fets_challenge Module.                        plan.py:173
           INFO     Building 🡆  Object FeTSChallengeDataLoaderWrapper from openfl.federated.data.loader_fets_challenge Module.                        plan.py:173
           INFO     Building 🡆  Object FeTSChallengeModel from src.fets_challenge_model Module.                                                       plan.py:173
Constructing queue for train data: 100%|████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  2.10it/s]
Calculating weights
Constructing queue for penalty data: 100%|██████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  2.26it/s] 
Looping over training data for penalty calculation: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  1.77it/s] 
Constructing queue for validation data: 100%|███████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.38it/s]
All Keys :  ['subject_id', '2', 'spacing', '3', '4', '5', 'label', 'path_to_metadata']
Since Device is CPU, Mixed Precision Training is set to False
[20:09:22] INFO     Building 🡆  Object FeTSChallengeModel from src.fets_challenge_model Module.                                                       plan.py:173
Constructing queue for train data: 100%|████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.71it/s] 
Calculating weights
Constructing queue for penalty data: 100%|██████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.66it/s] 
Looping over training data for penalty calculation: 100%|███████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.01it/s] 
Constructing queue for validation data: 100%|███████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.51it/s]
All Keys :  ['subject_id', '2', 'spacing', '3', '4', '5', 'label', 'path_to_metadata']
Since Device is CPU, Mixed Precision Training is set to False
[20:09:25] INFO     Building 🡆  Object FeTSChallengeModel from src.fets_challenge_model Module.                                                       plan.py:173
Constructing queue for train data: 100%|████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.65it/s] 
Calculating weights
Constructing queue for penalty data: 100%|██████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.86it/s] 
Looping over training data for penalty calculation: 100%|███████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.10it/s] 
Constructing queue for validation data: 100%|███████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.80it/s]
All Keys :  ['subject_id', '2', 'spacing', '3', '4', '5', 'label', 'path_to_metadata']
Since Device is CPU, Mixed Precision Training is set to False
Loading pretrained model...
[20:09:28] INFO     Building 🡆  Object NoCompressionPipeline from openfl.pipelines Module.                                                            plan.py:173
[20:09:29] INFO     Creating aggregator...                                                                                                     experiment.py:323
           INFO     Building 🡆  Object FeTSChallengeAssigner from src.challenge_assigner Module.                                                      plan.py:173
           INFO     Building 🡆  Object Aggregator from openfl.component Module.                                                                       plan.py:173
           INFO     Creating collaborators...                                                                                                  experiment.py:330
           INFO     Building 🡆  Object Collaborator from openfl.component Module.                                                                     plan.py:173
           INFO     Building 🡆  Object Collaborator from openfl.component Module.                                                                     plan.py:173
           INFO     Building 🡆  Object Collaborator from openfl.component Module.                                                                     plan.py:173
           INFO     Starting experiment                                                                                                        experiment.py:338
           INFO                                                                                                                                experiment.py:366
                    Created experiment folder experiment_1...                                                                                          

           INFO     Collaborators chosen to train for round 0:                                                                                 experiment.py:403
                            ['1', '2', '3']                                                                                                            

           INFO     Hyper-parameters for round 0:                                                                                              experiment.py:425
                            learning rate: 5e-05                                                                                                       

                            epochs_per_round: 1                                                                                                        

           INFO     Waiting for tasks...                                                                                                     collaborator.py:178
           INFO     Sending tasks to collaborator 3 for round 0                                                                                aggregator.py:312
           INFO     Received the following tasks: ['aggregated_model_validation', 'train', 'locally_tuned_model_validation']                 collaborator.py:168
[20:09:30] INFO     Using TaskRunner subclassing API                                                                                         collaborator.py:253
********************
Starting validation :
********************
Looping over validation data:   0%|                                                                                             | 0/1 [00:02<?, ?it/s] 
Traceback (most recent call last):
  File ".\FeTS_Challenge.py", line 584, in <module>
    restore_from_checkpoint_folder = restore_from_checkpoint_folder)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\fets_challenge\experiment.py", line 468, in run_challenge_experiment
    collaborators[col].run_simulation()
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\openfl\component\collaborator\collaborator.py", line 170, in run_simulation
    self.do_task(task, round_number)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\openfl\component\collaborator\collaborator.py", line 259, in do_task 
    **kwargs)
  File "C:\Users\15702\.local\workspace\src\fets_challenge_model.py", line 48, in validate
    mode="validation")
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\GANDLF\compute\forward_pass.py", line 276, in validate_network       
    result = step(model, image, label, params, train=True)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\GANDLF\compute\step.py", line 88, in step
    loss, metric_output = get_loss_and_metrics(image, label, output, params)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\GANDLF\compute\loss_and_metric.py", line 141, in get_loss_and_metrics
    metric_function, predicted, ground_truth, params
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\GANDLF\compute\loss_and_metric.py", line 13, in get_metric_output    
    metric_output = metric_function(predicted, ground_truth, params).detach().cpu()
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\GANDLF\metrics\segmentation.py", line 42, in multi_class_dice        
    if i != params["model"]["ignore_label_validation"]:
KeyError: 'ignore_label_validation'

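Solution: override the plan.yaml settings as shown below, adding `ignore_label_validation` under `fets_config_dict.model` and setting it to `False`. The traceback shows GaNDLF reading the key as `params["model"]["ignore_label_validation"]`, so the override created directly under `fets_config_dict` (see the warning in the log above) does not satisfy that lookup.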

overrides = {
    'aggregator.settings.rounds_to_train': rounds_to_train,
    'aggregator.settings.db_store_rounds': db_store_rounds,
    'tasks.train.aggregation_type': aggregation_wrapper,
    'task_runner.settings.device': device,
    'task_runner.settings.fets_config_dict.data_preprocessing': {},
    'task_runner.settings.fets_config_dict.model.ignore_label_validation': False
}
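
To illustrate why the position of the key matters, here is a minimal sketch using plain dictionaries. It assumes that the contents of `fets_config_dict` are what GaNDLF receives as `params`; the helpers `apply_override` and `lookup` are hypothetical and are not part of OpenFL or GaNDLF, and the config is a trimmed stand-in for the real plan.

```python
from copy import deepcopy


def apply_override(config, dotted_key, value):
    """Hypothetical helper: set config['a']['b']['c'] = value for key 'a.b.c'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate dictionaries when they do not exist yet.
        node = node.setdefault(name, {})
    node[leaf] = value


def lookup(config):
    """Mimic the failing access from the traceback (assumed params layout)."""
    params = config["fets_config_dict"]
    return params["model"]["ignore_label_validation"]


# Trimmed stand-in for the plan; only the keys needed for the illustration.
base = {
    "fets_config_dict": {
        "model": {"architecture": "resunet", "class_list": [0, 1, 2, 4]},
    }
}

# Key placed directly under fets_config_dict, as in the logged warning.
wrong = deepcopy(base)
apply_override(wrong, "fets_config_dict.ignore_label_validation", False)

# Key placed under fets_config_dict.model, as in the overrides above.
right = deepcopy(base)
apply_override(right, "fets_config_dict.model.ignore_label_validation", False)

try:
    lookup(wrong)
    wrong_raises = False
except KeyError:
    wrong_raises = True

print("misplaced key raises KeyError:", wrong_raises)
print("nested key value:", lookup(right))
```

In this sketch the misplaced key is stored but never found by the `params["model"][...]` access, which matches the `KeyError` in the traceback; nesting the key under `model` makes the same access succeed.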