Resolve conflicting key names across bundles

SachidanandAlle commented 2 years ago

Conflict key names: https://github.com/Project-MONAI/model-zoo/blob/dev/models/spleen_deepedit_annotation/configs/inference.json#L63

vs

https://github.com/Project-MONAI/model-zoo/blob/dev/models/spleen_ct_segmentation/configs/inference.json#L33

Is there a validation script to check against all configs.. or user can choose any name in such cases?

yiheng-wang-nv commented 2 years ago

Hi @SachidanandAlle , yes, like a manually written training/inference pipeline, the name of transforms is just a variable, and users can choose any name.

SachidanandAlle commented 2 years ago

i thought we are defining/following a strict schema here.. for the basic/reserved keys in the json/yaml..

Nic-Ma commented 2 years ago

Thanks for the great discussion last week, I think we almost aligned to define the "standard" of the file and key names in a bundle for better integration with other apps like: MONAI Label, MONAI deploy, NVFlare, etc.

So here for the first step, I think let's gather the requirements for keys:

MONAI Label: https://github.com/Project-MONAI/MONAILabel/blob/main/sample-apps/monaibundle/lib/infers/bundle.py#L27-L37 https://github.com/Project-MONAI/MONAILabel/blob/main/sample-apps/monaibundle/lib/trainers/bundle.py#L29-L41 file names: _"configs/multi_gpu_train.json" or "configs/multi_gputrain.yaml" "configs/inference.json" or "configs/inference.yaml (yml)" "configs/metadata.json" "models/model.pt" "models/model.ts" key names: _"device" "bundle_root" "network_def" "network"(final GPU net in use) "preprocessing" "postprocessing" "inferer" "train#trainer#maxepochs" "train#dataset#data" "train#handlers" "validate#dataset#data" @SachidanandAlle And for the above keys, which one can be optional or missing?
MONAI deploy: https://github.com/Project-MONAI/monai-deploy-app-sdk/blob/0a8af47f20691938a6dd1f2288c8c96b9fef0dab/monai/deploy/operators/monai_bundle_inference_operator.py#L148-L157 file names: "configs/inference.json" or "configs/inference.yaml (yml)" key names: "preprocessing", "postprocessing", "inferer" @MMelQin can the "postprocessing" be optional?
NVFlare: @yiheng-wang-nv @holgerroth Need you two to help provide the expected files & keys later when MONAI-FL + NVFlare is ready.

@yiheng-wang-nv For the next steps:

Update the contribution guide of model-zoo to introduce the specification for "standard" files and keys.
Update all the existing bundles to follow it.
Add CI tests to verify the files and keys with schema.

What do you guys think?

Thanks.

wyli commented 2 years ago

Probably we should also clarify the differences between this and the bundle specifications. https://github.com/Project-MONAI/MONAI/blob/dev/docs/source/mb_specification.rst

Nic-Ma commented 2 years ago

@wyli , yes, thanks for the reminder. This is the requirement only for model-zoo repo for a "model" in the zoo, not the bundle technical feature.

Thanks.

SachidanandAlle commented 2 years ago

Train jsonnkeys are also mentioned in the readme along with inference (next line) https://github.com/Project-MONAI/MONAILabel/blob/main/sample-apps/monaibundle/lib/trainers/bundle.py

yiheng-wang-nv commented 2 years ago

Hi @Nic-Ma , sure, I will start for this updates soon.

Nic-Ma commented 2 years ago

Hi @SachidanandAlle ,

I have updated the description with your train part, could you please help confirm which keys can be missing or optional? @MMelQin @ericspod Also need your help to confirm the deploy part.

Thanks in advance.

MMelQin commented 2 years ago

Hi @Nic-Ma I think I first need to make clear and emphasize a few points since I had to drop before the conclusion of the last meeting.

The standardized key names for pre/post processing, inferer, and config file can be any string that we agree upon. In Deploy App SDK, I had already considered the possible variation/customization, and have a mapping class to customize the literal strings to the logical preproc_name, postproc_name, inferer_name, and config_names. Having said that, it is better to keep using the names that are already in use, e.g. preprocessing, postprocessing, inferer, and inference.json so that I don't need to go and update the defaults in the SDK.
With regard to multiple model networks in a MONAI Bundle, I don't think it need to be supported. As we defined, a bundle is for a single model network, with metadata and processing algorithms, encapsulated in a single TorchScript file, and for a specific taks, e.g. inference, there is at most a single preprocessing attribute and so on. Making use of multiple models is an application concern, e.g. Deploy App SDK has always considered making use of multiple models to complete a single inference task, e.g. a COVID-19 classification app actually uses an ensemble of lung seg, ground glass seg, and then classification models, with inference result from model(s) feeding the next one. A technical demonstration example is also available in MONAI Deploy App SDK. So, let's stick with this, and if a trainer needs to publish multiple models for a specific use case, let's think about how best to come up with a definition of ensemble of models. It is also noted that popular inference servers, e.g. Nvidia Triton Inference Server, consider each model as a single network (TorchScript) while supporting ensemble if multiple models are needed in a inference request execution.
As for the multiplicity of pre and post processing attributes, they are 0..1. If 0 or none, then the parser will return a Compose with zero elements, basically no-op. It is already handled in the Deploy App SDK Bundle Inference operator.

holgerroth commented 2 years ago

For MONAI-FL, we should consider including

    KEY_TRAINER = "train#trainer"
    KEY_EVALUATOR = "validate#evaluator"

holgerroth commented 2 years ago

Is there any way we can also standardize the validation interval? Currently, one would have to do something like the below assuming the ValidationHandler is the first in the list.

        # the 0-th handler is ValidationHandler, which controls the interval.
        parser["train#handlers"][0]["interval"] = val_interval

SachidanandAlle commented 2 years ago

Possibly by defining the variable..

holgerroth commented 2 years ago

Another key that's needed is "key_metric"

holgerroth commented 2 years ago

Possibly by defining the variable..

Yes, could be defined at the root level, e.g. val_interval.

Nic-Ma commented 2 years ago

I see @holgerroth already added several required bundle keys in: https://github.com/Project-MONAI/MONAI/blob/dev/monai/fl/utils/constants.py#L38 We will add CI test in the model-zoo repo to ensure all the models must have the expected keys.

Thanks.

yiheng-wang-nv commented 2 years ago

ValidationHandler

Hi @holgerroth , key_metric is not an argument of monai engine, thus none of the bundle has this key name.

holgerroth commented 2 years ago

I was referring to this section of the bundle.

        "key_metric": {
            "val_mean_dice": {
                "_target_": "MeanDice",
                "include_background": false,
                "output_transform": "$monai.handlers.from_engine(['pred', 'label'])"
            }
        },

yiheng-wang-nv commented 2 years ago

I was referring to this section of the bundle.

        "key_metric": {
            "val_mean_dice": {
                "_target_": "MeanDice",
                "include_background": false,
                "output_transform": "$monai.handlers.from_engine(['pred', 'label'])"
            }
        },

I see, that's "train#key_metric", thanks @holgerroth

holgerroth commented 2 years ago

Yes, and "validate#key_metric"

Nic-Ma commented 2 years ago

Mark: @yiheng-wang-nv is working on a PR in the model-zoo repo to enable keywords check for all the typical supervised classification and segmentation models: https://github.com/Project-MONAI/model-zoo/pull/116

Thanks.

yiheng-wang-nv commented 2 years ago

Mark: @yiheng-wang-nv is working on a PR in the model-zoo repo to enable keywords check for all the typical supervised classification and segmentation models: #116

Thanks.

Now the PR has been finished and wait for review, what I changed include "wrongly" used names, and related CI tests. However, I think most of the requirements above mentioned are optional requirements, and we can only require users to use suitable names if the corresponding components are used. For example:

the following bundles do not have "train" and "validate" sections:

mednist_gan
valve_landmarks
ventricular_short_axis_3label

Therefore, for these bundles, the CI tests will skip checking components like "train#trainer". If we must have "train" and/or "validate" in "train.json", here may need @ericspod 's help to modify all of your bundles.

the following bundles do not have train config file (only has "inference.json"):
```
renalStructures_UNEST_segmentation
wholeBrainSeg_Large_UNEST_segmentation
```
Therefore, these bundles may not be used in MONAI FL.
the following bundles do not use postprocessing:
```
lung_nodule_ct_detection
```
Therefore, this bundle will skip checking if "postprocessing" is correctly used.

Hi @wyli @Nic-Ma @holgerroth @SachidanandAlle what do you think?

Nic-Ma commented 2 years ago

Hi @yiheng-wang-nv ,

Thanks for the summary. As I mentioned in the previous comment, I think you only need to verify keywords for "typical supervised classification and segmentation models", you can maintain the list of bundle to run tests, right?

Thanks.

Nic-Ma commented 1 year ago

Hi @wyli @binliunls @yiheng-wang-nv ,

I would suggest to add the "output_dir" as a required key for bundles in the CI. We are using it for experiment management, for example: https://github.com/Project-MONAI/MONAI/blob/dev/monai/bundle/utils.py#L107 What do you think? If you don't have concerns, @yiheng-wang-nv could you please help add it?

Thanks in advance.

daguangxu commented 1 year ago

in training config requirement, validate#handlers and dataset_dir are missing. Here's what FL is using https://github.com/Project-MONAI/MONAI/blob/d5344a573ca0eb78cc6617da300816f8ad031087/monai/fl/utils/constants.py#L56

Note, dataset_dir is only used by FL stats. Maybe there can be some consolidation with Auto3DSeg as to how the data location is defined.

yiheng-wang-nv commented 1 year ago

According to @Nic-Ma and @daguangxu 's comments, the following bundles need to be changed:

brats_mri_segmentation, need to modify "data_file_base_dir" to "dataset_dir"
lung_nodule_ct_detection, need to modify "data_file_base_dir" to "dataset_dir"
pathology_tumor_detection, need to modify data_root to "dataset_dir"
valve_landmarks, need to add "dataset_dir"
ventricular_short_axis_3label, need to add "dataset_dir"

However, the following two bundles missing output_dir:

endoscopic_inbody_classification, missing output_dir in inference.json
mednist_gan, missing output_dir in train.json

Hi @Nic-Ma , we may need to confirm if this key is necessary. If not having any output files, maybe we are not able to make it a hard requirement, and we can just advice users to use this variable if needed.

yiheng-wang-nv commented 1 year ago

According to @Nic-Ma and @daguangxu 's comments, the following bundles need to be changed:

brats_mri_segmentation, need to modify "data_file_base_dir" to "dataset_dir"

lung_nodule_ct_detection, need to modify "data_file_base_dir" to "dataset_dir"

pathology_tumor_detection, need to modify data_root to "dataset_dir"

valve_landmarks, need to add "dataset_dir"

ventricular_short_axis_3label, need to add "dataset_dir"

However, the following two bundles missing output_dir:

endoscopic_inbody_classification, missing output_dir in inference.json

mednist_gan, missing output_dir in train.json

Hi @Nic-Ma , we may need to confirm if this key is necessary. If not having any output files, maybe we are not able to make it a hard requirement, and we can just advice users to use this variable if needed.

I just aligned with Nic, and we planned to remove the hard requirement of having output_dir since bundles may not need to output files in training or inference. Within the PR.

Project-MONAI / model-zoo

Resolve conflicting key names across bundles #52