@cpbridge this is the current PR to expand model metadata.
Hi @wyli @ericspod @rijobro @atbenmurray ,
I updated the ticket description to make it clearer.
Thanks.
We've discussed that the MVP should focus on a minimal structure, including the model weights, basic environment/system info, changelog/versioning maintenance, and how to ensure accessibility for both humans and machines.
Hi @wyli ,
Sure, I added these items to the MVP task list.
Thanks.
@MMelQin @gigony @GreySeaWolf @cpbridge @mocsharp @joshliberty @vikashg thoughts on proposed MVP?
We should discuss a clear definition of requirements and objectives. We want to define a format, of a single file or multiple files, which contains at least the model weights along with secondary information describing how to use the model for various use cases. This will allow a human or a program to determine what sort of model it is, how to use it, and what tasks to use it for. For our MVP we want to consider a starting position for what the model weight storage and metadata storage would look like, and whether it would achieve that objective to some degree.
The base-level set of requirements I would suggest is:
One use case for this information is a human user looking into how the model is used in a particular task. They would want a clear idea of what inputs are expected and what the outputs mean. Whatever format this information takes, it should be either easily read by a human or easily converted into a convenient format using included tools.
A second use case is a deployment environment which automatically constructs whatever infrastructure is needed around a model to present it through a standard interface. This would require generating transform sequences automatically to pre- and post-process data, and loading the model through some script defining the workflow. This would be used by MONAI Deploy to automatically generate a MAP from the package; another script could automatically create a Docker image to serve the model as a command line tool or through Flask; and another could interface with existing hosting services to upload the model and whatever other information is needed.
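To ground those two use cases, here is a minimal sketch of what machine-readable metadata might look like; every field name below is a hypothetical illustration, not a proposed final format:

```python
# Hypothetical metadata sketch covering both use cases: human-readable and
# programmatically consumable. All field names here are illustrative only.
import json

metadata = {
    "name": "spleen_ct_segmentation",  # hypothetical model name
    "version": "0.1.0",
    "description": "3D segmentation of the spleen from CT images",
    "inputs": {
        "image": {"type": "image", "format": "NIfTI", "channels": 1},
    },
    "outputs": {
        "pred": {"type": "segmentation", "labels": {"0": "background", "1": "spleen"}},
    },
}

# Use case 1: a human can read the JSON directly, or via a simple report tool.
print(json.dumps(metadata, indent=2))

# Use case 2: a deployment tool consumes the same fields programmatically,
# e.g. to check that incoming data matches the declared input channels.
assert metadata["inputs"]["image"]["channels"] == 1
```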
Hi @ericspod ,
Thanks very much for your detailed description. I will try to prepare a draft model package for discussion according to your summary and our basic ideas, and then develop the necessary features to support it.
Thanks.
I'd strongly recommend that the MVP support the essential features for fast prototyping of model packages. The following uses the popular `detectron2` package as an example:
- Hybrid configs with Python + YAML/JSON/XML. Here is a workflow config mixing `.py` and `.yaml`; the essential mechanism is lazy instantiation of objects.
- Compositional configuration. An extension of RCNN-FPN could be constructed by referring to the base config and overriding some of the options, as sketched below.
These are mainly to ensure good flexibility and extensibility of the packaging.
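For illustration, a minimal sketch of the compositional-configuration idea using OmegaConf (in the spirit of the detectron2 example above); the config keys are made up:

```python
# Compositional configuration sketch: an extension config states only the
# options it overrides; everything else is inherited from the base via merge.
from omegaconf import OmegaConf

base = OmegaConf.create({
    "model": {"name": "RCNN-FPN", "backbone": "resnet50", "num_classes": 80},
    "train": {"lr": 0.02, "max_iter": 90000},
})

extension = OmegaConf.create({
    "model": {"backbone": "resnet101"},  # override the backbone
    "train": {"lr": 0.01},               # override the learning rate
})

cfg = OmegaConf.merge(base, extension)
print(cfg.model.backbone)     # resnet101 (overridden)
print(cfg.model.num_classes)  # 80 (inherited from base)
```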
Hi @wyli ,
Thanks for your great suggestions. These are very good references for the config parsing logic in steps 3 and 4 of this ticket; we definitely need similar features for flexibility and extensibility.
Thanks.
Hi @wyli @ericspod @atbenmurray @rijobro ,
I tried to prepare a draft model package structure for the first step of this ticket in the tutorial PR.
https://github.com/Project-MONAI/tutorials/pull/487
Could you please help take a look and share some feedback for the model package structure?
(1) I didn't implement any config-parsing related logic so far; I think that's for later steps.
(2) I am not sure whether to put some description strings in `metadata.json` or `README.md`, so I put them in both for now; I would like to hear your ideas.
Thanks in advance.
To recap where we are with existing issues/PRs:
Related issues:
Hi @ericspod ,
Thanks for your great summary! There is another related PR: https://github.com/Project-MONAI/MONAI/pull/3518.
Thanks.
For the record, we could consider this cache dataset config vs. content integrity issue: https://github.com/Project-MONAI/MONAI/issues/999.
Hi @ericspod @wyli @dbericat ,
Another thing I want to mention: we could also add package format verification against predefined schemas, for example: check the JSON syntax, check necessary fields in the metadata, check essential folders, etc.
Thanks.
We'll certainly want to check the format of specific fields in the JSON data so that they correctly adhere to a standard way of describing transforms and inference, so that code can be generated from them. For the other data that can be added, it would be preferable to keep it as free-form as people like, so I would expect standardized elements in the metadata adhering to our format mixed in with application-specific ones in any format.
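To make that concrete, here is a rough sketch of schema-based verification using the `jsonschema` package; the required field names and schema shape are assumptions for illustration, not an agreed format:

```python
# Schema-based verification sketch: standardized fields are checked strictly,
# while additional application-specific keys remain free-form.
import json
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "required": ["name", "version", "inputs", "outputs"],  # hypothetical required fields
    "properties": {
        "name": {"type": "string"},
        "version": {"type": "string"},
        "inputs": {"type": "object"},
        "outputs": {"type": "object"},
    },
    "additionalProperties": True,  # free-form, application-specific extras allowed
}

def verify_metadata(path: str) -> None:
    with open(path) as f:
        metadata = json.load(f)  # also catches JSON syntax errors
    try:
        validate(instance=metadata, schema=schema)
    except ValidationError as e:
        raise SystemExit(f"metadata verification failed: {e.message}")
```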
@ericspod sounds good to me! We can do that when we finalize the first MVP MMAR example.
Thanks.
As for the machine consumer of the model package, specifically TorchScript, we do have a specific one: Triton Inference Server. Triton can automatically generate a runtime config for all the backends it supports, except PyTorch. The Triton team has logged an issue for PyTorch, though based on the discussions we've already had here, the requested metadata are covered by the MONAI model package. Here is the PyTorch issue 38273.
For deployment of AI apps/models in a clinical environment, there are a number of personas and user stories:
- For integration with the DICOM network and selection of the relevant series to feed the model, an application needs some metadata describing the DICOM study/series.
- For multi-channel models, e.g. a brain tumor segmentation model taking in images with T1, T2, FLAIR, etc.: when deployed, the application needs to request/receive these DICOM series, convert them into volumetric images, register them, and then generate a single multi-channel image to feed the model.
- For flexibility in orchestrating the inference pipeline/Compose, there should be no implicit dependencies in the inference config, e.g. MONAI handlers depending on Ignite.
Hi @MMelQin ,
Thanks very much for your detailed feedback!
This is important information for the metadata; it will be a long road to finalize the metadata content and format.
Let's try to provide minimal metadata for the first version, then add or adjust according to users' feedback.
We also definitely should define the metadata format, standard, verification schemas, etc.
Thanks.
Thanks for the great online discussion of the model package format last Friday; I updated the task list according to the feedback and new ideas.
Thanks.
Current related PR list:
Related Pytorch issue: https://github.com/pytorch/pytorch/issues/38273
Hi @MMelQin , thanks for all of your feedback; I updated the `metadata.json` in the MMAR example.
Thanks.
Another related PR for the configuration code: https://github.com/Project-MONAI/MONAI/pull/3720
Step 2 in the task list has been merged.
Thanks.
Another related PR for the configuration parsing: https://github.com/Project-MONAI/MONAI/pull/3818.
Thanks.
Another related PR for the configuration parsing: https://github.com/Project-MONAI/MONAI/pull/3822.
Thanks.
Related PR for common training, evaluation and inference: https://github.com/Project-MONAI/MONAI/pull/3832 .
Thanks.
As we are still developing the `ConfigParser` in PR https://github.com/Project-MONAI/MONAI/pull/3822, I compared it with Hydra's OmegaConf: https://omegaconf.readthedocs.io/en/latest/index.html
There are several interesting features of it that we don't support yet:
- dot list instead of dict keys: https://omegaconf.readthedocs.io/en/latest/usage.html#from-a-dot-list
- default values when the target node does not exist: https://omegaconf.readthedocs.io/en/latest/usage.html#default-values

I think we can try to support 3 with an enhanced idea: use the "@" mark to define a placeholder, then the parser supplies the value. And we can try to support 4 to make the base config similar to a "base class with abstractmethod" that must be overridden.
Other features we may consider later.
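For reference, a quick illustration of the two OmegaConf features linked above; the config keys are made up:

```python
# Illustration of the two OmegaConf features under discussion.
from omegaconf import OmegaConf

# "dot list" instead of nested dict keys:
cfg = OmegaConf.from_dotlist(["trainer.max_epochs=10", "trainer.lr=1e-4"])
print(cfg.trainer.max_epochs)  # 10

# default value when the target node does not exist:
workers = OmegaConf.select(cfg, "dataloader.num_workers", default=4)
print(workers)  # 4
```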
Thanks.
I have a PR now on the specification document, Project-MONAI/MONAI#3834, which should have been a draft PR but oh well. It's very minimal compared to what @Nic-Ma prototyped here and lacks details about intent, use cases, design expectations, etc.
I think https://github.com/Project-MONAI/MONAI/pull/3822 is now intuitive and flexible after a few rounds of refactoring. We may want to provide a system-wide flag to optionally disable `eval()`, because it is too powerful and unsafe.
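A minimal sketch of what such a flag could look like; the environment variable name and parser structure here are hypothetical, not the actual MONAI API:

```python
# Hypothetical system-wide switch for "$..." expression evaluation in configs.
import os

# Hypothetical flag name; enabled by default in this sketch.
ALLOW_EVAL = os.environ.get("MONAI_EVAL_EXPR", "1") == "1"

def resolve_expression(expr: str):
    """Evaluate a "$..." config expression only when eval is enabled."""
    if not isinstance(expr, str) or not expr.startswith("$"):
        return expr
    if not ALLOW_EVAL:
        raise RuntimeError(
            "expression evaluation is disabled; set MONAI_EVAL_EXPR=1 to enable"
        )
    return eval(expr[1:])  # intentionally explicit: this is the unsafe part

print(resolve_expression("$1 + 2"))  # 3
```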
I put all the links to the related PRs into the task steps in the ticket description above.
Thanks.
Related PR for inference example bundle: https://github.com/Project-MONAI/tutorials/pull/604.
Thanks.
Follow-up from the dev meeting:
Revising this tutorial to use our bundle would be a good exemplar: https://github.com/Project-MONAI/tutorials/blob/e3eea8704f5d002002f79cffec112c5c280476b4/modules/transfer_mmar.ipynb
After sharing with the internal NVIDIA Clara team, I added 2 missing features to the task list (both sketched below):
1. Configure Python logging properties from a file.
2. Specify a rank ID so that a component runs only on certain ranks, for example, saving the checkpoint on rank 0.
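A minimal sketch of both features, assuming a `logging.conf` file exists and using a hypothetical `run_component` helper; this is not the actual MONAI design:

```python
# Sketch of the two requested features: file-based logging config and
# rank-gated component execution for distributed runs.
import logging
import logging.config

import torch.distributed as dist

# 1. configure Python logging properties from a file (standard library API);
#    assumes a "logging.conf" file is present.
logging.config.fileConfig("logging.conf")

# 2. run a component only on selected ranks, e.g. checkpoint saving on rank 0.
def run_component(component, rank_ids=(0,)):
    rank = dist.get_rank() if dist.is_initialized() else 0
    if rank in rank_ids:
        component()

run_component(lambda: logging.getLogger().info("saving checkpoint"), rank_ids=(0,))
```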
Thanks.
Feedback from the MONAI Deploy team:
More testing and feedback are welcome later.
Thanks.
Most of the tasks are completed; I created several new tickets to track the remaining items, so let's close this big first ticket now.
Thanks.
Is your feature request related to a problem? Please describe. Thanks for the interesting technical discussion with @ericspod @wyli @atbenmurray @rijobro. As we still have many unclear requirements and unknown use cases, we plan to develop the model package feature step by step and may adjust the design based on feedback during development.
For the initial step, the core team aligned on developing a small but typical example for inference first. It will use JSON config files to define environments, components and workflow, and will save the config and model into a TorchScript model, so that other projects can easily reconstruct the exact same Python program and parameters to reproduce the inference. When the small MVP is ready, we will share and discuss it within the team for the next steps. I will try to implement the MVP referring to some existing solutions, like the NVIDIA Clara MMAR, the ignite online package, etc. Basic task steps:
- Define components in the config by `name`/`path` & `args`. PR: https://github.com/Project-MONAI/MONAI/pull/3720
- Config references, e.g. `{"dataset": {"<name>": "Dataset", "<args>": {"data": "$load_datalist()"}}, "dataloader": {"<name>": "DataLoader", "<args>": {"data": "@dataset"}}}`. PRs: https://github.com/Project-MONAI/MONAI/pull/3818, https://github.com/Project-MONAI/MONAI/pull/3822 (`metadata`).
- `huggingface` (https://github.com/Project-MONAI/MONAI/discussions/3451).
- Relative reference IDs, e.g. `"test": "@###data#1"`, where 1 `#` means the current level, 2 `##` means the upper level, etc. PR: https://github.com/Project-MONAI/MONAI/pull/3974
- `ConfigItem` and `ReferenceResolver` in the `ConfigParser`. PR: https://github.com/Project-MONAI/MONAI/pull/3980
- `_requires_` keyword for config components (https://github.com/Project-MONAI/MONAI/issues/3942).
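To make the `@` reference mechanism in the example config above concrete, here is a toy resolver; it is an illustration only, not MONAI's actual `ConfigItem`/`ReferenceResolver` implementation, and it omits `$` expression evaluation:

```python
# Toy resolution of "@" references in a config dict: "@dataset" in the
# dataloader's args is replaced by the instantiated "dataset" item.
config = {
    "dataset": {"<name>": "Dataset", "<args>": {"data": [1, 2, 3]}},
    "dataloader": {"<name>": "DataLoader", "<args>": {"data": "@dataset"}},
}

def resolve(value, cfg, cache):
    """Recursively replace "@<id>" strings with the referenced item."""
    if isinstance(value, str) and value.startswith("@"):
        return instantiate(value[1:], cfg, cache)
    if isinstance(value, dict):
        return {k: resolve(v, cfg, cache) for k, v in value.items()}
    return value

def instantiate(key, cfg, cache):
    """Build an item once and reuse it for later references."""
    if key not in cache:
        item = cfg[key]
        args = resolve(item.get("<args>", {}), cfg, cache)
        # a real parser would locate the class named by "<name>"; we just record it
        cache[key] = {"class": item["<name>"], "args": args}
    return cache[key]

print(instantiate("dataloader", config, {}))
# {'class': 'DataLoader', 'args': {'data': {'class': 'Dataset', 'args': {'data': [1, 2, 3]}}}}
```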