borisfom opened this issue 3 months ago
Hi @borisfom,
Thanks for this wonderful proposal! It's very impressive.
I have three questions about the proposal; let's discuss them here.
One of them concerns adding a trt parameter to every network_def, as shown in your netshell example. The network_def is used for instantiating a network (CNN, RNN, Transformer...). Because a network_def is usually an implementation of a typical algorithm, its contributor may not have any knowledge of TensorRT, so requiring every contributor to provide the trt parameter would be unreasonable. I think a better way is to add a new list-like class that patches submodules into the bundle as a fixed parameter, e.g. trt_submodule. Then, at the initialization of the bundle, we can replace the submodules if that parameter is provided.
Thanks,
Bin
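For illustration, the idea might look roughly like this (all names here are hypothetical; TRTWrapper is the class from the PR discussed below):

```python
# Hypothetical sketch: a list-like bundle parameter naming the submodules
# to wrap, applied at bundle initialization, so that network_def authors
# never need to know anything about TensorRT.
trt_submodule = ["image_encoder", "decoder"]  # illustrative names

def patch_trt_submodules(network, submodule_names):
    # Replace each named submodule with a TRT-wrapped version.
    for name in submodule_names:
        sub = getattr(network, name)
        setattr(network, name, TRTWrapper(sub))  # constructor args assumed
    return network
```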
Hi @binliunls! Thanks for the input!
1) The way the wrapper currently works, it will reuse an existing engine the first time we run inference if it is already there (and has a timestamp later than the config file's).
2) Yes, absolutely, we should have a Torch-TRT option (and later, Dynamo). I will add it to the existing TRTWrapper PR.
3) Agreed: having the trt section as an optional patch makes better sense.
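The caching rule in 1) amounts to roughly this check (a sketch of the behavior described, not the actual TRTWrapper code):

```python
import os

def engine_is_reusable(engine_path: str, config_path: str) -> bool:
    # Reuse the cached engine only if it exists and is newer than the
    # config file; otherwise it gets rebuilt on the next inference run.
    return (
        os.path.exists(engine_path)
        and os.path.getmtime(engine_path) > os.path.getmtime(config_path)
    )
```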
@binliunls: updated the TRTWrapper PR with a torch-tensorrt option (not tested yet, may need refinements).
Here I implemented a first POC for brats_mri_generative_diffusion from model-zoo: https://github.com/Project-MONAI/model-zoo/pull/620
@binliunls: I did some refactoring of the TRTWrapper/trt_wrap API after working on a few use cases. Now TRTWrapper is not supposed to replace the original module; instead, it is saved in the original module, and the module's forward() is monkey-patched with a method that calls TRTWrapper. This way, wrapping can happen before checkpoint load (as the model structure, and therefore the state dict, is unchanged). trt_wrap still returns the original model to facilitate use in configs. I have also added a trt_handler class for the alternative use of trt_wrap as a handler; check the pathology_nuclei_classification use case (I did implement the handler there, but the default run seems to exit prematurely before it gets to the execution; am I missing some step in the initialization?)
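In rough pseudocode, the refactored behavior described above might look like this (all names and signatures are assumed, not taken from the PR):

```python
def trt_wrap(model, submodule_name, **trt_options):
    # TRTWrapper is the class from PR #7990 (import path assumed).
    sub = getattr(model, submodule_name)
    wrapper = TRTWrapper(sub, **trt_options)
    sub._trt_wrapper = wrapper     # wrapper saved in the original module
    sub.forward = wrapper.forward  # forward() monkey-patched to run via TRT
    # Model structure and state dict are unchanged, so a checkpoint can
    # still be loaded after wrapping.
    return model  # original model is returned, to facilitate use in configs
```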
Hi @borisfom, I have a suggestion for this new implementation: add an inference_trt.json file to the bundle. I saw that you modified a few parameters in the inference.json file here; those modifications could instead go into inference_trt.json. Then, when you run the inference, you can choose to run the command python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']". It will merge the two configurations into a new one and run the inference. In this way, we can keep the original inference configuration unchanged.
Thanks,
Bin
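For example, a minimal inference_trt.json could contain only the TRT-specific overrides; any key in the second file replaces the matching key from inference.json when the two are merged. The keys below, including the trt_wrap expression, are purely illustrative:

```json
{
    "trt": true,
    "network": "$trt_wrap(@network_def)"
}
```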
@binliunls:
There are a number of issues with the current TRT acceleration path in MONAI:
Describe the solution you'd like
I would like to have a self-contained recipe in inference.json for TRT conversion that works for multiple inputs with different data types, and also for cases where only some submodules can be converted to TRT. Ideally, there would be no explicit TRT export pass; conversion would happen on demand during inference and be cached. Persistent TRT engines would be rebuilt if I change the config or update the TRT version. I would like a single inference.json for both TRT and non-TRT inference, with a TRT on/off switch on the command line.
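For instance, assuming a top-level trt flag in the config (the flag name here is hypothetical), monai.bundle run already allows overriding config entries from the command line:

```bash
# default (non-TRT) inference
python -m monai.bundle run --config_file configs/inference.json
# same config, with TRT switched on from the command line
python -m monai.bundle run --config_file configs/inference.json --trt true
```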
Describe alternatives you've considered
I explored a few of those zoo models that still need TRT, and I think we can come up with an elegant TRT acceleration solution using extended network definitions in typical inference.json configs, utilizing TRTWrapper (from https://github.com/Project-MONAI/MONAI/pull/7990). network_def would have an optional 'trt' section (or sections) describing which submodules (or the whole net) should be wrapped in TRTWrapper. convert_to_trt() can probably be fixed for multi-input/multi-datatype nets, especially if we add network_data_format to metadata.json, but that would still require an additional conversion pass (which has to be explicitly rerun each time TRT is upgraded or the config is changed), etc. This should be way more user-friendly with TRTWrapper.
In a nutshell, a sample network_def would look like this:
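Something along these lines (the _target_ value and the keys inside the trt section are illustrative placeholders):

```json
"network_def": {
    "_target_": "MyNetwork",
    "trt": {
        "submodules": ["image_encoder"],
        "path": "$@bundle_root + '/models/image_encoder.plan'",
        "precision": "fp16"
    }
}
```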
So for each net, one can specify the set of submodules that should be wrapped in TRTWrapper (or the whole net, if no submodule is given). The "path" basename could probably also be generated automatically from the checkpoint name, but an explicit one would do as well. No other configuration or explicit TRT conversion pass would be needed for export, and the generated .plan files would be rebuilt automatically on config file change. We would not need two separate inference.json and inference_trt.json configs, just a top-level 'trt' config flag, overridable from the command line.
At workflow initialization time, the above config would result in the replacement of the network's "image_encoder" attribute with:
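Something like the following (a sketch only; the TRTWrapper constructor arguments are assumed, not taken from the PR):

```python
# Sketch of the intended effect (TRTWrapper signature assumed):
network.image_encoder = TRTWrapper(
    network.image_encoder,
    path=bundle_root + "/models/image_encoder.plan",  # cached engine file
    precision="fp16",                                 # illustrative option
)
```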