Our goal here is to make sure we can depend on ONNX as a standard model format for evalem and evalem-ft, i.e. one we can load from for both inference and fine-tuning.
Currently, within HF's transformers, we have a set of ONNX-compatible models: https://huggingface.co/docs/transformers/serialization
The concern is: can we do this for any upstream model down the line? (just reiterating the same question)
Re:
Additionally, the existing evalem.models.HFPipelineWrapper can still be used with ONNX Runtime (provided we can load the model).
For example, here we're loading a RoBERTa-based LM for QA:
from optimum.onnxruntime import ORTModelForQuestionAnswering
from transformers import AutoTokenizer
from evalem.models import QuestionAnsweringHFPipelineWrapper
from evalem.evaluators import QAEvaluator
from evalem.pipelines import SimpleEvaluationPipeline
wrapped_model = QuestionAnsweringHFPipelineWrapper(
model=ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2"),
    tokenizer=AutoTokenizer.from_pretrained("deepset/roberta-base-squad2"),
)
evalem_pipe = SimpleEvaluationPipeline(model=wrapped_model, evaluators=QAEvaluator())
qa_data = dict(inputs=[dict(context="I am paradox.", question="who are you?")], references=["paradox"])
result = evalem_pipe(qa_data)
Reference: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models
However, I feel we do need an OnnxModelWrapper for a more controlled inference/forward pass during evaluation. It would help in decoupling from the framework nevertheless...
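For illustration, a rough sketch of what such a wrapper could look like (the OnnxModelWrapper name and interface here are hypothetical, not existing evalem code), built directly on onnxruntime:

import onnxruntime as ort


class OnnxModelWrapper:
    """Hypothetical wrapper giving a controlled forward pass over an ONNX model."""

    def __init__(self, onnx_path: str, tokenizer):
        self.session = ort.InferenceSession(onnx_path)
        self.tokenizer = tokenizer

    def predict(self, texts):
        # Tokenize to numpy arrays; assumes the ONNX graph's input names
        # match the tokenizer's output keys (input_ids, attention_mask, ...).
        encoded = self.tokenizer(texts, return_tensors="np", padding=True)
        inputs = {inp.name: encoded[inp.name] for inp in self.session.get_inputs()}
        # None -> return every graph output (e.g. logits) as numpy arrays.
        return self.session.run(None, inputs)

The evaluation code would then only ever see raw arrays/logits, regardless of which framework produced the ONNX file.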
Followup:
Apparently, there's no straightforward way to convert an arbitrary ONNX model to PyTorch. Outside of HF's transformers contributions, ONNX-to-PyTorch conversion seems to be an active field.
https://github.com/ENOT-AutoDL/onnx2torch
We could use ONNX Runtime directly. However, that would mean building the fine-tuning process from scratch on top of ONNX Runtime, and I'm not sure that's a good way to scale.
@NISH1001 I did a quick look and I think you are right. Judging by the open issues in that repo and this issue: https://github.com/pytorch/pytorch/issues/21683, it seems like onnx2torch might work for some models but not all, if we wanted to use it as an interchange format for fine-tuning.
I think this might be a reason to instead standardize on torchscript and expect that we will only fine-tune Torch-based models, given that Torch is what almost everyone uses for research nowadays. Torchscript models can be loaded as nn.Modules no problem for fine-tuning or inference.
They won't necessarily be as fast as ONNX for inference, but maybe torch.compile addresses that side issue.
Thoughts on standardizing on torchscript models? @weiji14 @muthukumaranR @NISH1001
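For reference, the round-trip we'd be relying on looks roughly like this (a minimal sketch with a toy module, not one of our actual models):

import torch
import torch.nn as nn


class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 2)

    def forward(self, x):
        return self.linear(x)


# Export: script (or trace) the module and save a self-contained archive.
scripted = torch.jit.script(TinyModel())
scripted.save("tiny_model.pt")

# Load elsewhere: torch.jit.load returns a real nn.Module subclass, so it can be
# dropped into a training loop for fine-tuning or used directly for inference.
restored = torch.jit.load("tiny_model.pt")
restored.train()
out = restored(torch.randn(4, 16))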
Considering that ONNX is more of a 'standard', I'd still prefer to have a layer of compatibility with ONNX. Also, I don't think it's an either/or decision between Torchscript and ONNX; we could always have both as intermediary formats and have ONNX->PyTorch and Torchscript->PyTorch converters in place.
If going for pure speed in an inference setting, TensorRT would be best, followed by ONNX Runtime, and then Pytorch, according to https://els-rd.github.io/transformer-deploy/compare/#inference-engine. Of course, Pytorch has a stronger community than ONNX Runtime, and arguably more metrics libraries are in Pytorch, so the evaluation metrics would almost always need to be computed on Pytorch tensors.
@NISH1001 @rbavery @weiji14 Thoughts on Ivy?
This seems to directly address the compatibility and interoperability use cases we were discussing, although the compiler/cross-compiler is not publicly available yet.
@weiji14 and I chatted and agree Torchscript is the way to go. Until something like Ivy is battle-tested and guarantees new models can be transpiled, or Pytorch officially supports ONNX loading and not just export, Torchscript is the surest way to guarantee a model is both portable and loadable for fine-tuning or inference.
I definitely agree that the other runtimes are faster. But that's the compromise we'll have to make for future-proofing, since it's unclear whether new models can always be completely serialized to an intermediate format like ONNX. For inference, having these runtimes would still be nice. But the question of "can all models be serialized?" keeps coming up.
@muthukumaranR Yup. Ivy is neat. The last time I checked it (a year back), it was limited. I am sure it has got a lot of improvements now.
On a similar note:
I experimented today with converting the NASA v6 (QA) model to both ONNX and torchscript. I wasn't able to convert the ONNX model back to pure PyTorch. (We could still just do inference with ONNX Runtime, but it has its limitations on what types of models are serializable.) Torchscript seems better in that respect (although it also has some limitations, mainly around how to serialize branching nodes in the network, etc.). Nevertheless, I was able to load the torchscript model back into PyTorch and run inference through the standard HF pipeline. One caveat: if we also don't want to depend on HF transformers, inference could need more boilerplate to make sense of the outputs (for instance, how do we interpret the final logits from the model? transformers abstracts that away nicely). But if we go forward with that, then IMHO having transformers as a dependency won't be a hassle either.
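For reference, the torchscript side of that experiment looks roughly like this (sketched here with the public deepset/roberta-base-squad2 checkpoint rather than the NASA v6 weights, and with raw inference instead of the HF pipeline re-wrapping):

import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
# torchscript=True makes the model return plain tuples, which tracing needs.
model = AutoModelForQuestionAnswering.from_pretrained(
    "deepset/roberta-base-squad2", torchscript=True
)
model.eval()

# Trace with example inputs; the traced graph is specialized to these shapes.
encoded = tokenizer("who are you?", "I am paradox.", return_tensors="pt")
traced = torch.jit.trace(model, (encoded["input_ids"], encoded["attention_mask"]))
traced.save("roberta_qa_traced.pt")

# Load back into PyTorch and run a forward pass. The outputs are raw start/end
# logits that we have to post-process ourselves (the transformers QA pipeline
# normally handles span extraction for us).
restored = torch.jit.load("roberta_qa_traced.pt")
start_logits, end_logits = restored(encoded["input_ids"], encoded["attention_mask"])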
Found this page on Torchscript operators that are supported/unsupported by ONNX export: https://pytorch.org/docs/2.0/onnx_supported_aten_ops.html :slightly_smiling_face:
On exporting the burn scar geospatial Foundation Model, I'm running into issues with converting the pretrained epoch-832-loss-0.0473.pt file into either torchscript or ONNX format. This piece of code works fine:
import torch
model = torch.load("epoch-832-loss-0.0473.pt")
But the 'model' is an OrderedDict, and when I try to export it to ONNX:
torch.onnx.export(
model=model,
args=(4, 224, 224),
f="epoch-832-loss-0.0473.onnx",
export_params=True,
opset_version=12
)
It gives an error like AttributeError: 'collections.OrderedDict' object has no attribute 'training'.
I've tried a few different ways to do the conversion, including:
- Using torch.jit.load instead, but got RuntimeError: PytorchStreamReader failed locating file constants.pkl: file not found.
- Using the tools/pytorch2onnx.py script in mmseg at https://mmsegmentation.readthedocs.io/en/latest/user_guides/deployment.html#convert-to-onnx-experimental, but it's giving me ValueError: not enough values to unpack (expected 5, got 4) due to some hardcoding in the mmseg script expecting Conv2D layers, whereas the Foundation Model uses Conv3D.
- Finetuning from the epoch-832-loss-0.0473.pt checkpoint for one epoch, and loading from the best_mIoU_iter_10.pth file that contains the state_dict (?). However, this still gives an error like AttributeError: 'dict' object has no attribute 'training'.
- Using torch.jit.script(obj=model), which errors with RuntimeError: Could not get name of python class object.

The impression I'm getting after examining the mmseg config files and model classes is that the geospatial FM Computer Vision models are architected with several input blocks and output heads, rather than as a simple linear model architecture. We might need to take a step back first, understand how the Pytorch checkpoints (non-torchscript) were created in the first place, and evaluate whether torchscript and/or ONNX support serializing/deserializing such complex multi-input/multi-head architectures.
@weiji14 I agree. The torchscript module seems to get into trouble with branching in the networks. It definitely needs more architecture-level understanding of any upstream models. I think ONNX can handle branching, but I'm not sure about torchscript.
> Used the tools/pytorch2onnx.py script in mmseg at https://mmsegmentation.readthedocs.io/en/latest/user_guides/deployment.html#convert-to-onnx-experimental, but it's giving me ValueError: not enough values to unpack (expected 5, got 4) due to some hardcoding in the mmseg script expecting Conv2D layers, but the Foundation Model uses Conv3D.
> Finetuning from the epoch-832-loss-0.0473.pt checkpoint for one epoch, and loading from the best_mIoU_iter_10.pth file that contains the state_dict (?). However, this still gives an error like AttributeError: 'dict' object has no attribute 'training'.
These were the two approaches I was thinking of pursuing; each framework typically has a script to export to torchscript or onnx. If the canned script doesn't work, then I think we can try doing the export ourselves by handling the checkpoint file and model definition. Currently I'm working on this here: https://github.com/NASA-IMPACT/hls-foundation/pull/12
Got a little further today! Figured out how to load the model correctly from the mmseg config file. Apparently the pretrained checkpoint file we were given was just the state_dict and not the actual model architecture+state_dict. See also https://stackoverflow.com/questions/48419626/pytorch-cant-load-cnn-model-and-do-prediction-typeerror-collections-orderedd.
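For reference, the pattern the StackOverflow answer describes is to rebuild the architecture first and then load the state_dict into it; a rough sketch below, where the config filename is a placeholder for the actual mmseg setup in the linked PR:

import torch
from mmseg.apis import init_segmentor  # mmseg <1.0 API; newer releases renamed this to init_model

# The .pt file only holds weights, so torch.load() alone gives an OrderedDict.
# Rebuild the nn.Module from the mmseg config, then load the weights into it.
model = init_segmentor("geospatial_fm_config.py", checkpoint=None, device="cpu")
state_dict = torch.load("epoch-832-loss-0.0473.pt", map_location="cpu")
model.load_state_dict(state_dict, strict=False)
model.eval()  # now an actual nn.Module that the exporters can at least be pointed at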
Still, the export using standard Pytorch functions isn't working. Have got all the code and error messages documented at https://github.com/NASA-IMPACT/hls-foundation/pull/20#pullrequestreview-1468983236 if you want to check it out @rbavery. Thanks also for getting the docker image and tornado config working, it was super helpful to have a standardized working environment!
When running inference with torchscript traced models, they need a fixed input shape. We could work around this by chipping up any arbitrarily large image for evaluation, but that would add a lot of overhead. So ideally we find a way to export and load the model that supports dynamic shapes.
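On the ONNX side at least, dynamic shapes can be declared at export time via dynamic_axes; a minimal sketch with a toy convolutional model (the shapes and tensor names are illustrative, not the Foundation Model's):

import torch
import torch.nn as nn

model = nn.Conv2d(in_channels=6, out_channels=2, kernel_size=3, padding=1).eval()
dummy = torch.randn(1, 6, 224, 224)

torch.onnx.export(
    model,
    dummy,
    "segmenter.onnx",
    input_names=["image"],
    output_names=["logits"],
    # Mark batch, height and width as dynamic so the exported graph accepts
    # arbitrarily sized chips instead of only the 224x224 tracing input.
    dynamic_axes={
        "image": {0: "batch", 2: "height", 3: "width"},
        "logits": {0: "batch", 2: "height", 3: "width"},
    },
    opset_version=12,
)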
Exporting to ONNX works, but loading the model back into torch seems unsupported without additional effort on our side. I get this error with the most recent and popular ONNX-to-PyTorch lib, and it doesn't look like it will be supported anytime soon: https://github.com/ENOT-AutoDL/onnx2torch/issues/147
So I think our best option is to have evalem and hls evaluation depend on onnxruntime. Instead of loading onnx models to torch model types, we'll use onnxruntime to load and run the model.
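Concretely, the evaluation side would then look something like this (a sketch that assumes an export like the one above; the file name and tensor names are illustrative):

import numpy as np
import onnxruntime as ort
import torch

session = ort.InferenceSession("segmenter.onnx")

# onnxruntime consumes and returns plain numpy arrays...
chip = np.random.rand(1, 6, 256, 256).astype(np.float32)
(logits,) = session.run(None, {"image": chip})

# ...so for metric computation we convert back to torch tensors, since most of
# the metrics libraries we use operate on torch.
preds = torch.from_numpy(logits).argmax(dim=1)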
- MAE - used by the HLS model; we can test this by exporting the HLS model
- Swin - Mask2Former
- ViT - we can find an example in huggingface transformers