Open borijang opened 3 years ago
@borijang where is convert_onnx.py? I can't find it!
It looks like this is a known issue: https://github.com/NVIDIA/TensorRT/issues/805
It is unclear whether support will be added at some point.
It looks like you can get around this if you rewrite the part of the model that uses subscript assignment: https://paulbridger.com/posts/tensorrt-object-detection-quantized/
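As a rough illustration of the kind of rewrite being suggested, the sketch below contrasts an in-place slice assignment (a pattern the ONNX exporter can lower to ScatterND at opset 11) with an equivalent that builds the result with torch.cat. The function names, the anchors tensor, and the decode formula are hypothetical and are not taken from the yolor code.

```python
import torch


def decode_inplace(pred, anchors):
    # Writes through ellipsis indexing; this subscript assignment is the
    # pattern that can end up as ScatterND in the exported graph.
    out = pred.clone()
    out[..., 0:2] = out[..., 0:2].sigmoid() * 2.0 - 0.5
    out[..., 2:4] = (out[..., 2:4].sigmoid() * 2.0) ** 2 * anchors
    return out


def decode_concat(pred, anchors):
    # Same arithmetic, but each piece is computed separately and the
    # result is assembled with a concatenation instead of an assignment.
    xy = pred[..., 0:2].sigmoid() * 2.0 - 0.5
    wh = (pred[..., 2:4].sigmoid() * 2.0) ** 2 * anchors
    rest = pred[..., 4:]
    return torch.cat((xy, wh, rest), dim=-1)
```

Both functions should return the same tensor, so the second form can be swapped in and checked numerically before re-exporting.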
Sorry, thought that script was from this repo, but I must have reused it from somewhere else.
I am aware of the TRT issue, but I am not sure where the problem arises in the yolor code.
would you share the script here?
Sure, here you go:
import argparse

import torch
import yaml

# Assumed import paths within the yolor repo; adjust if your checkout differs.
from models.models import Darknet, load_darknet_weights
from utils.torch_utils import select_device


class Params:
    """Exposes the keys of a YAML config file as attributes."""

    def __init__(self, project_file):
        with open(project_file) as f:
            self.params = yaml.safe_load(f)

    def __getattr__(self, item):
        return self.params.get(item, None)


def parse_arguments():
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', type=str, default='models/yolor_p6.pt',
                        help="Path to input PyTorch model (.pth checkpoint)")
    parser.add_argument('--output', type=str, default='models/yolor_p6.onnx',
                        help="Desired path of converted ONNX model.")
    parser.add_argument('--config', type=str, default='config/coco.yaml', help="Path of the config file")
    parser.add_argument('--model', type=str, default='config/yolor_p6.cfg', help="Path of the model configuration")
    parser.add_argument('--width', type=int, default=1280, help="input width of the model to be exported (in pixels)")
    parser.add_argument('--height', type=int, default=1280, help="input height of the model to be exported (in pixels)")
    parser.add_argument('--batch-size', type=int, default=1, help="Batch size of the model to be exported (default=1)")
    return parser.parse_args()


if __name__ == '__main__':
    args = parse_arguments()
    params = Params(args.config)
    print(params.params)
    print(len(params.params))

    device = select_device("cpu", batch_size=args.batch_size)

    # Load model
    model = Darknet(args.model).to(device)
    try:
        ckpt = torch.load(args.input, map_location=device)  # load checkpoint
        model_state = model.state_dict()
        # Keep only the tensors whose element count matches the model.
        ckpt['model'] = {k: v for k, v in ckpt['model'].items() if model_state[k].numel() == v.numel()}
        model.load_state_dict(ckpt['model'], strict=False)
    except Exception:
        # Not a PyTorch checkpoint (or incompatible): fall back to Darknet weights.
        load_darknet_weights(model, args.input)

    # Dummy input in NCHW order: (batch, channels, height, width)
    dummy_input = torch.randn((args.batch_size, 3, args.height, args.width), dtype=torch.float32).to(device)

    print("Exporting the model using onnx:")
    torch.onnx.export(model, dummy_input,
                      args.output,
                      verbose=False,
                      input_names=['data'],
                      opset_version=11)
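The checkpoint filter in the script looks up every checkpoint key in the model's state dict, so a key that exists only in the checkpoint raises a KeyError and sends the script down the Darknet-weights fallback. A minimal sketch of a more defensive filter follows; the helper name filter_compatible is hypothetical, and the only assumption about the values is that they expose numel(), as torch tensors do.

```python
def filter_compatible(ckpt_state, model_state):
    """Keep checkpoint entries that exist in the model with the same element count."""
    return {
        key: value
        for key, value in ckpt_state.items()
        # Skip keys the model does not have instead of raising KeyError.
        if key in model_state and model_state[key].numel() == value.numel()
    }
```

In the script this would be called as filter_compatible(ckpt['model'], model.state_dict()) before load_state_dict(..., strict=False).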
@WongKinYiu do you know if there are other obstacles to exporting to TensorRT? Say we managed to deal with the ScatterND issue by rewriting the lines with subscript assignment - are there other unsupported TensorRT ops?
I do not know how to use TensorRT, but one of our team members helped to convert and deploy the model for our system.
hmm ok - would it be possible for us to get their code? Ability to export to TensorRT would be my #1 feature request!
@WongKinYiu, I am not sure whether it is only me, but converting to ONNX using models/export.py is not working. First I get an import error, and I also think the code for loading the checkpoint needs to be fixed.
@borijang, the script you posted above is not working for me when converting the checkpoint to ONNX format. Have you made any changes to the Darknet class? I am getting this error. Can you please help?
RuntimeError: Exporting the operator silu to ONNX opset version 11 is not supported. Please open a bug to request ONNX export support for the missing operator.
I'm getting the same error using the script.
Using models/export.py (after commenting out attempt_download), I get:
Traceback (most recent call last):
File "models/export.py", line 21, in <module>
model = torch.load(opt.weights, map_location=torch.device('cpu'))['model'].float()
AttributeError: 'collections.OrderedDict' object has no attribute 'float'
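The traceback suggests that the 'model' entry of this checkpoint is a state dict (an OrderedDict of tensors) rather than a pickled module, so there is no .float() to call on it. A minimal sketch that handles both cases is below; model_from_checkpoint and build_model are hypothetical names, with build_model standing in for whatever constructs the network (for example from the .cfg file).

```python
def model_from_checkpoint(ckpt, build_model):
    """Return a float model from a checkpoint holding a module or a state dict."""
    entry = ckpt['model']
    if isinstance(entry, dict):
        # State dict (OrderedDict is a dict subclass): build the network
        # first, then load the weights into it.
        model = build_model()
        model.load_state_dict(entry, strict=False)
        return model.float()
    # Otherwise assume the checkpoint stored the module object itself.
    return entry.float()
```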
@satheeshkatipomu @JonathanSamelson
I haven't modified Darknet. It works for me using the docker image nvcr.io/nvidia/pytorch:21.03-py3. It may be a PyTorch issue; try upgrading it to 1.9.0.
@borijang Perfect, it's working now using this docker image. Thanks a lot!
@JonathanSamelson what are you doing with your ONNX? have you managed to get tensorrt inference working?
@LukeAI Sorry, I'm using ONNX for Python inference, so I don't know about TensorRT 😕
@borijang thank you! yolor_p6 exported to ONNX!
But now I cannot convert from ONNX to TensorRT:
Traceback (most recent call last):
File "onnx_to_trt.py", line 10, in <module>
engine = backend.prepare(model, device='CUDA:0')
File "/opt/conda/lib/python3.8/site-packages/onnx_tensorrt-7.2.2.3.0-py3.8.egg/onnx_tensorrt/backend.py", line 236, in prepare
File "/opt/conda/lib/python3.8/site-packages/onnx_tensorrt-7.2.2.3.0-py3.8.egg/onnx_tensorrt/backend.py", line 68, in __init__
RuntimeError: While parsing node number 642:
/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/builtin_op_importers.cpp:4135 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
I also tried converting with torch2trt and got a warning: Warning: Encountered known unsupported method torch.Tensor.expand_as
The result of
x = torch.ones((1, 3, 640, 640)).cuda()
y = model(x)
y_trt = model_trt(x)
torch.max(torch.abs(y - y_trt))
was very large: tensor(784.60699, device='cuda:0', grad_fn=<MaxBackward1>)
As in this issue, I replaced expand_as(x) with expand(x.size()) and got the error AttributeError: 'Parameter' object has no attribute '_trt'
@borijang @satheeshkatipomu @JonathanSamelson have you faced the same problems? If so, how did you solve them?
Thanks in advance!
I've successfully converted Yolor_x (yolor_csp_x_star.pt) from torch to ONNX, and also to TensorRT, with some modifications in models/models.py, the main one being to avoid broadcasting (which avoids the ScatterND plugin). You can see my modifications here (sorry for my uncleaned code)
@NNDam thanks for sharing, I'll give it a test. Do you think it will likely work for the other models as well?
@NNDam I trained a quick model on a small private dataset and was unable to get it to run with the existing tensorrt c++ that I use with scaled-yolo - would you mind sharing your inference code for reference?
I notice that there are five getNbBindings() - the first is the right size for input - what are the other 4?
@LukeAI there were 3 unused output layers (from the 3 yolo detect layers); remove them with onnx_graphsurgeon, for example:
import onnx
import onnx_graphsurgeon as gs
from onnx import shape_inference

input_model_path = 'yolor_x.onnx'
output_model_path = 'yolor_x_cleaned.onnx'

onnx_module = shape_inference.infer_shapes(onnx.load(input_model_path))

# Drop every graph output except the one named 'output'.
# Iterate over a copy, since entries are removed while looping.
for output in list(onnx_module.graph.output):
    if output.name != 'output':
        print('--> remove', output.name)
        onnx_module.graph.output.remove(output)

# Prune the now-dangling nodes, sort the graph, and fold constants.
graph = gs.import_onnx(onnx_module)
graph.cleanup()
graph.toposort()
graph.fold_constants().cleanup()
onnx.save_model(gs.export_onnx(graph), output_model_path)
ok, thanks! I've done as you advised but am still struggling to get inference working... are you planning to release your inference code?
Hello all, I have managed to export a custom yolor model (9 classes) to ONNX format using @NNDam's code. My issue is that the output dimensions still hold the 85 (80+5) values of the COCO classes. Does anyone know what I should do to get the correct export? Thanks all for the very useful information in this thread!
Hi @dimleve, this issue I opened might be related to your problem, even though I haven't figured it out yet.
@dimleve have you tried setting the number of classes in the yolo layers in the config? (You will also have to set the correct number of filters in the preceding convolutional layer.)
i.e. here and in the other two layers
filters should be (classes + 5) x 3 = 42 for 9 classes
To be honest, I'm not 100% sure that's correct about the filters. That's how it was in older yolos, but I'm not sure whether this [control_channels] thing disrupts that.
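The arithmetic in that rule can be written out as a small helper. This is only a sketch of the convention used by older YOLO configs, as described above; whether yolor's control_channels layers change it is not settled in this thread, and the function name is hypothetical.

```python
def yolo_head_filters(num_classes, anchors_per_layer=3):
    """Filters for the conv layer feeding a yolo layer: (classes + 5) * anchors."""
    # 5 = 4 box coordinates + 1 objectness score per anchor
    return (num_classes + 5) * anchors_per_layer


print(yolo_head_filters(9))   # 42
print(yolo_head_filters(80))  # 255
```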
Thanks both @LukeAI and @JonathanSamelson, I will check and come back with my findings. I get the following error: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([42]). I'm not sure, but it seems that the custom YOLOR model still has the 85-value setting ((80 + 5) x 3 = 255) although I am explicitly setting nc = 9 in the data.yaml configuration. I need to check further.
maybe the filters have to be set to 42 at this point as well? not sure https://github.com/WongKinYiu/yolor/blob/2fa3a318f364a4eb58721c90e5a978a78f0da58a/cfg/yolor_csp_x.cfg#L1433
but maybe if you already trained with the cfg file set with filters=255 etc. then that's what your checkpoint has, so you will just need to run with that many outputs? I guess your 9 classes will be represented in the first 9?
Looking in train.py - the model is created using the cfg file, not the yaml.
@LukeAI It seems that you are right, no need to modify anything and my classes are represented in the first 9, I will check further and verify, thank you!
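If the trained classes do sit in the first 9 class slots, each prediction row can be trimmed after inference. The sketch below assumes a row layout of 4 box values, 1 objectness score, then the class scores; that layout and the helper name are assumptions and have not been checked against the yolor output format.

```python
def trim_to_trained_classes(row, num_classes):
    """Keep the box, objectness, and the first num_classes class scores."""
    # Assumed row layout: [x, y, w, h, objectness, class_0, class_1, ...]
    return row[:5 + num_classes]


row = [0.0] * 85  # one COCO-sized prediction row (80 + 5 values)
print(len(trim_to_trained_classes(row, 9)))  # 14
```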
Has anyone successfully converted and run yolor on a Jetson Xavier? Thanks
Hi all, I've successfully converted my custom yolor model to TensorRT and run it on a Jetson Xavier AGX! To fix the ScatterND issue, just upgrade JetPack to the latest version, 4.6. It contains TensorRT 8.0 (which supports the ScatterND plugin).
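A quick way to check an installation against that requirement is to compare the version string, which would typically come from tensorrt.__version__. The sketch below takes the 8.0 threshold from the comment above as an assumption rather than from NVIDIA's documentation, and the helper names are hypothetical.

```python
def trt_major_minor(version):
    """Parse a version string such as '8.0.1.6' into (major, minor)."""
    parts = version.split('.')
    return int(parts[0]), int(parts[1])


def has_scatter_nd(version, minimum=(8, 0)):
    # Assumption from this thread: ScatterND is available from TensorRT 8.0.
    return trt_major_minor(version) >= minimum
```

For example, has_scatter_nd('7.1.3') is False, which matches the TensorRT version reported in the original post.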
Thanks for this repository! I managed to convert a trained model to ONNX using convert_onnx.py, but I can't manage to convert it to TensorRT for inference on a Jetson Xavier NX. I have included the TensorRT (v7.1.3) build output below:
Any ideas on how to solve the ScatterND issue? It seems like a broadcasting operation unsupported by TRT. Maybe using a different opset version than 11, or rewriting all the lines that use ellipsis indexing?