LeelaChessZero / lc0

The rewritten engine, originally for TensorFlow. Now all other backends have been ported here.
GNU General Public License v3.0

Make leela2onnx Conv-nodes compatible with onnx2pytorch #1924

Closed · patrik-ha closed 10 months ago

patrik-ha commented 11 months ago

Adds kernel_shape as an attribute to all Conv nodes in generated ONNX networks, to partially address #1736. This was specifically done because the onnx2pytorch package requires it. The ONNX standard states that this field is optional and, if not provided, should be inferred from the dimensionality of the weights. These changes make some networks generated by leela2onnx readable by onnx2pytorch.
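For reference, here's a minimal Python sketch of the equivalent post-hoc fix on an already-exported model, which is the same idea the PR applies at export time inside the converter (paths are placeholders; the weight layout [out_channels, in_channels, kH, kW] is the standard ONNX Conv layout):

```python
import onnx
from onnx import helper

# Minimal sketch: add the optional kernel_shape attribute to any Conv node
# that lacks it, inferring it from the trailing dims of the weight initializer
# ([out_ch, in_ch, kH, kW] -> kernel_shape [kH, kW]).
model = onnx.load("lc0.onnx")  # placeholder path
inits = {init.name: init for init in model.graph.initializer}

for node in model.graph.node:
    if node.op_type != "Conv":
        continue
    if any(attr.name == "kernel_shape" for attr in node.attribute):
        continue
    weight = inits[node.input[1]]  # second Conv input is the weight tensor
    node.attribute.append(
        helper.make_attribute("kernel_shape", list(weight.dims[2:]))
    )

onnx.save(model, "lc0_kernel_shape.onnx")
```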

I verified that it works at least with the network that comes bundled with the latest lc0 release (791556). However, many networks still fail, since onnx2pytorch hasn't implemented layer normalisation; that was the case with most of the networks I tried (mostly from https://lczero.org/play/networks/bestnets/). I also considered making a PR over there to implement it, but the project seems pretty dead.
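For anyone following along, the round trip I was testing looks roughly like this (ConvertModel is onnx2pytorch's documented entry point; the path and the single-sample input are placeholders, using lc0's standard 112 input planes of 8x8):

```python
import onnx
import torch
from onnx2pytorch import ConvertModel

# Convert an ONNX net exported by leela2onnx into a PyTorch module.
onnx_model = onnx.load("lc0.onnx")  # placeholder path
pytorch_model = ConvertModel(onnx_model)

# lc0 nets take a stack of 112 8x8 input planes; batch size 1 here.
dummy_input = torch.zeros(1, 112, 8, 8)
with torch.no_grad():
    outputs = pytorch_model(dummy_input)
```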

I'm not sure whether it's right for this project to add things like this to accommodate requirements of other projects. It is practical for the output to work with software people are likely to use, especially in this case since it's quite a small change (and follows an optional part of the standard). I'm a big advocate for making the barrier as low as possible for people to tinker with lc0 networks (and for that it's nice if conversion between PyTorch and ONNX works as well as it can), but probably not at the cost of extra bloat. I haven't made any large contributions, so that judgement is up to someone else, but I wouldn't lose any sleep if this weren't merged.

borg323 commented 11 months ago

You can find an optional LayerNormalization replacement in https://github.com/borg323/lc0/commit/bc18becbc32241502ec62a94b43301c26713f76f
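(For context, a replacement like that spells the fused LayerNormalization op out as primitive ops that older tooling understands. Roughly, in numpy terms; the exact op choices and epsilon in the linked commit may differ:)

```python
import numpy as np

def layernorm_decomposed(x, gamma, beta, axis=-1, eps=1e-6):
    """LayerNormalization written with primitive ops only
    (ReduceMean, Sub, Mul, Add, Sqrt, Div in ONNX terms)."""
    mean = x.mean(axis=axis, keepdims=True)                      # ReduceMean
    centered = x - mean                                          # Sub
    var = (centered * centered).mean(axis=axis, keepdims=True)   # Mul + ReduceMean
    normalized = centered / np.sqrt(var + eps)                   # Add + Sqrt + Div
    return normalized * gamma + beta                             # Mul + Add
```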

patrik-ha commented 11 months ago

Thanks for that. I confirmed that all the nets I tested can now be loaded correctly by onnx2pytorch using your alternative LayerNorm definition.

Is it worth considering making this the default LayerNorm implementation for the ONNX converter? It makes the ONNX graph more complex to look at, I guess, but since it's just the same operation written out more explicitly, it's nice that it just works without any extra hassle. Then again, that comes back to how much one wants to accommodate projects like onnx2pytorch (a somewhat dead project, but still the first thing most people run into when they want to convert leela ONNX nets to PyTorch).

I pushed changes to this branch making that implementation the default for LayerNorm, but they can be ignored if it shouldn't be used.

borg323 commented 11 months ago

The last commit went too far, removing the LayerNormalization() operator unconditionally. I think the best way is to add a --onnx2pytorch switch to leela2onnx to control this. And while you are there, are there any other converter options that could be useful to make available to the user?

patrik-ha commented 11 months ago

Yeah, I understand; I've reverted that and put it behind a switch instead. The user-facing switch is explicitly named for onnx2pytorch, but internally I think it's nicer for that option to toggle whatever needs to differ for the converter to produce compatible networks. (I.e. if the user sets onnx2pytorch, it internally enables alt_ln or whatever other behaviour is needed; the main point is that there shouldn't be an internal option that exists only for onnx2pytorch compatibility.)

Regarding other relevant options, I'm not completely sure, but I don't think so. I wondered whether the behaviour addressing #1735 could go behind a similar switch, but that doesn't look viable anyway. It might be nice to add explicit switches for the alternate implementations of mish and layer norm, since those already have options ready to go internally, but I don't know if that's relevant for most people. They could maybe be added as hidden options to avoid cluttering --help.

borg323 commented 11 months ago

Do you think options to change the ONNX opset, or the ability to generate fp16 models (with either fp16 or fp32 inputs/outputs), would be useful? Regarding the alternative implementations we already have, LayerNormalization is a special case since onnxruntime accepts it for every opset even though it was only added in opset 17, so a separate switch makes sense there. This is all unrelated to this PR; I'm just asking here since you are using the ONNX models generated by lc0.

patrik-ha commented 11 months ago

I think it's already quite easy to convert fp32 leela models to fp16/mixed precision. The few times I've needed to, the converter I used worked for every net I tried (I used https://onnxruntime.ai/docs/performance/model-optimizations/float16.html). It also supposedly handles nodes that aren't supported in fp16 reasonably well, which means the leela ONNX converter doesn't have to maintain workarounds for them in the meantime. And it's something that would need ongoing maintenance if it became an explicit feature; roughly what I do is sketched below.
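For reference, the conversion path on that page is essentially the following (onnxconverter-common's float16 helper; the paths are placeholders, and keep_io_types is what gives the fp32-in/fp32-out variant mentioned above):

```python
import onnx
from onnxconverter_common import float16

# Convert an fp32 lc0 ONNX net to fp16 after export.
model = onnx.load("lc0_fp32.onnx")  # placeholder path

# keep_io_types=True keeps fp32 inputs/outputs and inserts casts internally;
# set it to False for fp16 inputs/outputs as well.
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)

onnx.save(model_fp16, "lc0_fp16.onnx")
```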

I haven't run into any case where I've been explicitly limited by the opset of my model (i.e. "I couldn't run my model because my device/software/whatever doesn't support this opset"), but I've only used them on up-to-date systems where I can install whatever I need, so I'm not a good reference point there. It also seems like a burden to have to support. But there may be use cases I'm not seeing, and if it's something that's already committed to being maintained, I guess it could be useful to expose it as an option.
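(For what it's worth, changing the opset after export can also be attempted with the stock ONNX version converter, though it doesn't handle every op; a minimal sketch with placeholder paths and an arbitrary target opset:)

```python
import onnx
from onnx import version_converter

# Try to convert an exported net to a different opset after the fact.
model = onnx.load("lc0.onnx")  # placeholder path
converted = version_converter.convert_version(model, 16)  # target opset 16
onnx.save(converted, "lc0_opset16.onnx")
```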

So for both of them I would say they would be nice to have, but probably not worth maintaining as explicit features (based solely on my own usage, though).