New opcode: modexp - Githubissues

mangoplane commented 2 months ago

Problem

I believe modexp (modular exponentiation) should be available as an opcode, as it seems like an important crypto primitive. It is also present in many other chains like EVM. While I have implemented its functionality in Puya BigNumber, I'm afraid its cost is prohibitively expensive for most real-world applications like RSA signature verification. This is why I propose it be made an opcode.

This opcode would pave the way for several novel use cases, such as supporting RSA, which is still a popular cryptographic function seen in many places like JWT verification and DKIM in email.

Solution

I offer to make a PR adding support for the opcode, updating several source files in data/transactions/logic to integrate it. Before creating the PR, I thought it best to raise an issue to provide context and see whether it's something others would like to see added. The API design would be something typical, likely modexp(base: Bytes, exponent: Bytes, modulus: Bytes). The opcode cost would be a function of the input size, similar to base64_decode.

Dependencies

There are no known dependencies.

Urgency

I am aware of several developers, including myself, who wish to support RSA in some form, such as for DKIM in email or privacy-preserving JWT verification. In that sense, it seems like an important opcode.

emg110 commented 2 months ago

As a builder in ecosystem, I humbly think that will be a very useful OpCode. I humbly suggest if you can make that PR and do it, go for it. Demand for such contributions is very high right now.

jannotti commented 2 months ago

I'd support that, I believe it is fairly trivial, since Go's big.Int supports it.

I'm not completely sure about the cost. Can the computational cost be expressed by multiplying the lengths of the inputs by some factors and summing?

It should probably be called bmodexp to fit the pattern of other opcodes that operate on bignums/bytes.

jannotti commented 2 months ago

Separately, I would also support an rsa_verify though my impression is that RSA has more competing formats than ecdsa or ed25519. Anyway, if someone wants RSA, it might make just as much sense to get the entire operation into a single opcode.

FrankSzendzielarz commented 2 months ago

Separately, I would also support an rsa_verify though my impression is that RSA has more competing formats than ecdsa or ed25519. Anyway, if someone wants RSA, it might make just as much sense to get the entire operation into a single opcode.

It's probably worth doing as more interest across chains in auth seems to be happening, and passkey related solutions still have to contend with Windows Hello as a platform authenticator, and that uses RS256 (cose algo -257). AFAIK no other chain is dealing with passkey delegated smart account control using anything other than ecdsa or ed25519 type stuff, so an RS256 Windows Hello verifier might give you guys some kind of edge

mangoplane commented 2 months ago

Separately, I would also support an rsa_verify though my impression is that RSA has more competing formats than ecdsa or ed25519. Anyway, if someone wants RSA, it might make just as much sense to get the entire operation into a single opcode.

I'm not opposed to the idea, and see value in implementing this. However as you touched on, rsa_verify comes in many flavors that may introduce too much complexity to support universally, with the current standard being defined by RFC8017.

What might be better is to expect developers implement it themselves with the modexp opcode. In Puya-RSA, I have already implemented RSASSA-PKCS1-V1_5-VERIFY from RFC8017 , which supports RS256. The upcoming modexp opcode would be a drop-in replacement for the TEAL version I currently have.

If it were to be added as an opcode, it'd make sense to support the widely used RSASSA-PKCS1-V1_5-VERIFY function. This signature verification algorithm can verify RS256 signatures if SHA256 is used to create the message digest.

mangoplane commented 2 months ago

I'd support that, I believe it is fairly trivial, since Go's big.Int supports it.

I'm not completely sure about the cost. Can the computational cost be expressed by multiplying the lengths of the inputs by some factors and summing?

It should probably be called bmodexp to fit the pattern of other opcodes that operate on bignums/bytes.

Thanks for the suggestions. I'll do some research into the cost, and propose something in the PR which we can review together.

mangoplane commented 2 months ago

Preliminary work into figuring out a suitable function to estimate the cost is complete. Inspired by EIP-2565, it appears that fitting a line to x = exponent_length * max(base_length, modulus_length)**2, with y as the average ns/op (after scaling by the average Algorand gas cost per ns as estimated with b%), yields a good fit. My test data is a search grid over the range of 8-128 bytes, in increments of 8 bytes made with Golang bench similar to other benchmarks.

Note the data looks bimodal, as I think my computer didn't maintain constant speed on this task due to sleep or multitasking. It should be a good starting point to refine.

Jupyter Notebook of Results: https://colab.research.google.com/drive/1f9evrZLmLFPg6YgbkLHptZZGiNm_G6KX?usp=sharing

For comparison, a constant or linear function is a highly inaccurate estimate. Suggesting it makes sense to implement a custom cost function implementing the function estimated from the notebook in data/transactions/logic/opcodes.go. Interested to hear everyone's input.

mangoplane commented 2 months ago

In the meantime, I'll push what what I have as a PR for review and leave the cost as a placeholder with TODO comment, until I hear back about next steps for the cost.

mangoplane commented 2 months ago

Regarding the cost model for bmodexp, @jannotti:

I have conducted a more in-depth analysis, comparing three classes of models with automated model selection. It appears that the log-polynomial model is by far the most accurate, while the linear model is unfortunately highly inaccurate. The cost for the maximum input length is too high to set the cost as a constant. The relative performances of the models are assessed both in-sample and out-of-sample, using the known cost for the extreme input case as the out-of-sample test point. The models have fixed intercept through 0, because for zero length input the cost should be zero (although we will add a minimum cost to that). The coefficient for the model parameters are forced to be positive, based on the intuition that any increase in the input sizes should increase the cost, and to ensure there's no weird negative case to deal with.

Colab: https://colab.research.google.com/drive/1f9evrZLmLFPg6YgbkLHptZZGiNm_G6KX PDF: https://drive.google.com/file/d/1Z6HjmV8scdztfz7ZM6f2VUNRP8nkysYd/view?usp=sharing

The data may not be available in the Colab, as it isn't permanently uploaded. Should you have issues viewing the results, please see the PDF.

mangoplane commented 2 months ago

Based on the analysis, it appears that the log-polynomial model is the most suitable choice. This means that a custom function for calculating the cost should be added to data/transactions/logic/opcodes.go, specifically for bmodexp. However, I need clarification regarding the data-derived specification you mentioned in your PR feedback, as I'm unsure how to proceed with that aspect. Other than that, I should be ready to implement the log-polynomial model, depending on your thoughts regarding the above points.

johnalanwoods commented 2 months ago

@mangoplane - nice work, can we set up a quick call to chat?

John

CTO - Algorand Foundation

mangoplane commented 2 months ago

Thanks @johnalanwoods! I’d be happy to set up a call. I'm available from tomorrow onward, and since I'm based in Sydney, Australia, any time between 8 AM and 9 PM AEST works well for me. Please feel free to send the invite to my email: winton.nathan.roberts@gmail.com. I look forward to our conversation.

algorand / go-algorand

New opcode: modexp #6139

Problem

Solution

Dependencies

Urgency