algorand / go-algorand

Algorand's official implementation in Go.
https://developer.algorand.org/
Other
1.35k stars 470 forks source link

Trim down langspec.json #4485

Closed michaeldiamant closed 2 years ago

michaeldiamant commented 2 years ago

Problem

https://github.com/algorand/go-algorand/blob/master/data/transactions/logic/langspec.json does not accurately represent the supported AVM specification. Via https://github.com/algorand/algorand-sdk-testing/issues/223, we removed its known major use case.

The story requests identifying + removing fields that external users are not depending on. It implies understanding how users outside of SDKs use langspec.json.

If it's possible, a total deprecation + removal can occur. Minimally, looking to remove fields that provide inaccurate information (e.g. cost [does not account for linear costs]).

Solution

Dependencies

Urgency

fergalwalsh commented 2 years ago

FYI I have some contract writing/checking tools that rely on langspec.json. It would be rather inconvenient if it disappeared as I'd have to regenerate basically the same thing from go-algorand. I'm using it to determine all opcoces, their args, arg types, fields, sizes and even docstrings. It would be really useful if it contained cost information but it doesn't currently.

If you have another machine readable source of this information I'd be happy to switch to that of course.

does not accurately represent the supported AVM specification

What is currently missing apart from costs?

jannotti commented 2 years ago

One problem is that it does not have any version dependent structure. That is at least one of the reasons it doesn't have costs. Costs have changed from version to version. It also contains a size field, but that field can't be perfect, since some instructions have varied lengths (incblock, pushint, the upcoming switch). Further, we have ideas about encoding some instructions more efficiently so that, for example, a bnz instruction might consume only two bytes if it is a short jump. That would be both a version dependent difference and a dynamic property of the instruction.

All in all, the biggest problem we have is that langspec.json is "write-only" for go-algorand, and unspecified. It's hard to maintain because it's unclear what contract must be maintained with external tools like yours. Could you tell us more about how you use it, and what you depend upon?

One thing that I think is easy to maintain, and potentially of use to dev tools is programatic access simply to opcode names, and documentation. That can be useful for IDE like tools. But note that langspec.json is also deficient there, since pseudo-ops are not in it.

fergalwalsh commented 2 years ago

Thanks for the details. Yes versioning is an issue alright. I handle that manually by referring to a specific version of the file but it is not ideal.

My main use case is a compiler that includes argument and type checking. I only hardcode a very few opcodes that are used specially but for the most part langspec.json is used to discover available opcodes and their constraints. This means I don't have to do any work to support opcodes from new Teal versions other than pointing at a new langspec.json.

I mostly use size to determine the number of immediate args required for some ops. That is a hack though based on the current format of the file. I'd like to have cost available for some static analysis.

I also some editor tools in the works that use docs like you mention for inline documentation. I'll admit though it isn't perfect because the docstrings are not completely standardized. e.g:

app_global_put(A: byte, B: Any)
write B to key A in the global state of the current application
asset_holding_get(F: field, A: Any, B: int) -> int, Any
X is field F from account A's holding of asset B. Y is 1 if A is opted into B, else 0

the biggest problem we have is that langspec.json is "write-only" for go-algorand,

Is there any sensible way that the assembler could use langspec.json? In this way you'd be forced to make sure it really is a complete spec of the 'language'. This would also enable alternative compilers/assemblers to be implemented more easily outside of Go.

In summary, I'm not particularly attached to the current langspec.json but I do really like having this kind of thing

jannotti commented 2 years ago

When you say "compiler that includes argument and type checking" are you compiling TEAL? Or some higher level language? If TEAL, you might want to try the assembler on its own again, I think that ever since this PR: https://github.com/algorand/go-algorand/pull/3870 the assembler might be doing a good enough job to satisfy you. (Unless you are doing better control flow analysis.)

Similarly, I would hope the assembler is giving good error messages about the number of immediate arguments these days. For example, I have some emacs code hacked up that shows me the lines where I have errors of that kind, based on the output of goal clerk compile.

Screen Shot 2022-08-31 at 12 16 16 PM

The real "source of truth" for how the assembler operates is mostly in the opcodes table in opcodes.go. I suppose we could convert that to JSON and spit it out, but I'd hate to commit to the current data types there, so it would probably not be a good solution.

fergalwalsh commented 2 years ago

When you say "compiler that includes argument and type checking" are you compiling TEAL? Or some higher level language?

We have a compiler/transpiler for a high level language to Teal. It does type checking during the transpilation. Some slightly contrived examples:

int x = sqrt(Txn.ApplicationArgs[0])

This gives a useful error: Incorrect type for arg 0 of sqrt. Expected int, got byte. Line 1

Similarly:

byte asset_1_reserves = btoi(...)
byte asset_2_reserves = btoi(...)
int k = asset_1_reserves b* asset_2_reserves

Error: Incorrect type for int assignment. Expected int, got byte. Line 3

Some of these would be caught by the assembler but catching during the transpilation phase is much more useful in terms of the errors we can give. Also some errors would not be caught by the assembler because the values are store'd and load'd but because we track types of slots we can identify those errors early.

This is all determined using langspec.json so having something like this is immensely useful.

michaeldiamant commented 2 years ago

@fergalwalsh Thanks for the feedback - closing because we plan to keep langspec.json as is for now.

jannotti commented 2 years ago

Got it. That makes sense, I think it makes sense for us to at least maintain information to allow that (and perhaps a bit more - we may want to add info on pseudo-ops, for example).

Costs are hard to describe (change by version), and getting harder, as we introduce costs that vary on the size of inputs, such as base64_decode.

We do try to do some scratch slot type tracking now too. But I suspect you are correct that the assembler would not catch as much as your HLL, if you have a convenient way to declare what goes into a scratch slot. (We look to see what you last stored, and complain only if not conditional code could have confused us in the meantime.). https://github.com/algorand/go-algorand/pull/4064

On Thu, Sep 1, 2022 at 11:35 AM Fergal Walsh @.***> wrote:

When you say "compiler that includes argument and type checking" are you compiling TEAL? Or some higher level language?

We have a compiler/transpiler for a high level language to Teal. It does type checking during the transpilation. Some slightly contrived examples:

int x = sqrt(Txn.ApplicationArgs[0])

This gives a useful error: Incorrect type for arg 0 of sqrt. Expected int, got byte. Line 1

Similarly:

byte asset_1_reserves = btoi(...) byte asset_2_reserves = btoi(...) int k = asset_1_reserves b* asset_2_reserves

Error: Incorrect type for int assignment. Expected int, got byte. Line 3

Some of these would be caught by the assembler but catching during the transpilation phase is much more useful in terms of the errors we can give. Also some errors would not be caught by the assembler because the values are store'd and load'd but because we track types of slots we can identify those errors early.

This is all determined using langspec.json so having something like this is immensely useful.

— Reply to this email directly, view it on GitHub https://github.com/algorand/go-algorand/issues/4485#issuecomment-1234449709, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADL7TYW65ARCE2FPC54JWLV4DEKTANCNFSM577JHMSQ . You are receiving this because you commented.Message ID: @.***>