**Open** · lrettig opened this issue 1 week ago
Generally, the program might choose whatever serialization method it wants. We use SCALE, but it's not enforced anywhere, so it might not be possible to have a generic way to check the input.
> Even just checking the length of the input data, as a simple sanity check, would be an improvement.
We could check if all bytes were consumed from the input.
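Something like this, for example. This is just a sketch, assuming the input is SCALE-encoded via parity-scale-codec; the helper name is made up (parity-scale-codec's `DecodeAll` trait offers essentially this check):

```rust
use parity_scale_codec::Decode;

/// Decode a value and fail if any input bytes are left unconsumed.
fn decode_all_checked<T: Decode>(bytes: &[u8]) -> Result<T, String> {
    let mut input = bytes;
    let value = T::decode(&mut input).map_err(|e| e.to_string())?;
    if !input.is_empty() {
        return Err(format!("{} unconsumed byte(s) after decoding", input.len()));
    }
    Ok(value)
}
```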
> We could check if all bytes were consumed from the input.
This is a great starting point and seems like an easy win - although wouldn't this happen after the sort of error I'm describing above? In other words, wouldn't we have to "peek" at the length of the input before execution?
> the program might choose whatever serialization method it wants
This is true and indeed this is the point of our template system and of account unification, but we can still offer a reasonable set of defaults, so I don't think this is a reason not to try to do better. You're right that maybe we can't do it in the core, but we can do it in the template itself.
The AI recommends using a serialization scheme like protobuf that allows encoding of type information. An alternative is to wrap structs in a typed message and/or use enums, e.g.:
```rust
#[derive(Serialize, Deserialize)]
#[serde(tag = "type")]
enum Message {
    SpendArguments {
        recipient: Address,
        amount: u64,
    },
    // Add other message types here
}

let message = Message::SpendArguments {
    recipient: /* recipient address */,
    amount: 1000,
};

let serialized = serde_json::to_string(&message)?;
```
I think this isn't a bad idea.
> The AI recommends using a serialization scheme like protobuf that allows encoding of type information. An alternative is to wrap structs in a typed message and/or use enums, e.g.:
I don't think using something like JSON and protobuf is a good idea if we care about:
> We could check if all bytes were consumed from the input.
>
> This is a great starting point and seems like an easy win - although wouldn't this happen after the sort of error I'm describing above? In other words, wouldn't we have to "peek" at the length of the input before execution?
Gave this some more thought and realized it's not possible. We can know the length of the input stream before executing the program, but we cannot know how many bytes the program will read without actually running it.
The only alternative I can think of is to require the program to self-declare up front which data it will read and in which order. That would sort of address the problem, but it puts the burden on the developer, which I don't like.
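For the record, here's roughly what I mean, as a purely hypothetical sketch. None of these names exist in our SDK, and it assumes fixed-size values only (a 24-byte address and a u64); variable-length values would need more than a length check:

```rust
/// Hypothetical input manifest: the program declares which values it will
/// read and in which order, and the host checks the supplied input against
/// it before launching the VM. Names and sizes are illustrative only.
#[derive(Clone, Copy)]
enum InputKind {
    Address, // assumed fixed-size account address (24 bytes here)
    U64,     // fixed-width integer (8 bytes in SCALE)
}

impl InputKind {
    fn encoded_len(self) -> usize {
        match self {
            InputKind::Address => 24,
            InputKind::U64 => 8,
        }
    }
}

/// Declared read order for a hypothetical wallet `spend` call.
const SPEND_INPUTS: &[InputKind] = &[InputKind::Address, InputKind::U64];

/// Host-side sanity check, run before execution.
fn check_declared_input(manifest: &[InputKind], stdin: &[u8]) -> Result<(), String> {
    let expected: usize = manifest.iter().map(|k| k.encoded_len()).sum();
    if stdin.len() != expected {
        return Err(format!(
            "declared inputs require {expected} bytes, got {}",
            stdin.len()
        ));
    }
    Ok(())
}
```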
> I don't think using something like JSON and protobuf is a good idea if we care about
The ideal here is a serialization codec that has cross-platform support, uses a compact encoding, and is at least partially self-describing. Or we can possibly manually add the description along the lines I described above. Will give this some more thought.
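For example, along the lines I described above, the payload could be wrapped in a single SCALE enum so the variant index acts as a one-byte type tag. A rough sketch (types and names are illustrative, not the actual wallet template; requires parity-scale-codec's derive feature):

```rust
use parity_scale_codec::{Decode, Encode};

/// All payloads a program accepts, wrapped in one enum. SCALE prepends the
/// variant index, so the reader gets a cheap, compact type tag to check.
#[derive(Encode, Decode, Debug)]
enum ProgramInput {
    SpendArguments { recipient: [u8; 24], amount: u64 },
    DeployArguments { code: Vec<u8> },
}

fn read_spend(bytes: &[u8]) -> Result<([u8; 24], u64), String> {
    match ProgramInput::decode(&mut &bytes[..]) {
        Ok(ProgramInput::SpendArguments { recipient, amount }) => Ok((recipient, amount)),
        Ok(other) => Err(format!("unexpected input payload: {other:?}")),
        Err(e) => Err(e.to_string()),
    }
}
```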
> the program might choose whatever serialization method it wants
Note that, for now at least, we've effectively enshrined SCALE here:
Yes, a program could still choose to use a different encoding, but that would mean using a different SDK to write and compile the program, which is pretty unrealistic.
It's trivially easy to accidentally corrupt program input. All it requires is passing the wrong data types, or extra data, or data that's out of order, into the program's stdin. Here's a very simple example: https://github.com/athenavm/athena/compare/main...input-corruption-example
By simply adding this:
https://github.com/athenavm/athena/blob/4201b627bf53a57430e593bab15820c33e97f991/examples/wallet/script/src/bin/execute.rs#L77-L78
We get something like this:
This is a really big footgun: this sort of mistake is easy to make and hard to detect. Can we put some guardrails in place here?
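For illustration, here's a standalone sketch of why this is hard to detect (hypothetical `SpendArguments` type, assuming SCALE via parity-scale-codec): the corrupted input decodes without any error, it just yields garbage values.

```rust
use parity_scale_codec::{Decode, Encode};

// Hypothetical payload type, standing in for the wallet's spend arguments.
#[derive(Encode, Decode, Debug)]
struct SpendArguments {
    recipient: [u8; 24],
    amount: u64,
}

fn main() {
    let args = SpendArguments { recipient: [1u8; 24], amount: 1000 };

    // Correct input: just the encoded arguments.
    let good = args.encode();

    // Corrupted input: an extra u64 accidentally written before the arguments.
    let mut bad = 42u64.encode();
    bad.extend_from_slice(&args.encode());

    // Both decode without error; the second silently yields garbage, because
    // the encoding carries no type information that could flag the mismatch.
    // (A trailing-bytes check would catch this case: 8 bytes are left over.)
    println!("{:?}", SpendArguments::decode(&mut &good[..]).unwrap());
    println!("{:?}", SpendArguments::decode(&mut &bad[..]).unwrap());
}
```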
- Is there any way to detect the corruption when deserializing the input data and report the issue there?
- Can we go even further and store type information along with the raw bytes in stdin so we can do some type matching when reading?
- Even just checking the length of the input data, as a simple sanity check, would be an improvement.