JakeOShannessy commented 6 years ago

This PR adds logs to all store calls (along with the protection). This works on chain and passes all tests, although I haven't yet tested making good use of these logs. I've added some notes in the docs folder of this repo, and I've reproduced them here so you can have a look.

Logging of Storage Calls (On-Chain)

In order to track changes to storage over time (across transactions), a log is inserted at each SSTORE call. There are 5 LOG opcodes, which differ only in the number of topics the log is published to.

LOG0 - Log to no topics.
LOG1 - Log to 1 topic.
LOG2 - Log to 2 topics.
LOG3 - Log to 3 topics.
LOG4 - Log to 4 topics.

LOG0 expects there to be 2 arguments on the stack. These two arguments are:

top -> [0] start of memory buffer
       [1] length of memory buffer

For each of the other LOGN calls an additional N stack items are expected below these items to define the topics to which this log will be published. For example: a LOG4 call expects the stack to be as below:

top -> [0] start of memory buffer
       [1] length of memory buffer
       [2] topic #1
       [3] topic #2
       [4] topic #3
       [5] topic #4
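
The stack layouts above follow one pattern, which can be written down as a small Python sketch. The helper names (`log_opcode`, `log_stack_layout`) are made up for illustration and are not part of this PR; the opcode values 0xa0 to 0xa4 are the standard EVM encodings of LOG0 to LOG4.

```python
# Sketch: stack operands consumed by LOG0..LOG4, listed top of stack first.
# Helper names are hypothetical and only serve this illustration.

LOG0_OPCODE = 0xA0  # LOG0..LOG4 occupy opcodes 0xa0..0xa4


def log_opcode(n):
    """Return the opcode byte for LOGn, rejecting n outside 0..4."""
    if not 0 <= n <= 4:
        raise ValueError("LOGN is only defined for N in 0..4")
    return LOG0_OPCODE + n


def log_stack_layout(n):
    """Return the operands LOGn pops, in order from the top of the stack."""
    log_opcode(n)  # validates n
    layout = ["start of memory buffer", "length of memory buffer"]
    layout += ["topic #%d" % i for i in range(1, n + 1)]
    return layout


if __name__ == "__main__":
    for n in range(5):
        print("LOG%d (0x%02x): %s" % (n, log_opcode(n), log_stack_layout(n)))
```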

Note the use of memory buffer arguments. Any call to LOG requires using memory to pass the data. As we are transforming programs with an unknown memory layout and an unknown memory allocation scheme, we are unable to allocate our own memory in the general case. We cannot even assume that the Solidity memory allocator is present.

To get around this problem we borrow two fixed memory words as a buffer for our LOG call. Before we write to this buffer, we load its current values onto the stack (using two stack slots). Once we are finished with the buffer, we write those values back from the stack into the same memory locations, returning memory to its original state. We will arbitrarily use the addresses 0x60 and 0x80 (two contiguous 32-byte words) for our LOG calls.

We want to store two pieces of information:

The contract address (20 bytes).
The storage key (32 bytes).

When consuming this information we would be interested in the procedure id and the capabilities of that procedure, but that information is not available at preprocessing time. Instead, with the contract address we should be able to determine these properties later. The data is therefore a fixed length of 52 bytes: the first 20 bytes are the contract address, and the next 32 bytes are the storage key with which SSTORE is being called.
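
As a rough illustration of this layout (20 + 32 = 52 bytes), the Python sketch below models memory as a byte array, writes the two 32-byte words at 0x60 and 0x80, and slices out the logged data starting at 0x6c. The function names are hypothetical, and the decoder is only an assumption about how a consumer might split the data.

```python
# Sketch of the 52-byte log payload, modelling EVM memory as a bytearray.
# Function names are hypothetical; only the offsets come from the notes.

WORD = 32          # size of one EVM memory word in bytes
BUF_START = 0x60   # first borrowed word (holds the address)
ADDRESS_LEN = 20
KEY_LEN = 32


def build_log_data(memory, address, key):
    """Write address and key as two words and return the bytes LOG would read."""
    if len(address) != ADDRESS_LEN or len(key) != KEY_LEN:
        raise ValueError("expected a 20-byte address and a 32-byte key")
    # MSTORE writes a full word, so the address is left-padded with 12 zeros.
    memory[BUF_START:BUF_START + WORD] = address.rjust(WORD, b"\x00")
    memory[BUF_START + WORD:BUF_START + 2 * WORD] = key
    start = BUF_START + (WORD - ADDRESS_LEN)   # 0x60 + 12 = 0x6c
    length = ADDRESS_LEN + KEY_LEN             # 52 = 0x34
    return bytes(memory[start:start + length])


def decode_log_data(data):
    """Split a 52-byte payload back into (address, key)."""
    if len(data) != ADDRESS_LEN + KEY_LEN:
        raise ValueError("expected exactly 52 bytes")
    return data[:ADDRESS_LEN], data[ADDRESS_LEN:]


if __name__ == "__main__":
    mem = bytearray(0x100)
    payload = build_log_data(mem, bytes(range(1, 21)), bytes(range(32)))
    print(len(payload), decode_log_data(payload))
```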

This is the general procedure we want to follow for each logging call (this does not cover the storage call or its protection, simply the logging):

Load the two values at memory addresses 0x60 and 0x80 onto the stack so that they can be restored later.
Store the contract address at 0x60 in memory.
Store the storage key at 0x80 in memory. This needs to be taken at runtime, so we must assume it is on top of the stack when this routine runs.
Push the designated topic onto the stack.
Push the length of the memory buffer onto the stack. While we have allocated two 32-byte slots (64 bytes), we actually only need 52, so we push 52 (0x34) onto the stack.
Push the memory location of the start of the buffer. While the start of our buffer is at 0x60, the first 12 bytes are not part of the address (the address only occupies the lower 20 bytes of the 32-byte word), so this location is 0x60 + 12 (0xc) = 0x6c.
Execute LOG1, which consumes the buffer start, the buffer length and the topic.
Restore the original values at memory addresses 0x60 and 0x80.

Translated into opcodes:

-- Load the original values of our memory buffer onto the stack.
PUSH1 (pack [0x60])
MLOAD
PUSH1 (pack [0x80])
MLOAD

-- Load the contract address onto the stack, then store it at memory
-- location 0x60.
ADDRESS
PUSH1 (pack [0x60])
MSTORE

-- Take the storage key from the stack and store it at 0x80. Note that it is
-- in the 3rd position (beneath the two original memory values we just
-- loaded). Therefore we must swap it to the top of the stack. This has a
-- side effect in that it reverses the order of the two original memory
-- values. Rather than swap them back, we simply account for that later.
SWAP2
PUSH1 (pack [0x80])
MSTORE

-- Push the topic to which we publish onto the stack (NB: this is not
-- defined here), followed by the buffer length (0x34) and the buffer start
-- (0x6c).
PUSH32 topic
PUSH1 (pack [0x34])
PUSH1 (pack [0x6c])

-- Perform the LOG.
LOG1

-- Restore the original memory values. Remember that the order of these was
-- reversed by the SWAP2 used above, therefore we call MSTORE in the same
-- order we called MLOAD (0x60 first, then 0x80).
PUSH1 (pack [0x60])
MSTORE
PUSH1 (pack [0x80])
MSTORE
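
To sanity-check the stack bookkeeping in the routine above, here is a minimal Python model of just the opcodes it uses. This is a sketch and not the beaker-cli implementation: the interpreter, the `TOPIC` placeholder and the tuple encoding of instructions are all assumptions made for illustration. The stack is a Python list with its top at the end, and the storage key is assumed to be on top of the stack when the routine starts.

```python
# Sketch: a tiny interpreter for the handful of opcodes used by the routine.
# Not a real EVM; TOPIC and the instruction encoding are placeholders.

TOPIC = 0x1234  # placeholder, the real topic is not defined in these notes

ROUTINE = [
    ("PUSH", 0x60), ("MLOAD",), ("PUSH", 0x80), ("MLOAD",),  # save buffer
    ("ADDRESS",), ("PUSH", 0x60), ("MSTORE",),               # address -> 0x60
    ("SWAP2",), ("PUSH", 0x80), ("MSTORE",),                 # key -> 0x80
    ("PUSH", TOPIC), ("PUSH", 0x34), ("PUSH", 0x6C),         # LOG1 operands
    ("LOG1",),
    ("PUSH", 0x60), ("MSTORE",), ("PUSH", 0x80), ("MSTORE",),  # restore
]


def run(program, stack, memory, address):
    """Execute program in place and return the emitted (topic, data) logs."""
    logs = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "MLOAD":
            offset = stack.pop()
            stack.append(int.from_bytes(memory[offset:offset + 32], "big"))
        elif op == "MSTORE":
            offset = stack.pop()  # offset is on top, value beneath it
            value = stack.pop()
            memory[offset:offset + 32] = value.to_bytes(32, "big")
        elif op == "ADDRESS":
            stack.append(address)
        elif op == "SWAP2":
            stack[-1], stack[-3] = stack[-3], stack[-1]
        elif op == "LOG1":
            offset, length, topic = stack.pop(), stack.pop(), stack.pop()
            logs.append((topic, bytes(memory[offset:offset + length])))
        else:
            raise ValueError("unsupported opcode: %s" % op)
    return logs


if __name__ == "__main__":
    memory = bytearray(range(256))   # arbitrary pre-existing memory contents
    snapshot = bytes(memory)
    stack = [0x2A]                   # storage key on top of the stack
    logs = run(ROUTINE, stack, memory, 0x00112233445566778899AABBCCDDEEFF00112233)
    print("memory restored:", bytes(memory) == snapshot)
    print("log data length:", len(logs[0][1]))
```

In this model the routine consumes the storage key and leaves memory byte-for-byte unchanged, which matches the intent of the SWAP2 trick and the MSTORE order described in the comments.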

JakeOShannessy commented 6 years ago

Tests have been added to ensure that logs are properly added on chain and can be retrieved. We still need to add a proper topic. Maybe just a simple hash of a logical name.

Latrasis commented 6 years ago

@JakeOShannessy: For topic encoding, we could use a hash, ASCII or UTF-8. If ASCII, we get 32 symbols (one per byte of the 32-byte topic), which would suffice.

Possible labels:

KERNEL_SSTORE
KERNEL_CAP_WRITE

I think the former is good.

JakeOShannessy commented 6 years ago

Tests have been heavily refactored in this PR. They now rely on a module structure, which is more verbose but which I've found much easier to work with. Many tests are split and labelled. This leads to some repetition, but makes the test output clearer.

JakeOShannessy commented 6 years ago

@Latrasis Those labels look good to me, I agree that the first is better. What are your thoughts on tiers? We could have the format ["KERNEL", "SSTORE"]. This means in the future if there are many different kernel logs, we can easily retrieve them all or just a subset (just the hierarchical topic structure). Essentially we would be replacing the underscore with an Ethereum-aware structure.
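
A small Python sketch of how such tiered topics could be queried. It mirrors the positional matching used by Ethereum log filters, where each filter position is either a wildcard (`None`), a single topic, or a list of alternatives. The plain-string topics are placeholders only; real topics are 32-byte values, and the function name is hypothetical.

```python
# Sketch: positional topic matching in the style of Ethereum log filters.
# Plain strings stand in for real 32-byte topics; names are hypothetical.


def topic_matches(log_topics, filter_topics):
    """Return True if a log's topics satisfy a positional topic filter."""
    if len(filter_topics) > len(log_topics):
        return False
    for wanted, actual in zip(filter_topics, log_topics):
        if wanted is None:  # wildcard: any topic in this position
            continue
        alternatives = wanted if isinstance(wanted, (list, tuple)) else [wanted]
        if actual not in alternatives:
            return False
    return True


if __name__ == "__main__":
    logs = [
        ["KERNEL", "SSTORE"],
        ["KERNEL", "CAP_WRITE"],
        ["OTHER", "SSTORE"],
    ]
    # All kernel logs, then only the kernel SSTORE subset.
    print([t for t in logs if topic_matches(t, ["KERNEL"])])
    print([t for t in logs if topic_matches(t, ["KERNEL", "SSTORE"])])
```

With a flat KERNEL_SSTORE topic, the first query (every kernel log regardless of kind) would instead need an explicit list of all kernel topic values.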

JakeOShannessy commented 6 years ago

I've pushed a version that uses a hash of KERNEL_SSTORE as the topic for now. I think we should try to merge this into master. Currently there are two failing tests due to the storage protection checker not recognising the log routines. These tests will need #12 to be addressed first in order to adequately rebuild that checker.

Is there anything else holding this one up?

JakeOShannessy commented 6 years ago

When this is merged #1 and #11 can be closed.

Latrasis commented 6 years ago

@JakeOShannessy: Format sounds good, my preference would be to use the raw ASCII instead of a keccak hash for namespaces if we can.

JakeOShannessy commented 6 years ago

@Latrasis Raw ASCII makes sense for readability. I went with the hash for the following reasons, let me know what you think:

Using a keccak hash is the current standard (it's what Solidity events compile to), so the readability gain from ASCII is not huge.
Far more importantly, which ASCII? Null terminated? Left padded? The hash value is always 256 bits, so everyone knows exactly how we got it. If we just say "ASCII" we need to include more information, which might not always be clear. Keccak256 is unambiguous.
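
To make the ambiguity concrete, the Python sketch below encodes the same label into a 32-byte topic under three padding conventions. The conventions and function names are hypothetical examples, not proposals from this thread.

```python
# Sketch: three plausible ways to fit an ASCII label into a 32-byte topic.
# The conventions shown are hypothetical examples of the ambiguity.

LABEL = b"KERNEL_SSTORE"
TOPIC_LEN = 32


def zero_right_padded(label):
    """Label first, then zero bytes (also how a null-terminated string looks)."""
    return label.ljust(TOPIC_LEN, b"\x00")


def zero_left_padded(label):
    """Zero bytes first, label in the low-order bytes."""
    return label.rjust(TOPIC_LEN, b"\x00")


def space_right_padded(label):
    """Label first, then ASCII spaces."""
    return label.ljust(TOPIC_LEN, b" ")


if __name__ == "__main__":
    for encode in (zero_right_padded, zero_left_padded, space_right_padded):
        print("%-20s %s" % (encode.__name__, encode(LABEL).hex()))
```

All three are valid 32-byte topics and all three differ, so a consumer filtering on the topic would need to know which convention was used.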

Might be worth making an issue to discuss other logging topics too.

Daohub-io / beaker-cli

Log store calls (all tests passing) #10
