ethereum / EIPs

The Ethereum Improvement Proposal repository
https://eips.ethereum.org/
Creative Commons Zero v1.0 Universal

EIP 1154: Oracle interface #1161

Closed cag closed 5 years ago

cag commented 6 years ago

This is the official discussions thread for EIP #1154. The draft can be read here.

cag commented 6 years ago

So far, @Arachnid has commented on this EIP a bit in the PR. The discussion is reproduced here for convenience and expanded upon:

Can you provide example use-cases? What sort of oracles is this intended to support? Who would benefit from standardising such an interface?

The use case I had in mind originally was for answering questions about "real-world events", where each ID can be correlated with a specification of a question and its answers (so most likely for prediction markets, basically).

Both the ID and the results are intentionally unstructured so that things like time series data (via splitting the ID) and different sorts of results (like one of a few, any subset of up to 256, or some value in a range with up to 256 bits of granularity) can be represented.
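The "splitting the ID" idea can be made concrete. Here is a minimal Python sketch of one hypothetical convention (not part of the draft): the high 24 bytes of the 32-byte ID identify the question, and the low 8 bytes carry a timestamp for time-series data.

```python
import hashlib

def make_id(question: str, timestamp: int) -> bytes:
    """Pack a question digest and a timestamp into one 32-byte ID.

    Hypothetical convention: high 24 bytes = question hash prefix,
    low 8 bytes = big-endian Unix timestamp (time-series support).
    """
    digest = hashlib.sha256(question.encode()).digest()[:24]
    return digest + timestamp.to_bytes(8, "big")

def split_id(oracle_id: bytes) -> tuple:
    """Recover the question prefix and the timestamp from an ID."""
    return oracle_id[:24], int.from_bytes(oracle_id[24:], "big")

oracle_id = make_id("ETH/USD spot price", 1530000000)
prefix, ts = split_id(oracle_id)
assert len(oracle_id) == 32
assert ts == 1530000000
```

Any scheme that partitions the 32 bytes works the same way; the spec deliberately leaves the structure to the oracle and its consumers.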

Another use case could be for decision-making processes, where the results given by the oracle represent decisions made by the oracle (e.g. futarchies).

Can you expand on this in the EIP? And maybe make the title of the EIP more specific?

This seems to assume one particular type of oracle - one that returns exactly 32 bytes of data, and is a trusted party. There are many other types of oracle; what about them?

Regarding the trusted party factor: I've intentionally decided to start drafting the spec in as strict a manner as possible. With that said, there isn't a clear mandate about the authorization model, so it's not necessarily a single account which is authorized to make the report. Also, mechanisms like multisignature wallets, side/child chains, or something else may be used to distribute the trust if it was mandated to be a single account.

Regarding the 32 bytes of data: I am still debating and open to making the result an arbitrary-size blob of bytes. My contention is two-fold:

  1. Putting much more than a word of result data (say, a paragraph) into the result is probably not very actionable. Large amounts of input data are more suitable for, say, the field of machine learning than for smart contracts.
  2. You can carve out an index byte in the ID if you want to pass 256 consecutive words to the handler, so it doesn't necessarily limit the result size in some sense, but at that point, I'd seriously ask whether this should even be going on the chain.
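The "carve out an index byte" trick described above can be sketched in a few lines of Python (illustration only; the scheme is hypothetical, not mandated by the draft): replace the low byte of the base ID with an index, yielding up to 256 related IDs for 256 consecutive result words.

```python
def word_id(base_id: bytes, index: int) -> bytes:
    """Derive the ID for the index-th word of a multi-word result.

    Hypothetical scheme: the low byte of the 32-byte base ID is
    replaced by an index in [0, 255], giving up to 256 related IDs.
    """
    assert len(base_id) == 32 and 0 <= index < 256
    return base_id[:31] + bytes([index])

base = bytes(32)  # some 32-byte base ID with a zeroed low byte
ids = [word_id(base, i) for i in range(256)]
assert len(set(ids)) == 256  # all 256 derived IDs are distinct
```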

Services like Oraclize would seem to demonstrate uses for more than 32 bytes of onchain data, however.

Yes, it is true that Oraclize does support more than 32 bytes of onchain data, and this is something which I personally am not settled on as well, but I would also be interested in hearing from the community whether or not they've got any use cases for more than 32 bytes.

edmundedgar commented 6 years ago

Agree with @cag on the bytes32 thing, I know that's how Oraclize serves the data (as a string, which it's then up to you to parse), but I reckon what contract authors are usually doing as soon as they get that data is to squidge whatever they get from there back into 32 bytes so they can actually use it...

We've assumed everything is a bytes32 for Reality Check; on our current scheme (implemented in our dapp, the contract only knows somebody sent it a bytes32; it doesn't understand what's in it) this is intended to map as follows:

One hairy thing about this is that you often end up wanting to express "this question is invalid" or "I couldn't answer this question". In Augur they call this "-1". (This is a slightly different thing to isOutcomeSet() which I would interpret as "have you reached a conclusion about this question" - which we handle separately). The natural thing is to encode it as 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff, but that clashes with an actual -1 as a signed number...
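The clash described above is easy to demonstrate. A minimal Python sketch of two's-complement encoding (illustration only, mirroring how the EVM stores signed values in a 256-bit word):

```python
INVALID = (1 << 256) - 1  # 0xffff...ffff, the proposed "invalid" sentinel

def encode_int256(value: int) -> int:
    """Encode a signed integer as a 256-bit two's-complement word,
    the same representation Solidity's int256 uses."""
    return value & ((1 << 256) - 1)

# The sentinel is indistinguishable from a genuine signed -1:
assert encode_int256(-1) == INVALID
# ...whereas ordinary unsigned results never collide with it
# unless they use the very top of the range:
assert encode_int256(100) != INVALID
```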

edmundedgar commented 6 years ago

On getOutcome() and isOutcomeSet(), Reality Check calls revert() in its equivalent to getOutcome() - ours is currently called getFinalAnswer() - if the outcome isn't set. This is intended to avoid the need to call the contract twice in a transaction, although you still want to be able to call isOutcomeSet() to find out what's going on, most likely for UI purposes but possibly for use in a contract.

jleeh commented 6 years ago

I disagree with this being restricted to just bytes32 as that is a limitation with what current Oracle solutions report back with, and will have the following side-effects:

In my opinion, it should be the responsibility of the Oracle reporting back to send the data in the right type to begin with: int, uint etc. This would result in this interface having methods for each data type, but then there's no incurred cost to either the Oracle or the end-user who needs that data. This is critical for decentralised Oracle projects, as on-chain value aggregation and reputation mechanisms will already increase the gas cost to any user.

With @cag mentioning ChainLink, writing back to on-chain contracts with different value types is something it already supports with the following: int256, uint256 and bytes32. Again, it's not limited to those, it's simply what's already supported in its pre-release state.

To sum up, I don't think we should be implementing strict Oracle standards before we've really seen any established decentralised Oracle projects functioning yet. I feel it's too limiting before we've really seen what Oracles will be used for and how.

edmundedgar commented 6 years ago

@jleeh I'm all for saving gas and we jumped through all kinds of hoops to make Reality Check economical but the gas cost of these conversions is extremely small - here's a demo contract that just logs an event, one with a bytes32->uint conversion and one without:

https://pastebin.com/D55x8e9K

solc --gas gastest.sol

    ======= contract.sol:GasDemo =======
    Gas estimation:
    construction:
       117 + 69000 = 69117
    external:
       convUint(bytes32): 1313
       noconv(uint256): 1258

So in that case we're literally talking 55 gas, less with optimization. A simple send is 21,000 - the difference really isn't worth bothering with. The code to do this is also trivial.

MicahZoltu commented 6 years ago

In Augur they call this "-1". (This is a slightly different thing to isOutcomeSet() which I would interpret as "have you reached a conclusion about this question" - which we handle separately).

This isn't accurate, Augur doesn't represent results as a single value, it represents them as an array of values. This is because the result is a distribution of assets to token holders, and that distribution may not be 100%/0%.

Invalid is represented by an equal distribution to all parties (along with a special flag to differentiate it from a valid exactly-middle resolution).

jleeh commented 6 years ago

@edmundedgar Good point, just tested the same for int256 with a two's complement hex and got the same result: a 55 gas difference.
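The two's-complement interpretation being tested here can be sketched off-chain in Python (illustration only): interpreting a raw 32-byte word as a signed int256 is a pure reinterpretation of the same bits, which is why the on-chain conversion is essentially free.

```python
def decode_int256(word: bytes) -> int:
    """Interpret a 32-byte EVM word as a signed two's-complement
    integer, the way Solidity's int256(b) cast does."""
    assert len(word) == 32
    return int.from_bytes(word, "big", signed=True)

minus_one = b"\xff" * 32            # 0xffff...ffff
assert decode_int256(minus_one) == -1
assert decode_int256((1000).to_bytes(32, "big")) == 1000
```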

se3000 commented 6 years ago

@edmundedgar one scenario we consider is something like parsing bytes32 "425.53" into a uint256. How would you handle that? You can cheaply change the type, but the content isn't preserved. Based on what you describe above, it seems like a type is specified in the request and then converted by the Oracle Handler. Chainlink does something similar and assumes the response is a single EVM word (the function sig takes bytes32, but all that matters is that it's an EVM word), and then the conversion pretty much comes for free when invoking the callback function.

I'll confirm your suspicion that people are immediately parsing bytes32 into other types as soon as they arrive. We started with only bytes32 and got a lot of feedback parsing was a pain point. In our view, some conversions are cheaper than others, but they're all practically free for the Oracle Operator to compute, so it's better to be handled there. Also, contracts are complicated enough as it is, reducing the code they require to parse a response is beneficial for security and cost.

For the Oracle Operator to do this, being explicit about the type is important. We put the expected type in the query to make it clear for all parties involved what will be reported. For example, if a Consumer checks whether reportedEtherPrice > 100000, and one party is expecting a uint256 but it arrives formatted as bytes32, they will always be disappointed, because even a right-padded "1" evaluates to greater than a left-padded 100000.
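The padding pitfall described above can be verified numerically. A minimal Python sketch (illustration only): the same 32-byte word holds either a left-padded uint256 or a right-padded ASCII string, and naively comparing the latter as a number gives wildly wrong answers.

```python
def as_uint(word: bytes) -> int:
    """Naively interpret a 32-byte word as a big-endian uint256."""
    assert len(word) == 32
    return int.from_bytes(word, "big")

price_uint = (1).to_bytes(32, "big")   # uint256 value 1, left-padded
price_str = b"1".ljust(32, b"\x00")    # ASCII "1", right-padded

# A consumer checking `reportedEtherPrice > 100000` gets opposite answers:
assert as_uint(price_uint) == 1        # correctly below the threshold
assert as_uint(price_str) > 100000     # the padding dwarfs the threshold
```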

se3000 commented 6 years ago

@cag some thoughts on specification and how we talk about this stuff at Chainlink: We're all for ubiquitous language and so try to be pretty explicit about naming things. Because there's two parts to an Oracle, the on-chain and off-chain part, we specifically refer to the on-chain part as the "Oracle Contract", and have historically called the "Oracle Handler" an "Oracle Node" or "Oracle Operator". Those names might not be appropriate for a more general spec, but I think "Oracle" alone can be vague when discussing interactions.

We also refer to "Consumer" for the contract that receives the Result, and the "Requester" for the initiator in a Pull style interaction. They're often the same contract, but could be different, like if a request comes in from an Externally Owned Account. We also specify two interactions for the Pull based model, "Requesting" and "Fulfillment."

I'm with @jleeh that more concrete use cases would be helpful before standardizing, but this seems like the best place to get the conversation rolling.

edmundedgar commented 6 years ago

@se3000 We're doing what seems to be the standard Ethereum way to handle decimals, which is to specify a number of decimals and deliver the data pre-multiplied accordingly - i.e. if you expect a USD price to a precision of 0.1 cents, you would ask the oracle for a number in milli-dollars, do everything in the contract in milli-dollars, and only do the conversion to USD in the UI. I think this is also what you're advocating. As you're suggesting, it feels icky to do anything involving parsing stuff in the contract.
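The milli-dollar scheme above amounts to fixed-point arithmetic handled at the edges. A minimal Python sketch of the UI-side conversions (the contract itself only ever sees the integer; the function names are illustrative):

```python
DECIMALS = 3  # milli-dollars: all on-chain math uses integer thousandths

def to_fixed(dollars: str) -> int:
    """UI-side: convert a decimal dollar string (non-negative) to
    integer milli-dollars before sending it on-chain."""
    whole, _, frac = dollars.partition(".")
    frac = (frac + "0" * DECIMALS)[:DECIMALS]  # pad/truncate fraction
    return int(whole) * 10**DECIMALS + int(frac)

def to_display(milli: int) -> str:
    """UI-side: format integer milli-dollars back to a dollar string."""
    return f"{milli // 10**DECIMALS}.{milli % 10**DECIMALS:03d}"

assert to_fixed("425.53") == 425530
assert to_display(425530) == "425.530"
```

The oracle reports 425530; the contract compares and adds milli-dollars directly, and only the UI ever parses or prints decimal strings.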

But the upshot of this is that when you interact with a contract, you need to know what the question specifies in terms of how it will interpret the data. If you've asked for a uint256 expecting 4 decimals, and the question actually asks the oracle to supply a uint256 using 13 decimals, you're going to have a bad day. I think this means that you can't usefully protect users of a contract by supplying data with different function signatures to distinguish times they want a bool from times they want a uint256 from times they want an int256, because they still need to look to the specific question asked to find out what kind of uint256 it is.

In other words, you need to distinguish different types of data, and it might be useful to have a common understanding of what they are, but that doesn't map cleanly to Solidity types, and often the consumer contract (as opposed to its user / UI code) won't need to know what the data type is either, so it seems simpler to deliver everything as a bytes32.

cag commented 6 years ago

There shouldn't be a conversion gas cost, as the EVM doesn't have a conception of "type" in its memory; the Solidity compiler enforces type semantics, and there are EVM opcodes which assume a piece of memory is typed in some way and operate accordingly. I think the gas difference is an accident of function selector order and maybe implementation details regarding stack memory use. For example, Remix reports that in the following version of the GasDemo, funcA is actually the more expensive function by 2 gas (probably because funcA's function selector is checked after funcB's, and the temporary uint u was removed):

GasDemo Example:

```solidity
contract GasDemo {
    event MyEvent(uint256 u);

    function funcA(uint256 u) {
        emit MyEvent(u);
    }

    function funcB(bytes32 b) {
        emit MyEvent(uint256(b));
    }
}
```

Fixed-point is a popular way of dealing with numbers that may have fractional parts, and whether it is binary or decimal, and how many fractional bits/decimal places get encoded often depends on the use case (or maybe on a whim ¯\_(ツ)_/¯). Still, I would say no matter what the details of the encoding, it is more efficient than using a string to represent numerical values on the blockchain.

Gnosis' use case for oracles is limited to the "one-of-many possibilities" and "signed integer (possibly with fixed-point)" result representations listed by @edmundedgar (I am considering stealing that list for the proposal). In this use case, the ID in the proposal corresponds to an IPFS document which specifies what the oracle reports on and how the result should be interpreted.


@MicahZoltu I know that I mentioned potentially cutting out a byte from the ID to support reporting up to 256 words, but does it make sense for Augur to report values via something like this proposal?


I'm wondering if it would make sense to standardize both push and pull type oracles in this EIP. If so, the terminology for oracle should be refined, as @se3000 notes.

Still, I am shy of using the term "oracle contract", as the oracle may just be somebody with an ordinary Ethereum account. Maybe it should be "push-type oracles" and "pull-type oracles"?

I am using "oracle" in the spirit of the definition "a priest or priestess acting as a medium through whom advice or prophecy was sought from the gods in classical antiquity."

edmundedgar commented 6 years ago

@cag One thing we haven't really discussed here is how the question is formatted, except that it has a question, a type and may specify decimals - in the Gnosis context, what the content of that IPFS file looks like.

I don't know if that's too specific for the EIP which currently mainly talks about how the data is delivered, but I'd at least like to make sure that Reality Check supports something that Gnosis supports, or vice versa.

MicahZoltu commented 6 years ago

Augur's native output is an array of numbers where the length of the array is the number of possible outcomes (not including invalid, though in hindsight I think invalid should have been its own outcome) and the entries sum to a number the market creator associates with the market. Each outcome receives a fraction of the winnings proportionate to the reported number for that outcome divided by the number associated with the market. E.g., [7500, 2500] means that holders of share 0 receive 7500/10000 while holders of share 1 receive 2500/10000, or 75% and 25%.

That being said, if someone wanted to create an adapter contract that converts between Augur native output and this it wouldn't be particularly hard, assuming the denominator (10,000 in the example above) times the number of outcomes is less than 2^256. Depending on how close you get to 2^256, you may have to resort to some clever bit packing, but in theory you could compress the result down to a single 256-bit value.
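The "clever bit packing" mentioned above can be sketched in Python (illustration only, with an assumed layout of four 64-bit fields per 256-bit word; real adapters would pick field widths to match the market's denominator):

```python
def pack_payouts(payouts: list) -> int:
    """Pack up to four payout numerators into one 256-bit word,
    64 bits per numerator (assumed layout, lowest index in low bits)."""
    assert len(payouts) <= 4 and all(0 <= p < 2**64 for p in payouts)
    word = 0
    for i, p in enumerate(payouts):
        word |= p << (64 * i)
    return word

def unpack_payouts(word: int, n: int) -> list:
    """Recover n payout numerators from a packed 256-bit word."""
    return [(word >> (64 * i)) & (2**64 - 1) for i in range(n)]

assert unpack_payouts(pack_payouts([7500, 2500]), 2) == [7500, 2500]
```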

cag commented 6 years ago

I'm sorry for the late response! Life's been... kinda crazy for me lately. Anyway...


@MicahZoltu Congrats on the Augur launch!

Correct me if I'm wrong, but I'm guessing that Augur results really only make sense in the context of this EIP if we're talking about a single universe right?

Additionally, in order for results to be interpreted in the way that was suggested by @edmundedgar, there would have to be an adapter which converted stuff like [0, 1] -> 1, [0, 0, 0, 1, 0, 0] -> 3, and [2500, 7500] -> 2500 (for Y/N, categorical, and scalar with four digits of granularity respectively)? I'm asking to see if this is possible, i.e. whether adding that list of suggested interpretations of the EVM word would still accommodate Augur's eligibility (outside of the scenario where a single word cannot describe the final state of the result).
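The adapter conversions suggested above are straightforward to express. A minimal Python sketch (illustrative helper names; a real adapter would be a Solidity contract sitting between Augur and the oracle consumer):

```python
def adapt_categorical(payouts: list) -> int:
    """Map a one-hot Augur payout array to a single outcome index,
    e.g. [0, 1] -> 1 and [0, 0, 0, 1, 0, 0] -> 3."""
    assert payouts.count(0) == len(payouts) - 1  # exactly one winner
    return payouts.index(max(payouts))

def adapt_scalar(payouts: list) -> int:
    """Map a two-entry scalar payout array to its first numerator,
    e.g. [2500, 7500] -> 2500."""
    assert len(payouts) == 2
    return payouts[0]

assert adapt_categorical([0, 1]) == 1
assert adapt_categorical([0, 0, 0, 1, 0, 0]) == 3
assert adapt_scalar([2500, 7500]) == 2500
```

As Micah notes below, this only covers the one-hot and scalar cases; arbitrary distributions like [0, 1500, 2500, 5000, 1000] do not reduce to a single index.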


@se3000 I've reread your comment and realized that there might have been a miscommunication! So in this spec, the oracle handler might map more correctly to what you are referring to as a consumer. I'm wondering if this is a fault of the terminology not being as readily apparent.

I'd like people's opinion on whether OracleHandler may be better called an OracleConsumer or something along those lines.


@edmundedgar About the IPFS file format for the description of what the oracle reports on, I personally think that it's out of scope for this EIP. Also, certain oracles (purely on-chain oracles, for example), may use a completely different ID strategy. For example, an oracle contract which reports on, say, blockchain difficulty for a certain block, may use the block number as the ID for the report.

I'd like to incorporate your list of result interpretations as suggestions in the EIP: I think there are too many ways to structure the result for that list to be considered exhaustive - e.g. consider a case where you have 4 uint64s in order to describe a value in 4D space. Maybe that's a bit far-fetched though.


One little task:

edmundedgar commented 6 years ago

I like "OracleConsumer"

edmundedgar commented 6 years ago

@edmundedgar About the IPFS file format for the description of what the oracle reports on, I personally think that it's out of scope for this EIP. Also, certain oracles (purely on-chain oracles, for example), may use a completely different ID strategy. For example, an oracle contract which reports on, say, blockchain difficulty for a certain block, may use the block number as the ID for the report.

Yes, that's something we saw at the workshop. Basically nobody else is involved in structuring information (as opposed to structuring where information comes from), so in practice even if we "standardize" it nobody except us will be using the "standard", so probably better to keep it out of a process for now.

I'd like to incorporate your list of result interpretations as suggestions in the EIP: I think there are too many ways to structure the result for that list to be considered exhaustive - e.g. consider a case where you have 4 uint64s in order to describe a value in 4D space. Maybe that's a bit far-fetched though

Yes, using them as suggestions makes sense. It certainly doesn't describe all possible cases, and our system is also designed to be extensible so you're not constrained by that list. BTW we've dropped the (signed) "int" case for now because describing its "null" case is hairy, maybe just leave that out.

MicahZoltu commented 6 years ago

Additionally, in order for results to be interpreted in the way that was suggested by @edmundedgar, there would have to be an adapter which converted stuff like [0, 1] -> 1, [0, 0, 0, 1, 0, 0] -> 3, and [2500, 7500] -> 2500 (for Y/N, categorical, and scalar with four digits of granularity respectively)? I'm asking to see if this is possible, i.e. whether adding that list of suggested interpretations of the EVM word would still accommodate Augur's eligibility (outside of the scenario where a single word cannot describe the final state of the result).

According to the contracts, the following is a valid reporting array: [0, 1500, 2500, 5000, 1000]

The reporting array is simply how to divide shares up among shareholders after the market ends. An example (not yet supported in the UI) for where the above may make sense is a market for "percentage of votes by presidential candidate". Users can go long or short on any candidate at any price, and make money based on how far they were in the right direction.

cag commented 6 years ago

I got one more candidate: OracleReceiver. Here's my reasoning behind the proposal:

  1. OracleHandler is too ambiguous: to handle does not evoke the right sense of what implementations of this interface should do.
  2. OracleConsumer is closer, but I had a silly image of a creature eating the oracle in my mind. Joking aside, this also may suggest the producer-consumer problem. I don't believe this captures the entire space of possible implementations, as the producer-consumer problem assumes the existence of a message queue.
  3. OracleReceiver may be bootstrapped off of the concept of receivers in information theory. This is the closest sense of what this interface should accomplish.

@edmundedgar I don't quite understand what you mean by a null case for a two's-complement representation. It's my understanding that the mapping from integers in [-2^255, 2^255-1] to the space of possible EVM words is one-to-one and onto.


@MicahZoltu Duly noted; I've not even considered the possibility of distributions being an output! With that said, the full general case I believe should still be supported, either with clever bit-packing or with indexing the ID space with the market address and outcome index if the numerators are too large for a single word. So yeah, that "value in 4D space" remark would make sense in this case.

This may also bolster changing the result type to an arbitrary-size bytes.

edmundedgar commented 6 years ago

@cag The issue with the null case is simply that the answers to a lot of questions are either a number, or "We couldn't decide", or "This question didn't make sense". We hack this with boolean, uint or multiple-selection types by denoting that the final value, 0xffff...ffff, represents "invalid", marginally shrinking the range of numbers we can represent. But if we want to handle negative numbers, that value comes back as "-1", which is an important part of the range you'd normally want to use. So for int types we'd have to either have a different representation for "invalid" depending on the type (e.g. we could use the word right in the middle of the range, which represents the smallest possible signed number) or come up with a different scheme, and there are some arguments for different schemes, like the ability to set a custom range.
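The "smallest possible signed number as invalid" scheme mentioned above can be sketched in Python (one of the schemes discussed, not a settled design; Augur instead signals invalid out of band):

```python
INT256_MIN = -(2**255)  # reserved as the "invalid" sentinel in this scheme

def encode_signed_result(value: int, invalid: bool = False) -> int:
    """Encode a signed result as a 256-bit two's-complement word,
    reserving the smallest representable number for "invalid"."""
    if invalid:
        value = INT256_MIN
    assert invalid or INT256_MIN < value < 2**255  # sentinel stays reserved
    return value & ((1 << 256) - 1)

# -1 now survives as a real answer instead of clashing with "invalid":
assert encode_signed_result(-1) != encode_signed_result(0, invalid=True)
```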

Alternatively we could have a different value for "invalid" separately from the result like Augur does, which is probably the correct way to do it, but this creates more complexity.

For now we just decided to drop representation of negative numbers, since probably nobody needs it, and worry about it later.

cwhinfrey commented 6 years ago

I believe there are some cases where the id will not be used. For example, if an oracle contract is created for a single event, it may have no need for an id for that event. I think the design decision to include id makes sense to cover oracle contracts for both single events and multiple events. Would it make sense to standardize what should be passed as the id when it is not used and a contract simply has a result? It could be as simple as "0 is passed in for the id when the id is not applicable; otherwise the function reverts."

cag commented 6 years ago

After a few discussions with people, OracleConsumer seems to be the rough consensus terminology for what was previously known as an OracleHandler.


@edmundedgar There is the possibility of just saying that the range of the results given by the oracle is [0, result_granularity), and the oracle consumer or user-facing application could just map that to whatever range those values should represent.
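The range-mapping idea above is a simple affine transform done off-chain or by the consumer. A minimal Python sketch (illustrative; the question specification, not the oracle interface, would define low/high and granularity):

```python
def map_to_range(raw: int, granularity: int, low: float, high: float) -> float:
    """Consumer/UI-side: map a raw oracle result in [0, granularity)
    onto the real-valued range [low, high] the question specifies."""
    assert 0 <= raw < granularity
    return low + (high - low) * raw / (granularity - 1)

# A 4-digit raw value interpreted as a price between -50 and +50:
assert map_to_range(0, 10000, -50.0, 50.0) == -50.0
assert map_to_range(9999, 10000, -50.0, 50.0) == 50.0
```

This keeps the oracle's on-chain output unsigned and uniform while letting each question define its own interpretation, including negative ranges.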


@cwhinfrey I wonder if this clause from the draft covers single-use oracles:

receiveResult MAY revert if the id or result cannot be handled by the handler.

Also, thanks for this implementation! I'll include it into the EIP draft at some point...


Ping @josojo to talk about bytes32 vs bytes as the result type, and to link in the people who want to see this draft incorporate extra data in some way.

Also, I was talking with @InfiniteStyles and he pointed out to me that bytes currently do not have a native Solidity deserialization method (https://github.com/ethereum/solidity/issues/3876). However, this is actively being addressed: https://github.com/ethereum/solidity/pull/4390


josojo commented 6 years ago

Web3 oracle workshop summary - 23/24 of July

During the oracle workshop organized by the Web3 Foundation in London, we also discussed this EIP. Participants of the discussion were representatives from oraclize.it, realitykeys, chainlink, appliedblockchain, Thomson Reuters, consensys and the Web3 Foundation.

We agreed that such a standard would be beneficial and we should introduce one. However, it seems unlikely that we can find one standard that fits all use cases.

We agreed that the proposed standard

interface OracleHandler {
    function receiveResult(bytes32 id, bytes32 result) external;
}

is sufficient in most cases and a very efficient standard. However, if larger data needs to be sent to the OracleHandler, it might be more convenient (and gas efficient) to provide the result as a bytes variable rather than a bytes32 variable. Also, metadata might be required for some oracle solutions; e.g. oraclize.it also provides authenticity proofs. For these cases, we are proposing a second function:

interface OracleHandler {
    function receiveResult(bytes32 id, bytes result, bytes metadata) external;
}

The metadata could be handed in as part of the result, but it will be cheaper gas-wise to receive it as a second variable than to parse the result every time. However, if metadata is not required, the cost of calling this function is higher due to the additional metadata parameter.

This second proposal has the benefit that the interface is more flexible and overcomes many restrictions of the first proposal. Hence, the second proposal is more inclusive and forward-compatible than the bytes32 solution. The additional gas cost for the second proposal should be on the order of only a few hundred gas.

The consensus of the workshop was that the standard should support both methods. The bytes32 result solution was appreciated for its leanness. Additional data such as authenticity proofs could be checked in this setup by other contracts preprocessing the oracle data and only calling the OracleHandler after successful preprocessing. The second proposed solution was appreciated as the most inclusive: it shines with flexibility and future-compatibility.

The participants also agreed that we need both push and pull oracle interface standards, as both methods have unique selling points. The above definitions are obviously only push interfaces. For the pull interface, we agreed to the proposed standard. Additionally, returning bytes instead of bytes32 might be helpful as well:

interface Oracle {
    function resultFor(bytes32 id) external view returns (bytes32 result);
}
interface Oracle {
    function resultFor(bytes32 id) external view returns (bytes result);
}

We also mentioned that declaring these functions as view might be a restriction, which is not valid for all use cases.

cag commented 6 years ago

However, if larger data needs to send to the OracleHandler, it might be more convenient (and gas efficient) to provide the result as a bytes variable and not as a bytes32 variable.

I am fine with changing the type of the result to bytes contingent on decode getting implemented in Solidity. I don't expect developers to roll their own encoding/decoding functionality every time they want to implement this standard though, so last call for this EIP is postponed until at least then.


[It] will be cheaper gas wise to get [the metadata] with a second variable than parsing the result every time. However, if metadata is not required, the costs for calling this function are higher with this additional metadata parameter.

This is probably the main trade-off for adding the additional metadata parameter. In this case, are the savings from not having to do something like data, metadata = decode(result, (bytes, bytes)) worth defining the extra parameter?
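For intuition, the kind of parsing being weighed here can be sketched off-chain in Python with a simple length-prefix convention (hypothetical; this is not the ABI encoding Solidity's decode would actually use, it only illustrates the split-one-bytes-into-two operation whose gas cost is being debated):

```python
def encode_with_metadata(data: bytes, metadata: bytes) -> bytes:
    """Hypothetical convention: prefix the data with its 32-byte
    big-endian length so data and metadata travel in one bytes value."""
    return len(data).to_bytes(32, "big") + data + metadata

def decode_with_metadata(result: bytes) -> tuple:
    """Split a combined result back into (data, metadata)."""
    n = int.from_bytes(result[:32], "big")
    return result[32:32 + n], result[32 + n:]

data, meta = decode_with_metadata(encode_with_metadata(b"result", b"proof"))
assert (data, meta) == (b"result", b"proof")
```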

As noted earlier, this is already a tradeoff from just directly interpreting a bytes32 as a uint, but while I can see the use case for that tradeoff, adding the additional metadata parameter seems like it is adding another parameter which most people will ignore, and as is noted:

The metadata could be handed in as a part of the results

Also pull oracles would have to be modified to either have an additional function metadataFor(bytes32 id) external view returns (bytes metadata) or to have resultFor return a pair of bytes if both they and this proposal get incorporated into this standard.

Ping @D-Nice to discuss this more.


Also, I want to note that single-use oracles, like what @cwhinfrey proposed earlier in this thread, already would ignore the id parameter, so taking the previous argument further might mean that oracle consumers should just expect to receiveReport(bytes report) from an oracle, and then the consumer would just decide whether or not to accept the report and what to do with it. This may complicate matters for pull oracles though (like how would a specific report be selected?).


It seems like there is a lot of demand for pull oracles.

One of my concerns in potentially adding pull oracles to the standard is the possibility of fragmenting the ecosystem for standard oracle consumers. What this means, practically speaking, is that oracle consumers may have to support both waiting to hear the result from an oracle and reaching out to an oracle and pulling the result from the oracle.

If we move the burden of implementation onto the oracle, then we'd have to pick a single interface to standardize around. If the push interface is chosen, then all pull oracles would also have to have some sort of function reportTo(OracleConsumer recipient, bytes32 id), but if a pull interface is chosen, then the oracle has to be a contract (no more ordinary accounts being oracles).

In any case, it's possible to specify both to some level of mandatory-ness, but then, there are a few scenarios:

  1. Oracle consumer expected to handle both
    • ✔ Less requirements for oracles
    • ✘ More requirements for oracle consumers
  2. Oracle consumer can handle either
    • ✔ Less requirements for everyone
    • ✘ Oracle and oracle consumer compatibility depends on both sharing at least one mode of communication
  3. Oracles must at least be push
    • ✔ Ordinary account support
    • ✔ At least one guaranteed interface for oracle consumers
    • ✘ More requirements for standard pull oracles
    • ✘ Doesn't feel as natural for certain pull oracles
  4. Oracles must at least be pull
    • ✔ At least one guaranteed interface for oracle consumers
    • ✘ More requirements for standard push oracles
    • ✘ No ordinary accounts can be standard oracles

In some of the pros and cons listed above, standard is emphasized because I anticipate the use of this standard would very much include adapters for existing oracles, and for those oracles, another contract containing code to adapt the existing oracle to either a mandatory pull or push interface would have to be deployed anyway.

Anyway, if there is enough demand for also including the pull interface into the standard, my vote is for scenario (3).


We also mentioned that declaring these functions as view might be a restriction, which is not valid for all use cases.

Curious as to what those use cases may be...

se3000 commented 6 years ago

@josojo thanks for the summary. I agree with @cag on the metadata point. I actually thought that the majority at the meeting were in favor of leaving it off, although there were certainly a couple in favor of it. We discussed bytes being a flexible type that could be decoded manually for now, and possibly automatically if abi.decode lands in Solidity v0.5. Given its flexibility, metadata could easily be included in the result parameter.

As discussed, I put together some numbers around gas costs for including the extra parameter as opposed to parsing it. Full disclosure: I've seen some fuzziness in truffle gas estimations in the past, but it has been off on the order of 10s, not 100s. (PRs welcome!) Based on those examples, it looks like including an extra bytes type metadata parameter is actually roughly +600 gas per request. Alternatively, parsing a single bytes array into two bytes arrays is ~300 gas.

Not only is parsing bytes cheaper, but I think it is more fair to the users of the standard. A required metadata parameter would raise the cost of using the standard for all that do not need it, as opposed to putting the (lesser) cost on those that choose to use it.
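For reference, the two interface shapes under discussion look like this (illustrative declarations only; parameter names are assumed):

```solidity
pragma solidity ^0.4.24;

// Variant A: a single bytes parameter; any metadata is packed into
// `result` and parsed out only by consumers that want it.
interface OracleConsumerSingle {
    function receiveResult(bytes32 id, bytes result) external;
}

// Variant B: a separate, always-present metadata parameter,
// left empty by oracles that have none.
interface OracleConsumerDouble {
    function receiveResult(bytes32 id, bytes result, bytes metadata) external;
}
```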

D-Nice commented 6 years ago

Regarding the workshop, I do not think it is fair to assume the majority were against it. I only noted one party against it, two for it (Oraclize being one of those), and the other two parties either indifferent or abstaining, although one of the abstaining parties has a definite use case for the metadata as well, at least if I followed their presentation correctly.

@se3000 thanks for working on the gas cost tests and bringing them forward. I have expanded on your work with some more comprehensive test cases and slightly more realistic results/metadata; in most cases I think we can agree that the elements won't be a single byte for both. The link to them is here: https://github.com/D-Nice/gas-tests

To give a quick overview, it replaces the single-byte result + metadata assumption with two identical IPFS multihashes (most Oraclize proof bytes are IPFS multihashes, and they were convenient to use as a result larger than a single word). From our expanded tests, even with the most ideal and unrealistic parsing solution, where we expect a consistently formatted array of 2 elements of 2 word sizes, the parsing costs ~300 more gas than just using result + metadata (proper non-naive parsing would of course amplify this much more). The increased cost for non-metadata-utilizing Oracles is more consistent at 600 gas; however, they also have the option of using the single bytes32 variant as an alternative, while metadata Oracles have no alternative, hence our requesting two bytes parameters.

You can check the tests for more specificity, but at no point do metadata Oracles gain any competitive advantage over their non-metadata counterparts. In fact, we'd be putting ourselves at a disadvantage by going with the single bytes solution, as it would cost ~3000 more gas for us than for non-metadata Oracles (and this doesn't even account for the additional computation that may be needed to handle or store certain proof types), whilst the result + metadata solution saves us at least ~300 gas, and non-metadata Oracles would still be more efficient by ~2000 gas even with this implementation. If a compromise can't be made over this, then maybe our groups are not ready for a standard.

I will look to list the Pros and Cons I see of the result + metadata solution.

PROS

  • [metadata] allows for results to be consistent across any Oracle type
  • [metadata] allows for extensibility and future proofing for ABI decoding and other features

CONS

  • [metadata] is a sunken cost for non-metadata Oracles, though they do have an alternative in the even more efficient bytes32 variant

Of course, please do assume I am being biased, and feel free to populate the PROS and CONS as you may see them.

cag commented 6 years ago

I tried golf-gas-testing the proposed metadata addition. I wrote the following test contracts:

https://gist.github.com/cag/ca2b2046c75bd1b001e45c16e3890226

I also used the following test data:

| Input type | Test contents |
| --- | --- |
| Single word result | `0x6c65726e65726372616d626f73616e646c61726b6f6662616279626f6f6d6572` |
| Dynamic result (3 words/96B) | `0x0000000000000000000000000000000000000000000000000000000000000060 64656361676f6e73756e617573706963696f7573636f7272656374696f6e636f77796f7574736d617274696e67726569737375696e6762726f6f6d6c696b65646164726f636b646976696e6973656861737479686561726b656e656468696e74` |
| Single word metadata | `0xc9f5e4290844e94101ba068b4858d916e0f10cb7cf1fbbe4f3a1f286fed0a9da` |
| Dynamic metadata (rsv-length/65B) | `0x0000000000000000000000000000000000000000000000000000000000000041 ebb1ee12c4c189b3ea9cc3c6ef6e0a07772e20518045ba5a1c61c280897e51e71f2cd48ec806af44f509872db983b6d4f59af61ca55534ee155ee5b246af6d37aa` |

This led to the following results:

| Scenario | Single param TX cost | Double param TX cost | Winner | Delta |
| --- | --- | --- | --- | --- |
| Single word result, no metadata | 25165 | 25555 | Single | 390 |
| Single word result, single word metadata | 27654 | 28108 | Single | 454 |
| Single word result, dynamic metadata | 31902 | 31309 | Double | 593 |
| Dynamic result, no metadata | 30598 | 30988 | Single | 390 |
| Dynamic result, single word metadata | 33785 | 33550 | Double | 235 |
| Dynamic result, dynamic metadata | 37870 | 36742 | Double | 1128 |

Some of the contracts I've written can probably be golfed more, but yeah, check it out.

se3000 commented 6 years ago

@D-Nice it looks like the main difference you're seeing when comparing the price with metadata to the price without metadata is due to Ethereum's gas pricing, not the difference in proposed interfaces. In Ethereum, heavier usage costs more, and that's always the case when sending data in a transaction. Changing that is probably outside the scope of this PR, so I think it'd be more insightful to stick to 1:1 comparisons.

[Pro: metadata] allows for results to be consistent across any Oracle type

Metadata is inconsistent and oracle specific. I think this would make responses less consistent than if metadata were handled before reporting the result to the consumer. More on this later.

[Pro: metadata] allows for extensibility and future proofing for ABI decoding and other features

How so? There would be another field. What about that is more or less future proof for ABI decoding? My example already decodes 2 bytes arrays, and additional optional fields could easily be added.

[Con: metadata] is a sunken cost for non-metadata Oracles

This is not a sunk cost. It is an ongoing cost that is placed on non-metadata oracles on every request they send. If anything it seems more like an externalized cost.

they do have an alternative with utilizing the even more efficient bytes32 alternative

The method we're discussing is being added to handle requests that need more than 32 bytes of data. This alternative will not work.

@cag thanks for checking the numbers, very insightful. I'd previously assumed parsing was a fixed cost because I'm only assigning pointers to preallocated data, but it looks like there's a gas cost I was missing, and sending more data raises the parsing cost. I dug into the "Dynamic result, dynamic metadata" example and couldn't help but golf, because who doesn't love a good optimization? I wrote an alternate version that moves pointers; its parsing cost is about half that of your example. Notably, as more data is passed in, the cost grows more quickly when allocating new memory than when moving pointers: doubling the length of both inputs consumes ~1200 more gas with new memory, as opposed to ~300 more with pointers.

Gas details aside, I continue to believe that the cost of optional fields should be put on the people using the fields, not externalized on the non-users.

se3000 commented 6 years ago

Taking a step back, there was a question only briefly touched on at the workshop, which I think it'd be helpful to hear from the wider community: should an Oracle send data that it can see is false? It seems to me that it should be the Oracle's responsibility to ensure the data they are sending is correct, so if there is a proof to run on-chain, they should verify it before passing along data.

If the Oracle doesn't process the metadata, and pushes that work on to the consumer, then the Consumer loses the interoperability that this standard aims to achieve. Hypothetically, Chainlink decides to use metadata and sends the m-of-n ratio of oracles reported over requested. A Consumer wants to use both Chainlink and Oraclize interchangeably. They now have to deploy both Oraclize metadata checking logic and Chainlink metadata checking logic. Apart from the inefficient duplication of contract code in every Consumer Contract, what happens if a third metadata oracle shows up? The Consumer Contract has to be redeployed, or they can't use the new oracle. Same thing if one oracle updates their metadata format/verification. Obviously redeployment is non-trivial for many dapps. Also, how does the consumer differentiate which type of metadata they're receiving? Maybe we should add a metametadata field? 😉

Metadata seems intrinsically proprietary, or at the very least non-standardized. For this reason, it seems like metadata should be handled by the oracle before reaching the consumer, or by some higher level on-chain proxy. Consumers should only have to concern themselves with receiving the data and handling it, otherwise they lose interoperability and become dependent on a limited number of parties.

cag commented 6 years ago

@se3000 Thanks for the optimization! I see that it uses the same implementation strategy as what you've originally posted (my derp). I'll update the table accordingly...

With that said, it seems the dominant gas cost comes from the size of the data itself, and not from the parsing of said data.


If the Oracle doesn't process the metadata, and pushes that work on to the consumer, then the Consumer loses the interoperability that this standard aims to achieve.

Arguably, this interoperability does not exist in the first place, as the result format isn't specified.


should an Oracle send data that it can see is false?

Right now, the spec indicates:

receiveResult MUST revert if receiveResult has been called with the same id before.

In line with this requirement, I vote that any adapters for systems using proof metadata should verify the proof before making a report.
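A consumer might enforce the quoted requirement along these lines (a sketch, not taken from the draft):

```solidity
pragma solidity ^0.4.24;

// Sketch of a consumer enforcing "receiveResult MUST revert if
// receiveResult has been called with the same id before".
contract OnceOnlyConsumer {
    mapping(bytes32 => bool) internal received;
    mapping(bytes32 => bytes) public resultOf;

    function receiveResult(bytes32 id, bytes result) external {
        // oracle authorization checks omitted for brevity
        require(!received[id], "result already received for this id");
        received[id] = true;
        resultOf[id] = result;
    }
}
```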

That said, I do see a potential use case for removing the aforementioned requirement from the spec. What if the consumer needs to punish entities making wrong reports about a subject? That's a case in which the consumer would have to receive the proof metadata as well as report on the same id multiple times.

Still, I also don't see mechanisms for ensuring data quality (e.g. punitive measures) as the oracle consumer's responsibility, but rather a topic which the oracle should address.

In general, any oracle system which requires additional on-chain processing of metadata should justify the inclusion of that cost by simply producing better data. Oracle consumers will want the best data, and the gas cost differences between parsing one or two bytes parameters will pale in comparison with the gas cost of processing proof metadata and its value proposition of delivering better data.

Anyway, I'm still not sold on adding the metadata parameter.

cag commented 6 years ago

@se3000 A small note about your optimization:

contract DynamicDynMetaConsumer {
    event LogStuff(bytes b, bytes m);

    function receiveResult(bytes res) external {
        bytes memory resCopy = res; // <- See this line
        bytes memory b;
        bytes memory m;
        assembly {
          b := add(resCopy, 0x20)
          m := add(add(b, 0x20), mload(b))
        }
        emit LogStuff(b, m);
    }
}

The latest version of Solidity does not automatically copy dynamic types from calldata to memory for a given symbol, instead leaving this to the contract writer. Still, moving pointers after the data has been copied is the most efficient approach I've seen for handling this case.


I'd also like to remark that the single word cases for both results and metadata readily generalize to fixed format cases, in that their efficiency only relies on the fact that the memory layout in calldata is completely known up front.

se3000 commented 6 years ago

I'd also like to remark that the single word cases for both results and metadata readily generalize to fixed format cases, in that their efficiency only relies on the fact that the memory layout in calldata is completely known up front.

Great point. At the workshop, I believe we pretty much unanimously agreed that the format is determined by the request, and that the format of the response is fixed after the request. This lines up with the fixed-format case, and I'd propose we stick to fixed format for the discussion at the moment. I'm open to non-fixed formats, but don't see a burning need yet, and it may be easier to sort out once we've nailed down some of the other moving pieces.

Amxx commented 5 years ago

Hello, I'm a bit late, but I recently discovered this EIP and took an interest in it. I'm building an oracle system which aims to provide the results of off-chain computation to the blockchain. There is a substantial protocol to ensure correctness of the results. I want to make it EIP 1154 compliant :)

At the end of the protocol, a finalize function is called. This is the point where the result becomes available (through resultFor). It is also the point where we can do a callback (if the user requested it) using the receiveResult API. To prevent a user from being able to block a finalize, I have to make sure that the call to receiveResult doesn't revert the finalize under any circumstances.

My code looks like this:

function finalize(...) {
    // ...
    if (callbackTarget != address(0))
    {
        /**
         * The call does not revert the finalize if the target smart
         * contract is incompatible or reverts.
         *
         * ATTENTION!
         * This low-level call is dangerous: the target smart contract
         * can consume the forwarded gas or otherwise misbehave, so
         * assume an invalid state after the call.
         * See: https://solidity.readthedocs.io/en/develop/types.html#members-of-addresses
         *
         * TODO: how much gas should be provided?
         */
        require(gasleft() > 100000);
        callbackTarget.call.gas(100000)(abi.encodeWithSignature(
            "receiveResult(bytes32,bytes)",
            _taskid,
            _results
        ));
    }
}

The 100,000 gas value has been set arbitrarily. I was wondering if we should enforce an upper bound on the gas consumed by this function. Without such a bound, my application would really struggle :/
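One way to bound the callback, extracted from the pattern above (the cap and names are assumptions; note that per EIP-150 a call can forward at most 63/64 of remaining gas, so the `gasleft()` check needs some headroom):

```solidity
pragma solidity ^0.4.24;

contract CallbackDispatcher {
    // Arbitrary cap, as in the snippet above; a standardized upper
    // bound would let oracles budget finalize reliably.
    uint256 constant CALLBACK_GAS = 100000;

    function dispatch(address target, bytes32 id, bytes results) internal {
        if (target != address(0)) {
            // Headroom over the cap, since only 63/64 of remaining
            // gas is forwardable and the call itself costs gas.
            require(gasleft() >= CALLBACK_GAS + CALLBACK_GAS / 63 + 5000);
            // A reverting or gas-hungry consumer cannot block finalize:
            // the return value of the low-level call is deliberately ignored.
            target.call.gas(CALLBACK_GAS)(abi.encodeWithSignature(
                "receiveResult(bytes32,bytes)", id, results));
        }
    }
}
```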

cag commented 5 years ago

@Amxx I would say that an upper gas bound might not be a bad idea.


Regarding this EIP draft, I've been increasingly questioning whether there is fundamentally any value in making this EIP. The issue is that any value gained from the standard would have to come through some form of interoperability, yet this EIP doesn't actually provide that interoperability.

The only thing this EIP actually specifies is that there is a function (resultFor) which may be used to fetch a result for something identified by an ID word. I'll reiterate what @edmundedgar said earlier in this thread about standardizing the structure of result:

...Basically nobody else is involved in structuring information (as opposed to structuring where information comes from), so in practice even if we "standardize" it nobody except us will be using the "standard", so probably better to keep it out of a process for now.

This opaque result parameter has a structure we can't specify, yet its value comes in part from knowing that structure... if we continue further along trying to implement this, we may end up reinventing something like the Ethereum ABI codec, paired with some bytes used to identify this information...

If the consequence of this EIP is that adapters between implementing systems have to be created anyway, then we might as well use plain function calls. It seems wasteful to specify something only to force proprietary interpretations of the data on oracles anyway, and to have silly things like putting ABI-encoded data in result only to unwrap and reinterpret it into a different format. At that point, just expose the proprietary endpoint and use plain function calls.


Anyway, I now feel that opening this was a mistake, so if there are no objections, let's close this EIP. I don't know if there is a sort of Last Call procedure for something like this, though.

Amxx commented 5 years ago

As an implementer of this EIP I really don't think this was a mistake. I might be the only one providing this interface but I'll continue doing it.

It's the end user's job to know which data type they asked for, and to decode the resulting bytes accordingly!
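And with abi.decode (available since Solidity 0.5), that decoding is straightforward when the requested type is known; a hypothetical consumer that asked for a single uint256 might look like:

```solidity
pragma solidity ^0.5.0;

// Hypothetical consumer that requested a uint256 (e.g. a price);
// the format is fixed by the request, so decoding is unambiguous.
contract Uint256Consumer {
    uint256 public value;

    function receiveResult(bytes32 /* id */, bytes memory result) public {
        value = abi.decode(result, (uint256));
    }
}
```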