OmniLayer / spec

Omni Protocol Specification (formerly Mastercoin)

Capturing and distributing application-level data off of the blockchain #250

Open marv-engine opened 10 years ago

marv-engine commented 10 years ago

Introduction

This is a proposal for a framework to remove variable-length strings from transaction messages (for instance tx 50, tx 51, tx 54) and store those strings securely in a database that's accessible by applications such as Omniwallet. Almost by definition, these strings and similar data are intended for consumption by human users of the applications, but these data items have no direct bearing on the validity of transactions themselves. So, it makes sense to capture and store the application-level data at the application level.

Description

Rather than storing multiple variable length strings (and other large data items) in the blockchain, applications such as Omniwallet would store the data in JSON objects within their own local persistent storage, e.g. an application-level database. The application would have one or more associated web servers that can deliver the JSON object in response to a request based on an identifier such as property ID. These requests would likely come from other instances of the application, so that those instances can have a local copy of the JSON data, thus eliminating a single point of failure. This mechanism would distribute this data organically among instances of the application, in a timely fashion.

Notification of new/updated data is pushed to applications via transactions in the blockchain; applications then pull the data via RPC requests, parse it and store it locally for their users.
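
To make the pull step concrete, here is a minimal sketch in Python, assuming a hypothetical endpoint that serves the JSON object at `<property URL>/<property ID>` - the endpoint shape is an illustration, not part of the proposal:

```python
# Hypothetical sketch of the pull side: a participating application sees a
# new transaction, extracts the Property URL, and fetches the JSON object
# for the given property ID. The "<url>/<property id>" endpoint shape is an
# assumption for illustration only.
import json
import urllib.request

def fetch_property_data(property_url: str, property_id: int) -> dict:
    request_url = f"{property_url.rstrip('/')}/{property_id}"
    with urllib.request.urlopen(request_url) as response:
        return json.loads(response.read())

# e.g. fetch_property_data("https://wallet.example.com/properties", 31)
```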

Sample Flow

The flow in Omniwallet for transactions 50, 51, 54 would be something like:

  1. The user fills in the form fields for all the data currently transmitted in the transaction message, most of which would be stored in the JSON object instead. That includes the existing variable length strings in the tx message, plus for instance the URL for an image to be associated with the new smart property.
  2. When the user submits the form, the Omniwallet frontend constructs a JSON object from the data, secures and signs it with the private key of the sending address, and stores the JSON object in the Omniwallet database. The tx message to be published is constructed with all the string fields empty except for the Property URL field, which would contain the URL of the application's web server, so other instances of Omniwallet (or any application that chooses to participate) would know where to retrieve the JSON object for that smart property (see the sketch after this list). (Note: Omniwallet can give the user the option to send the tx message the way it's constructed now.)
  3. When that transaction is processed by another instance of MasterCore and then stored in a related instance of the Omniwallet database, that instance will extract the Property URL and send an RPC to the originating server requesting the JSON object for the smart property. That instance will then verify the authenticity of the JSON object and store the data locally so it's readily available to that instance's users.
  4. If the original user signs in to another instance of Omniwallet and updates the application-level data for that same smart property, the whole process repeats but with that other instance as the source of the JSON object that represents the update.
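
The sketch referenced in step 2 - a minimal illustration assuming canonical JSON encoding and a raw ECDSA signature over its SHA-256 hash. Both are illustrative choices; the proposal leaves the exact signing format open, and a real wallet would use the key of the sending address:

```python
# Hypothetical packaging and signing of the application-level data.
# Field names, canonical JSON encoding and the raw ECDSA signature are
# assumptions, not part of the spec.
import json
import hashlib
import ecdsa  # pip install ecdsa

def build_signed_property_object(fields: dict, signing_key: ecdsa.SigningKey) -> dict:
    # Canonical serialization so every verifier hashes identical bytes.
    payload = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()
    digest = hashlib.sha256(payload).digest()
    return {
        "data": fields,
        "sha256": digest.hex(),
        "signature": signing_key.sign_digest(digest).hex(),
    }

# Example with a throwaway key:
sk = ecdsa.SigningKey.generate(curve=ecdsa.SECP256k1)
obj = build_signed_property_object(
    {"name": "Quantum Miner", "image_url": "https://example.com/logo.png"},
    sk,
)
```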

Flexibility

Applications can choose to store and share other application-level data using this same framework. All that's required is that other applications know how to parse the JSON object and use its contents. For instance, an application could let a user tag or annotate items such as transaction hashes. This data could be inserted into an object similar to a BTC transaction message (or some other format) with desired recipient addresses so it is available only to authorized consumers (or everyone if specified). The MP spec would need a new transaction that publishes any item identifier and the URL to retrieve the associated JSON object. It would probably be useful also to have an API to get all the JSON objects for a specified identifier - to have the history.

In essence, the collection of JSON objects could be considered an application-level sidechain.

dexX7 commented 10 years ago

This is an awesome proposal! - Especially because it's backwards-compatible. I have a few notes:

1) Integrity:

While "signed by sender" is a great idea and guarantees that a message is indeed from the sender, it doesn't provide any indication, if it's the same message as initially provided, which might be relevant or required for some applications - think of a contract for example. It's not given that sender or host act honest in all cases (intentionally or unintentionally).

Providing a hash of the message is probably the most reasonable approach, but if the data is not immutable, any change would result in a different hash. So how about two sections? One with never-to-be-changed information, which is used to generate the hash, and one that can be changed by the issuer. To continue with the contract example, say there is a company which puts a contract in the immutable section, but "public announcements" (or whatever) in the other.

If the data is indeed immutable right from the beginning and any change strictly comes with an explicit update transaction, then hashing the whole data is of course the way to go in my opinion. This would be especially helpful if data is available from different sources, since it guarantees every source provides the same data without any doubt.
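
A minimal sketch of the two-section idea, assuming a simple JSON layout where only the immutable section feeds the hash:

```python
# The on-chain hash covers only the immutable section, so the issuer can
# edit the mutable part without invalidating the published reference.
# The JSON layout is an illustrative assumption.
import json
import hashlib

def immutable_hash(document: dict) -> str:
    canonical = json.dumps(document["immutable"], sort_keys=True,
                           separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()

document = {
    "immutable": {"contract": "The full, never-to-be-changed contract text."},
    "mutable": {"announcements": []},
}
published = immutable_hash(document)  # this value would go on-chain

document["mutable"]["announcements"].append("Public announcement #1")
assert immutable_hash(document) == published  # hash unaffected by the edit
```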

2) Data availability:

I think providing the data source in the form of a URL is rather limiting. What if I fetch a transaction, extract its info URL, and the original content is no longer available? You mentioned mirroring the data and several databases, but one URL can only refer to one source, while in reality there might be a bunch of "info-data-providers" which hold the information, even if the original source is no longer available.

What if, instead of a URL, a data-identifier is provided? In the case of Omni it's straightforward: Omniwallet stores the data, provides an identifier and knows how to retrieve the data for a given identifier. But now it would also be possible for other "info-data-providers" to store this data and associate it with its identifier.

3) = 1 + 2 :)

Since hashing the same message always produces the same hash and hashing even a slightly different message produces a very different hash, the hash itself seems to be the perfect candidate for the role of data-identifier, where message and identifier are strictly bound.

In the case of Omniwallet it's again very straightforward, because Omni actually doesn't need to be told where the data can be found in the first place, but a user might as well query another-data-provider-X and can be sure the result is exactly the same as querying an Omniwallet server.

There might be one edge case, but it has a very easy solution as well: the case where the same message is associated with different transactions. Each would yield the same identifier due to the identical message content. If identifiers are associated with messages only, then this is fine - but if identifiers should be unambiguously associated with messages and transactions, then the message could be extended to include the transaction inputs as well. Those are known at the moment of transaction construction, so it's possible to prepare the transaction, extend the message, hash the message and put the hash into the transaction as reference.

And now that I think of it, this actually also removes the need to sign the message separately, because the message is already associated with a transaction and the transaction is guaranteed to be signed by the sender.
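
A minimal sketch of such an identifier, assuming the message is extended with the transaction's outpoints; the exact serialization is an illustrative choice:

```python
# Derive the data-identifier from the message plus the spending
# transaction's inputs, which are known before signing.
import hashlib

def data_identifier(message: bytes, tx_inputs: list[tuple[str, int]]) -> str:
    h = hashlib.sha256()
    h.update(message)
    for txid, vout in tx_inputs:
        h.update(bytes.fromhex(txid)[::-1])   # previous txid (little-endian)
        h.update(vout.to_bytes(4, "little"))  # previous output index
    return h.hexdigest()

same_message = b'{"name": "Quantum Miner"}'
id_a = data_identifier(same_message, [("aa" * 32, 0)])
id_b = data_identifier(same_message, [("bb" * 32, 1)])
assert id_a != id_b  # identical messages, yet unambiguous per-transaction ids
```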


For the record: if this goes live, I hereby commit to setting up at least three independent servers for at least two years which continuously watch and store messages, so such data is preserved and available.

I will also try to provide this data in a distributed way, e.g. via BitTorrent - and the beauty: magnet links are (usually) nothing other than a resource identifier which is a hash of the resource itself. :)

marv-engine commented 10 years ago

@dexX7 Thanks for the positive feedback. A few quick responses -

If the data is indeed immutable right from the beginning and any change strictly comes with an explicit update transaction, then hashing the whole data is of course the way to go in my opinion. This would be especially helpful if data is available from different sources, since it guarantees every source provides the same data without any doubt.

I had envisioned that explicit update transactions would be the only way the application level data was changed.

I think providing the data source in the form of a URL is rather limiting. What if I fetch a transaction, extract its info URL, and the original content is no longer available? You mentioned mirroring the data and several databases, but one URL can only refer to one source, while in reality there might be a bunch of "info-data-providers" which hold the information, even if the original source is no longer available.

Are you referring to the server at the URL no longer being available? Is that in real-time as the transaction is confirmed & processed (~10 minutes after submission), or if the blockchain is read retrospectively? We do need to prevent a single point of failure in real-time. One application-level object could be a list of mirror sites for a URL (meta-application-level data). Retrospective access may be a different situation, I think. I don't know what's necessary there. But if there are mirrors that can provide historical data for the object then that should be covered.

What if, instead of a URL, a data-identifier is provided? In the case of Omni it's straightforward: Omniwallet stores the data, provides an identifier and knows how to retrieve the data for a given identifier. But now it would also be possible for other "info-data-providers" to store this data and associate it with its identifier.

Can you elaborate how this identifier indicates where the data is available?

dexX7 commented 10 years ago

I had envisioned that explicit update transactions would be the only way the application level data was changed.

Awesome.

Are you referring to the server at the URL no longer being available? Is that in real-time as the transaction is confirmed & processed (~10 minutes after submission), or if the blockchain is read retrospectively?

Maybe we're thinking in slightly different directions at this point, since you mentioned an application level several times. My understanding and vision: a user broadcasts a transaction which includes some kind of reference (be that a link or a hash) which associates external information with this transaction. You then mentioned Omniwallet as host for such data.

So what I was thinking about was to not limit this to Omniwallet, but rather a more generalized approach where Omniwallet might be one of several "data-providers" - hence my response about why I consider static URLs rather unhandy, and this holds true for Omniwallet as the only data source as well, since internal link structures might change over time, too. Long story short: yes, I was also explicitly thinking about retrospective retrieval.

But if there are mirrors that can provide historical data for the object then that should be covered.

This seems to disconnect the initial transaction from the message, creating the need for an intermediary such as Omniwallet to act as some kind of data-redirector serving as the connection between "outdated links" and "messages".

Can you elaborate how this identifier indicates where the data is available?

Depending on which direction this goes, the "where" probably starts with Omniwallet as the main data source, while over time other data-providers and mirrors might emerge.

I can go a bit further into detail later, but maybe a good analogy is a Bitcoin transaction - there is a transaction (= some kind of data/message) which can be unambiguously identified by a transaction id, which is actually a hashed value of the raw transaction and directly coupled with the transaction itself. At any time I could take a transaction and derive its identifier, based solely on the content of the transaction. And if there were explorers which provide a way to retrieve the data associated with this transaction id, I might as well retrieve the transaction after some time, but wouldn't really care where it's coming from, since I can be sure it's exactly the transaction associated with this id.

(Screenshot from http://onlinemd5.com/ showing a hash computed from an input, included only as an example.)
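
The analogy made concrete - a Bitcoin transaction id is simply the double SHA-256 of the raw serialized transaction, displayed in reversed byte order, so anyone holding the raw transaction can recompute and verify its id:

```python
# Compute a Bitcoin txid from the raw serialized transaction bytes.
import hashlib

def txid(raw_tx: bytes) -> str:
    digest = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    return digest[::-1].hex()  # ids are conventionally shown byte-reversed
```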

You pretty much brought it to the point with your question. I suggest storing a data identifier in the chain. This is not necessarily in conflict with providing a location, nor is it a complete solution on its own, but my main concern is a situation where, after some time, a lot of dead links and references are on the chain with almost no benefit -- because in that case storing the data in the chain might as well be skipped completely and users could simply sign some messages and post them on a website instead.

urbien commented 10 years ago

@dexX7 I like your idea of using a hash instead of the URL suggested by @marv-engine. And it does not need to be the complete opposite of using a URL. You referred to magnet links: they indeed use a hash as their main parameter, but also allow specifying a suggested source of data via a URL. This way both requirements can be satisfied. It is a more difficult path than just using the URL, as some discovery/search for a source of data will need to be devised, should the URL point to a dead server over time. But it is not a problem that needs to be solved immediately.

dexX7 commented 10 years ago

Yeah, the main goal is to decouple the message from a static source while binding the content of the message strictly to its identifier. Torrent, or rather the mainline DHT, is the most notable example of bridging peer, identifier and source, but it's also just a very sophisticated application of the base concept.

Since it sounded to me as if the plan was to build transactions via Omniwallet, while Omniwallet also has the role of data-provider, the whole step of locating a source for the data can pretty much be skipped.

And you already mentioned that providing both an identifier and a URL is not necessarily a conflict.

Now what's a bit tricky is to also consider the size requirements and keep a healthy balance. To bring up some numbers on the output length of different prominent hash functions:

MD5: 16 bytes
SHA-1: 20 bytes
RIPEMD-160: 20 bytes
SHA-256: 32 bytes
SHA-512: 64 bytes

It should be avoided that the - let's call it "meta-data reference" - requires more space than the meta-data itself.

The upside: there is probably no need for a very lengthy and more collision-resistant hashing algorithm, if potential collisions may not be a problem at all - due to the additional relationship between the message and a (Bitcoin) transaction.

What's really cool about all this, and completely unrelated to how data is referenced: @marv-engine already called it and mentioned the available fields that can be used in a crowdsale transaction, but other transaction types could easily be enhanced as well, because adding "junk" to a Mastercoin transaction does not invalidate it. (@faizkhan00: IIRC you once asked why I consider the ability to add junk important - this is one great example. :)

Say for example I create a Simple Send, which usually looks like this:

Vout #0: pay-to-pubkey-hash: Exodus
Vout #1: pay-to-multisig: Transaction data
Vout #2: pay-to-pubkey-hash: Receiver
Vout #3: pay-to-pubkey-hash: Change

I could as well enhance the transaction with meta-data by simply adding another output, e.g.:

Vout #0: pay-to-pubkey-hash: Exodus
Vout #1: pay-to-multisig: Transaction data
Vout #2: pay-to-pubkey-hash: Receiver
Vout #3: pay-to-pubkey-hash: Change
Vout #4: op_return: Meta-data reference (url, hash, ...)

The point: using the available fields or extending transactions does not require a new schema and is fully backwards-compatible - and clients which don't care about meta-data can simply ignore it, because it's not consensus critical.
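
A minimal sketch of what the extra output's script could look like, assuming the reference is a 32-byte SHA-256 hash of the meta-data (note that relay policy at the time limited OP_RETURN payloads to 40 bytes):

```python
# Build an OP_RETURN script embedding a meta-data reference.
# Layout: OP_RETURN (0x6a) followed by a single-byte push of the payload.
import hashlib

OP_RETURN = 0x6a

def op_return_script(payload: bytes) -> bytes:
    # Direct pushes of up to 75 bytes use a single length byte.
    assert len(payload) <= 75
    return bytes([OP_RETURN, len(payload)]) + payload

meta_reference = hashlib.sha256(b'{"name": "Quantum Miner"}').digest()
print(op_return_script(meta_reference).hex())  # "6a20" + 64 hex chars
```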

ripper234 commented 10 years ago

I am concerned that this will take a data storage system that is completely decentralized and doesn't depend on anything other than the Bitcoin blockchain, and turn it into something that is more fragile. I see some interesting suggestions above (e.g. magnet links), but any solution that involves them needs to analyze the motivation people have for running such nodes. It's a big change that I'm hesitant to push for ...

Before continuing the analysis, can we stop and discuss why this is even needed?

Stuffing strings into the blockchain is only done on infrequent operations such as asset creation, not on every trade. Why are we quick to optimize the infrequent operations?

dexX7 commented 10 years ago

Stuffing strings into the blockchain is only done on infrequent operations such as asset creation.

The AssetIssuanceStandard was created months ago, yet an option to create smart properties with one of the wallets was only very recently introduced. I think it's too early to make predictions in this context.

It's a big change that I'm hesitant to push for ...

Sorry, I think I have a tendency to drift into the extreme quite fast, which might have created this impression. To my understanding, Marv simply suggested enhancing Omniwallet by providing a space for external meta-data which is referenced by the URL fields that are already available.

Even though a P2P overnet - in the best case even a somewhat reliable one - is certainly something that would be awesome, I didn't intend to imply this is the next step, but rather switching from a URL scheme to something more general, so the data is not bound to a specific data source - say, for example, data that can be identified in an asset dictionary which is available on Omniwallet and Masterchest, similar to what CoinPrism provides.

urbien commented 10 years ago

@ripper234 my interest in this discussion is from the point of view of app transactions riding on the chain - if you wish, SaaS on chain. You know, the usual: orders, invoices, shipments, products, complaints, inspections, inventory, etc. So in my case it is not non-essential meta-data, it is the meat. I am looking for the best way to achieve that by keeping consensus on a blockchain but not bloating it with the app data. MaidSafe DHT might be a way; I would appreciate pointers and suggestions on who would be best to discuss this with.

dexX7 commented 10 years ago

@urbien: do you need blockchain security for a case such as "I have a document here and I claim it is the same one I provided a few months ago", or guaranteed availability at any given time?

According to MaidSafe's wiki, its DHT has beta status, but that's unfortunately all I know about it at this point. I'm curious nevertheless. The Google group is probably a good place to establish contact.

urbien commented 10 years ago

@dexX7 yes, the blockchain will provide proof of existence and a way to roll back the transactions, should the blockchain consensus decide to switch to a fork. It will work like log-based databases: every time an app object changes, its hash is posted to the blockchain and the new object is written to the DHT under a new key, which is that new hash (the old value with the old key is preserved in the DHT). Thus the DHT contains all previous versions, as does the blockchain, making rollback trivial. This of course requires the DHT to provide availability, security (by dispersing even individual object data across miners/farmers) and huge scale. Thx for the pointer to the newsgroup, will post there. If you are interested, I wrote an on-chain apps white paper. As far as I know no one is working on such a DApp yet (please let me know if anyone is), and my team is exploring how Mastercoin, and possibly MaidSafe, can help us. This is how I spotted this github issue. Hope this explanation helps Masterminds decide on the metadata riding on- or off-chain.
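
A minimal sketch of that log-structured scheme, with plain dictionaries standing in for the DHT and the on-chain log (purely illustrative):

```python
# Every object update is hashed, the hash is (conceptually) posted on-chain,
# and the version is stored in the DHT under that hash. Old versions stay
# put, so rolling back to an earlier chain state is a simple lookup.
import json
import hashlib

dht: dict[str, bytes] = {}   # stand-in for a real DHT
chain_log: list[str] = []    # stand-in for hashes posted to the blockchain

def put_version(obj: dict) -> str:
    blob = json.dumps(obj, sort_keys=True).encode()
    key = hashlib.sha256(blob).hexdigest()
    dht[key] = blob           # new key per version; old versions preserved
    chain_log.append(key)     # anchor this version on-chain
    return key

put_version({"order": 1, "status": "created"})
put_version({"order": 1, "status": "shipped"})

# Rollback: read an earlier on-chain hash, fetch that version from the DHT.
first_version = json.loads(dht[chain_log[0]])
assert first_version["status"] == "created"
```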

My unanswered question, if I may ask it here: how do transactions like the creation of a new order/invoice, or their updates, fit into the Mastercoin protocol? They are not exactly financial transactions, nor are they exchanging value. Maybe you can point me to the right people.

marv-engine commented 10 years ago

@dexX7 @ripper234 @urbien I'd like to have a live discussion on this topic, with other Mastercoin devs as well. Are you available today (Tuesday) or Thursday at 1900 GMT (3pm Eastern Time)? @urbien I can send you an invitation to the Mastercoin virtual office in Sococo if you send an email to marv@mastercoin.org

ripper234 commented 10 years ago

I am available today, possibly on Thursday too but not sure about that yet.

dacoinminster commented 10 years ago

I'd like to be in on this conversation too. Should be online at that time. Thanks!


urbien commented 10 years ago

I can do today 3pm or Thursday any time EST from early morning to later evening. Sent my email address to @marv-engine pls let me know how to join on Sococo

dacoinminster commented 10 years ago

3pm GMT is 15 mins from now. Please get sococo: https://www.sococo.com/

Once you get it, we can send you a link (don't want to post it publicly on github though - use PM)

We may be a bit late, as another meeting is still going right now. Thanks.


dexX7 commented 10 years ago

Sococo crashes for me right after startup on two machines. Mailed their support a few minutes ago...

The only thing I'd like to add to the discussion above:

There is also a "market" for documenting services, such as:

http://www.proofofexistence.com/
http://coinspark.org/ (they use our assets standard)
https://blocksign.com/

m21 commented 10 years ago

Gotta play the devil's advocate here. :)

1st Q: Why? Why move some text strings from a distributed protocol to a central database?
2nd Q: This seems to be a local data storage issue within Omni (i.e. store JSON outside the normal TX storage space) -- which is fine. Then again, why change the protocol?
3rd Q: Why the move towards centralization?
4th Q: How much less secure are off-blockchain-storage methods; is that known? I would guess that they are extremely insecure, especially when run by one interested party: Omniwallet.

zathras-crypto commented 10 years ago

Just popping in a couple of notes on here from my discussions re this topic.

Firstly, impact: 140 out of 15726 transactions are issuances (0.89%).

Secondly, complexity: Moving almost all of our transactions (everything except issuances?) to OP_RETURN is infinitely simpler.

Thirdly, prioritization:

hashing issuance data = 0.89% reduction in UTXO bloat; we still have >99% of txs in the UTXO set
OP_RETURN = 99.11% reduction in UTXO bloat; we still have 0.89% of txs in the UTXO set

It's all about prioritization, and effort vs. gain. The effort for OP_RETURN is much less than for off-chain storage, and the gain is 100 times higher.

TL;DR: when considering reducing impact on the blockchain, I'd far prefer us to focus on moving to a Class C OP_RETURN transaction type with backwards Class B compatibility in clients (since we can't rely on Bitcoin Core devs not removing support for OP_RETURN), as I feel that would provide a much greater return for much less effort.

Thanks Z

urbien commented 10 years ago

I will try to summarize the points I made in the Sococo chat today and will post a separate comment on potential issues with Omniwallet hosting JSON objects.

Here is a link to Vitalik Buterin's recent article, saying: "The primary feature of Ethereum is not Turing-completeness ... Rather, the primary feature of Ethereum is state." Ethereum provides a persistent per-contract hash table, backed today by LevelDB (an engine developed by Google, which is also used in the very cool NoSQL HyperDex database developed at Cornell under Emin Gün Sirer). That is to say, Ethereum's contract state relies on quite a robust data store, compared to saving JSON files.

Essentially, Ethereum miners host the database, secure it, provide rollback capabilities in the event that the blockchain switches to another branch, and provide extreme availability of the database and resilience to an array of attacks, since all miners host it. Those properties give Ethereum-based DApps a huge leg up on Bitcoin-blockchain-based and Mastercoin DApps, which today must resort to often inferior federated solutions.

My interest is in building data-centric apps on chain. On-chain SaaS, if you wish. Specifically my interest in this discussion is because I want to find a Bitcoin blockchain alternative to Ethereum's state. Ethereum's state is very elegant but it has intrinsic disadvantages, which I am describing in Tradle's evolving whitepaper. Perhaps MaidSafe DHT is a good underlying mechanism, but I do not have a connection to the MaidSafe team to examine the match on a deeper level.

Either way, I feel strongly that the issue of on-chain or close-to-chain application-level data will soon become the hottest topic of all, as the value of the Bitcoin blockchain (and Mastercoin) as an application platform will be challenged by new DApps choosing Ethereum. This creates the urgency of addressing it sooner rather than later.

urbien commented 10 years ago

@m21 I agree with your concerns re: the move towards centralization. Maybe you can help me out - I am ignorant of how Mastercoin even achieves storing anything but an extra hash on a tx. My understanding is that Mastercoin transaction Class C uses provably prunable outputs. But this approach gives only an extra 40 bytes for metadata on a Bitcoin tx - correct me if I am wrong.

dexX7 commented 10 years ago

Crucial data - that is, any transaction data that is not meta-data - must be stored on-chain in any case, but non-relevant data might be stored off-chain and ...

Almost by definition, these strings and similar data are intended for consumption by human users of the applications, but these data items have no direct bearing on the validity of transactions themselves. So, it makes sense to capture and store the application-level data at the application level.

So ...

Why move some text strings from a distributed protocol to a central database? Then again why change the protocol?

Actually, this is not a protocol change and the transaction format remains the same. The currently available meta-data fields provide space for meta-data, but it's limited - both in size and in intended purpose.

Different applications might consider other data as relevant while some of the fields might be useless for others. The idea was to move this data off-chain and let it be handled by applications which consume the data.

Why the move towards centralization?

It was suggested to use fixed links to external storage, whereby I think that instead of providing links to a static source, content might as well be identified and referenced by the hash of its content, so the data is not bound to one single source.

How much less secure are off-blockchain-storage methods, is that known?

Storing data off-chain should be considered unreliable and no crucial data should ever be stored this way. Meta-data is by no means crucial, though.

140 out of 15726 transactions are issuances (0.89%)

That doesn't tell much, given that property issuances are rather new and there were no tools to create them for a long time. Furthermore, I tend to believe any number in this context might not provide much insight into future usage.

Moving almost all of our transactions (everything except issuances?) to OP_RETURN is infinitely simpler ... TL;DR: when considering reducing impact on the blockchain

Yeah, I fully agree. UTXO impact could be reduced immensely and I think this wouldn't be the only gain of using OP_RETURN.

However, I think this topic also tackles a slightly different area and only partly touches on UTXO impact.

dexX7 commented 10 years ago

This is an FYI push. I mentioned coinspark.org in the context of AssetIssuanceStandard.md 20 days ago, but just realized it's @gidgreen turning #54 into reality.

Edit: worth noting there is another "standard" in the wild, used by https://www.coinprism.com/:

{
  "source_addresses": [
    "3BEEYFKSqoa1Q7KMrwH6AMcYjFx5G2RZjr"
  ],
  "contract_url": "https://www.coinprism.info/asset/3BEEYFKSqoa1Q7KMrwH6AMcYjFx5G2RZjr",
  "name_short": "XBTPOP",
  "name": "BitcoinPopulation.com coins",
  "issuer": "BitcoinPopulation.com Issuer",
  "description": "Representing ownership in BitcoinPopulation.com",
  "description_mime": "text/x-markdown; charset=UTF-8",
  "type": "Stock",
  "divisibility": 2,
  "link_to_website": false,
  "icon_url": null,
  "image_url": null,
  "version": "1.0"
}

Source: https://cpr.sm/cm2gO3Np8o

But I also noticed another player seems to be emerging: http://chroma.io - I found this:

{
  "source_addresses": [
    "3Kfotk3zahcW2R94fdXWRtrR7b5Bn98Jy3"
  ],
  "contract_url": "http://chroma.io",
  "name_short": "CRYSTAL",
  "name": "Margo's Crystal Coin",
  "issuer": "Margaret Crable",
  "description": "This is a coin that Margaret Crable created. It earns the owner right to one fantasy drawing of a magic crystal.",
  "description_mime": "text/x-markdown; charset=UTF-8",
  "type": "CryptoToken",
  "divisibility": 1,
  "link_to_website": true,
  "icon_url": "http://chroma.io/chromacoin/coins/crystalcoin/crystalcoin.png",
  "image_url": "http://chroma.io/chromacoin/coins/crystalcoin/crystalcoin.png",
  "version": "1.0"
}

Source: http://chroma.io/crystal.json (http://chroma.io/chromacoin/)

dexX7 commented 10 years ago

I found this on bitcointalk:

http://internetofcoins.org/
https://bitcointalk.org/index.php?topic=827804.0;all

(Two attached screenshots.)

Not very telling, but might be worth tracking.

msgilligan commented 4 years ago

@marvgmail There is some standards work for off-chain data going on here: https://github.com/LNP-BP/lnpbps