BlockchainCommons / Gordian-Developer-Community

Discussions of Gordian principles, Gordian specifications, Gordian references, and making it all a reality.

Standards for Sequential or Animated QRs for Large Transactions #4

Closed ChristopherA closed 3 years ago

ChristopherA commented 4 years ago

A continuation of the discussions in https://github.com/cryptoadvance/specter-diy/issues/57#issuecomment-620294709 on the topic of creating some common cross-wallet standards for large bitcoin transactions, wallet descriptors, recovery data, account metadata (for instance invoice and payment notes), etc. that don’t reliably fit into a single QR code.

Some key questions are

— Christopher Allen

ChristopherA commented 4 years ago

I’d like to find a working example of an ECC 200 structured append that could be uploaded as a PR to this repo, with an example API fragment in code (no matter what platform) to write and to read, plus an example image and data. Later, if others who get it to work (or fail to) can add their findings to that file, it will help us decide whether this is a viable option.

— Christopher Allen

ChristopherA commented 4 years ago

We are still testing QR output from the #LetheKit https://github.com/BlockchainCommons/bc-lethekit but I believe its limit will be QR code Version 40, with a resolution of 177x177 and about 1.5k binary to about 4K “alphanumeric mode” base45 per QR code. Can you confirm this @ksedgwic?

I’m not sure on size of SafePal display. @wolfmcnally?

— Christopher Allen

ChristopherA commented 4 years ago

If #LetheKit and SafePal show some limits on output, there are also limits on input.

I know that DIY and Arduino hardware prototypes by @ksedgwic and @devrandom had problems with large QR codes because their cameras could not focus up close. @msgilligan had similar problems with laptop front-facing cameras that offer only 540p or 720p and do not focus closely.

Yet I was surprised how well the camera on the little credit-card-sized SafePal device worked with QR codes. I presume it was specifically focused for close-up use only. And of course, most modern phones are great with QR codes and can read quite large ones. I believe that @rxgrant & @danpape did some studies of iOS limitations.

What other input limitations have you observed with QR codes?

-- Christopher Allen

ChristopherA commented 4 years ago

Encoding the message inside a sequential or animated QR code is also challenging. As many people have observed, using base64 forces a QR to enter into binary mode, which lowers the efficiency of size by 50%.

It is possible to optimize for the native QR code compression defined in the ISO QR code standard (ISO/IEC 18004:2006), called "Alphanumeric Mode", which limits you to capital letters, numbers, and a few other characters. One method for encoding this is base45. However, this approach uses characters that are not allowed in URI encoding, so anything transported in the QR cannot be passed on by the QR reader app to other apps. We could theoretically remove a few more characters from the set so that we encode with fewer special characters, but I haven't found a common standard for that.
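As a rough illustration of the mode trade-off discussed here, the sketch below compares how many payload bits each text encoding carries per bit spent inside the QR symbol. It assumes the per-character costs from the QR specification (8 bits per character in byte mode, 11 bits per pair of characters in alphanumeric mode) and ignores mode headers, padding, and error correction; the function and dictionary names are illustrative only.

```python
import math

# Bits the QR symbol spends per character in each mode (headers ignored).
BYTE_MODE_BITS_PER_CHAR = 8.0
ALNUM_MODE_BITS_PER_CHAR = 11.0 / 2  # two characters are packed into 11 bits


def efficiency(alphabet_size: int, qr_bits_per_char: float) -> float:
    """Payload bits carried per QR bit for a power-of-two alphabet."""
    return math.log2(alphabet_size) / qr_bits_per_char


EFFICIENCY = {
    # base64 uses lower-case letters, so it cannot use alphanumeric mode.
    "base64 in byte mode": efficiency(64, BYTE_MODE_BITS_PER_CHAR),
    "upper-case hex in alphanumeric mode": efficiency(16, ALNUM_MODE_BITS_PER_CHAR),
    "upper-case base32 in alphanumeric mode": efficiency(32, ALNUM_MODE_BITS_PER_CHAR),
    # base45 maps 2 bytes (16 bits) onto 3 alphanumeric characters.
    "base45 in alphanumeric mode": 16 / (3 * ALNUM_MODE_BITS_PER_CHAR),
}

for name, value in EFFICIENCY.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions a 32-character upper-case alphabet in alphanumeric mode comes out ahead of base64 in byte mode, even though base64 is the denser text encoding on its own.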

Even base64 is complicated, as there are many variants. Base64 itself also violates the URI character limitations, so there is a less common variant called base64URL. There is a long thread in the W3C-CCG standards group (where I am co-chair) about all the challenges and variants of base64 out there: https://lists.w3.org/Archives/Public/public-credentials/2020Apr/0234.html

Bitcoin uses base58 in a few places, and there is base58check. Check isn't needed as QR already does error correction. However, there are no international standards that recommend it.

Bech32 isn't really an option for encoding the entire message, as @sipa designed it for ~40 characters. In fact, it can cause problems with too large a message. Quoting @sipa in BIP-0173:

An unfortunate side effect of error correction is that it erodes error detection: correction changes invalid inputs into valid inputs, but if more than a few errors were made then the valid input may not be the correct input.…Because of this, implementations SHOULD NOT implement correction beyond potentially suggesting to the user where in the string an error might be found, without suggesting the correction to make.

CBOR and protobufs are other options, but I don't know enough about the pros and cons of these two, other than that they are complicated: library support may not be good enough, and the complexity itself could cause security issues. What is your experience with using these?

@wolfmcnally has written up some details about the different encoding formats in "Encoding Binary Compatibly with URI Reserved Characters". He falls fairly strongly into the camp of using CBOR along with base64URL.

Maybe there is some combination of these that could be a good solution? What do you suggest?

-- Christopher Allen

sipa commented 4 years ago

My comment from BIP173 you're quoting doesn't apply if you don't use error correction (which hopefully no BIP173 implementation provides).

ChristopherA commented 4 years ago

We are having a discussion about new encoding standards in the W3C Credentials CG, and I really liked this analysis by @msporny in https://lists.w3.org/Archives/Public/public-credentials/2020Apr/0257.html:


Let's look at some data, which I generated based on the discussion in this thread. The data below shows what a base64, base64url, base58, and bech32 encoding of a value looks like for random byte values of 4, 8, 16, and 32 bytes. They are, in general, in ascending order by size. Each line specifies how much bigger the encoding is based on the baseline size. Each grouping has an associated analysis, because this isn't just about human readability, it's also about developer copyability, filesystem filename encoding, and encoding size. With that in mind, let's begin...

In general, these things hold true for all of the tests:

4 random bytes
base64url: Fd-j-A          baseline
base58   : ZRrnb           -17% larger
base64   : Fd+j+A==        33% larger
bech32   : 1zh0687q7xwhau  133% larger

One of the first things that pops out above is that base58 encoding is actually more efficient than base64 (because of base64 padding), and even base64url without padding (because base58 has some nice bit packing characteristics for small values).

8 random bytes
base64url: cbaupa7qfVo           baseline
base58   : L2AXzqFbepH           0% larger
base64   : cbaupa7qfVo=          9% larger
bech32   : 1wxm2afdwaf745vh2ud8  81% larger

For 8 byte values, base64url and base58 are equivalent from a storage efficiency standpoint.

16 random bytes
base64url: CyTZwJimleWCJxlmaMNvJw             baseline
base58   : 2NpD3dQYuV6ZaxMCDzsq4S             0% larger
base64   : CyTZwJimleWCJxlmaMNvJw==           9% larger
bech32   : 1pvjdnsyc5627tq38r9nx3sm0yu866x99  50% larger

For 16 bytes values, the storage efficiency still holds for base58, making it equivalent in size to base64url. Note that base58 will always use unambiguous characters, but more importantly, it will always be copy-pasteable... whereas, base64url will be copyable sometimes, and other times, a double click will result in a bad copy/paste (because of a breaking character in the base64url value). The number of times that this has bitten me while copy-pasting an AWS client secret resulting in scripts failing and minutes (to sometimes hours) wasted because of a base64url encoding issue has been a constant source of frustration over the years.

32 random bytes
base64url: i1kbaCq6eZEYWqCKLzL3Aafv-pegrR-O1y3sRJLKd14                  baseline
base58   : ANxUehLobX2wPMyyiZp834KgvZXvg7hHiBK6GeZvgG1T                 2% larger
base64   : i1kbaCq6eZEYWqCKLzL3Aafv+pegrR+O1y3sRJLKd14=                 2% larger
bech32   : 13dv3k6p2hfuezxz65z9z7vhhqxn7l75h5zk3lrkh9hkyfyk2wa0qpd3upn  37% larger

The "advantage" of base64url starts to shine through once we hit 32 bytes, with a 2% encoding benefit over base58... which is the trade off for an inconsistently copyable string of characters that developers find themselves copying often during development.

As for the benefits of bech32, I honestly don't see it... yes, there is error correction, but once you get to 32 bytes, you've added close to 40% overhead... doesn't seem worth it to me unless you know a human being is going to be reading the value and something bad is going to happen if they get it wrong (payment going to wrong address, for example).

So, the priorities that I've heard most often are:

  1. Ease of copy/paste for developers.
  2. Encodes directly as a file on a file system.
  3. Size efficiency.
  4. Human readability.

Is this an esoteric discussion? Absolutely... but it goes to the heart of why developers feel strongly about this particular choice. They live and breathe how this stuff is encoded and it has a direct impact on their productivity and the correctness of the programs that they write and run.
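The size comparison quoted above can be reproduced with a short sketch. It uses Python's standard base64 module and a hand-written Bitcoin-style base58 encoder. The bech32 length is only estimated from a formula, on the assumption (suggested by the sample strings starting with "1") that the human-readable part is empty. All function names are illustrative.

```python
import base64
import math

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Bitcoin-style base58: big-integer conversion, one '1' per leading zero byte."""
    n = int.from_bytes(data, "big")
    digits = ""
    while n > 0:
        n, rem = divmod(n, 58)
        digits = B58_ALPHABET[rem] + digits
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + digits


def bech32_length(num_bytes: int) -> int:
    """Assumed layout: empty prefix, '1' separator, 5-bit groups, 6 checksum characters."""
    return 1 + math.ceil(num_bytes * 8 / 5) + 6


def percent_larger(data: bytes) -> dict:
    """Size of each encoding relative to unpadded base64url, in whole percent."""
    sizes = {
        "base64url": len(base64.urlsafe_b64encode(data).rstrip(b"=")),
        "base58": len(b58encode(data)),
        "base64": len(base64.b64encode(data)),
        "bech32": bech32_length(len(data)),
    }
    baseline = sizes["base64url"]
    return {name: round(100 * (size - baseline) / baseline) for name, size in sizes.items()}


sample = base64.urlsafe_b64decode("Fd-j-A==")  # the 4-byte value from the first group
print(b58encode(sample))
print(percent_larger(sample))
```

Note that the base58 length depends on the numeric value being encoded, not only on the byte count, so other 4-byte inputs can need six characters rather than five.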


mcelrath commented 4 years ago

Just use binary mode for the QR, you can put up to about 3kb in it. It's up to client software to re-encode that binary string into base58 or bech32 or base64 if someone wants to copy-paste. Putting an encoding of an encoding inside an encoding (QR wrapped base58 wrapped PSBT) is silly. Anyway, QR-encoded PSBT would be a valuable BIP for wallet interoperability.

I'm rather seriously concerned about the size of transactions here. 3kb is not a terribly large transaction for service provider operations. At the end of the day, QR codes simply won't suffice for many usages. Last year Coinbase moved 50MB of UTXOs in a consolidation, which is around 25000 QR codes. For such things I suggest CD-R or DVD-R to avoid electronics in e.g. a USB stick. Doesn't save you from buffer overflows, but neither does a QR code. If you're seriously considering animated QR's, just move to CD-R's instead because an animated QR will only buy you a factor of ~few in transaction size, and totally fall down with multiple transactions.
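For a sense of scale, the sketch below estimates how many maximum-size QR codes a payload would need. It assumes the Version 40 byte-mode capacities listed in the QR code standard and an optional, hypothetical per-frame header; the helper name is illustrative.

```python
import math

# Version 40 byte-mode capacity (bytes) per error-correction level.
V40_BYTE_CAPACITY = {"L": 2953, "M": 2331, "Q": 1663, "H": 1273}


def frames_needed(payload_bytes: int, level: str = "L", header_bytes: int = 0) -> int:
    """Number of Version 40 codes needed, given an assumed per-frame header size."""
    usable = V40_BYTE_CAPACITY[level] - header_bytes
    if usable <= 0:
        raise ValueError("header leaves no room for payload")
    return math.ceil(payload_bytes / usable)


print(frames_needed(50_000_000))  # 50 MB at the lowest error-correction level
```

At about 2,000 usable bytes per code, the same payload would take 25,000 codes, which matches the order of magnitude given in the comment.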

The human in the loop here (and human readability) is entirely irrelevant as it's opaque data to the human anyway and could be malicious. There is no practical "verification" of QR codes or PSBT's that makes any sense. Some software must parse the PSBT and display a UX representing the transaction that makes sense in context. That software can be trojaned.

In order to verify that the data to be signed is not trojaned, the only solution is to use an out-of-band mechanism. Trojaned client software can write a transaction and display a QR that doesn't match the expected transaction, but tell you the QR contains something different than it does. The solution is to acquire the same data from two independent systems through different channels. There are numerous ways to arrange this by scanning the QR twice, once to originate it and another time to verify it on an independent system. Note it's the devices that do the verifications, not the human, and it works if you arrange for any single device to be trojaned or communications path to be MITM'ed. Fundamentally there are 3 devices required, and if all 3 can communicate or verify the same data, it requires compromise of all three to execute a fraudulent signing if you architect it correctly. This is way better than any point-to-point communication (like QR scan) which only requires one compromised device or communications path to defraud. (Also, use multi-sig) I've called this idea "triangular authentication" in the past, also called Out Of Band Two Factor Authentication.

A human is a meat-based cryptographic device for out-of-band verification that is capable of poorly executing exactly one cryptographic operation, A =?= B, and beyond about 10 bytes, has an enormous error rate. Just Say No to meat-based cryptography. Scan QR twice, sign once.
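The "scan twice, sign once" idea can be sketched as a device-side check that the same transaction bytes arrived over two independent channels before anything is signed. This is a minimal illustration under that assumption only; the function names and the `sign` callback are hypothetical and not part of any wallet's API.

```python
import hashlib
import hmac
from typing import Callable


def digest(tx_bytes: bytes) -> bytes:
    """Fingerprint of the exact bytes that would be signed."""
    return hashlib.sha256(tx_bytes).digest()


def channels_agree(from_channel_a: bytes, from_channel_b: bytes) -> bool:
    """True only if both independent channels delivered identical data."""
    return hmac.compare_digest(digest(from_channel_a), digest(from_channel_b))


def sign_if_verified(from_channel_a: bytes, from_channel_b: bytes,
                     sign: Callable[[bytes], bytes]) -> bytes:
    """Refuse to sign unless the two copies match; the device compares, not the human."""
    if not channels_agree(from_channel_a, from_channel_b):
        raise ValueError("transaction differs between channels; refusing to sign")
    return sign(from_channel_a)
```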

gre commented 4 years ago

Hello, I'm the tech lead of Ledger Live, a mobile and desktop application that works with Ledger's hardware wallets. We have developed a solution that lets a user export their data (accounts, settings) to mobile in trustless, networkless conditions with an animated QR code, and I can share what I've learned from this.

Essentially, the logic has been put into a standalone, open-source library and is documented: https://github.com/gre/qrloop Please check out the README, which explains some of the pitfalls of this approach.

This demo shows scanning QR codes that contain the source code of the library itself: =D

( Live demo of this https://qrloop.now.sh/ )

We made a few empirical decisions after doing many tests.

One problem with animated QR codes is that some frames are harder than others for phones to catch, and it is not about QR code size (a smaller QR code does not scan faster than a bigger one). With a simple loop you tend to get 90% of your data but always miss the same frames. Also, statistically, the more frames you have obtained, the rarer it is to obtain a new one, so progress is not linear: the speed decreases over the scans. To improve on this, you need to implement replication and "fountain codes" so the frame data arrives faster. Depending on the number of replicated frames, you can actually approach linear speed (there are more frames, but some contain the data of others, so you don't need all of them).

qrloop implements these, and the tradeoffs are all documented in the README.
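The slowdown described here resembles the classic coupon-collector problem. As a simplified model (assuming each successful scan captures one frame chosen uniformly at random, which a real camera loop only approximates), the expected number of scans can be computed as follows; the function names are illustrative.

```python
from fractions import Fraction


def expected_scans(total_frames: int) -> float:
    """Coupon-collector expectation: n * (1 + 1/2 + ... + 1/n)."""
    harmonic = sum(Fraction(1, k) for k in range(1, total_frames + 1))
    return float(total_frames * harmonic)


def expected_scans_for_next(total_frames: int, already_have: int) -> float:
    """Expected scans until one more new frame appears."""
    return total_frames / (total_frames - already_have)


print(expected_scans(10))              # scans needed to see all 10 frames
print(expected_scans_for_next(10, 9))  # scans needed for the last missing frame
```

In this model the last missing frame alone costs as many scans on average as there are frames, which is why adding redundant, combined frames helps.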

As for your comment about which QR code encoding to use, I guess it all depends on the data you pass through, but the difference can become negligible if you use a compression algorithm. Beware, however, that if you want it to be cross-platform you should probably not put binary directly in a QR code: we noticed that iOS has issues when it sees a \0 while parsing the QR code as text...

It might also be good to include a checksum of the data in the encoded data itself, to verify integrity and to allow an automatic retry instead of a failure later.
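A minimal sketch of the checksum suggestion, assuming a CRC32 appended to the reassembled payload (the choice of CRC32 and the 4-byte big-endian trailer are assumptions for illustration, not what any particular library does):

```python
import zlib

CHECKSUM_BYTES = 4


def wrap(payload: bytes) -> bytes:
    """Append a CRC32 of the payload so the receiver can verify integrity."""
    return payload + zlib.crc32(payload).to_bytes(CHECKSUM_BYTES, "big")


def unwrap(message: bytes) -> bytes:
    """Return the payload, or raise so the caller can keep scanning and retry."""
    if len(message) < CHECKSUM_BYTES:
        raise ValueError("message too short")
    payload, trailer = message[:-CHECKSUM_BYTES], message[-CHECKSUM_BYTES:]
    if zlib.crc32(payload).to_bytes(CHECKSUM_BYTES, "big") != trailer:
        raise ValueError("checksum mismatch; rescan needed")
    return payload
```

A CRC only detects accidental corruption; it offers no protection against deliberate tampering.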

ChristopherA commented 4 years ago

Right now base64 and hex are the most commonly encoded forms of cryptography in QR codes and URLs, which causes a number of compatibility and usability issues, and doesn't support fragments.

@wolfmcnally has written up a research paper proposing a QR-optimal, URL-safe, double-click-selectable encoding format that supports multiple fragments, has a simple identifier that also serves as an integrity check, has some compatibility with CBOR without a full CBOR parser, and has some other advantages.

https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-0005-ur.md

Comments appreciated!

ChristopherA commented 4 years ago

@gre Thanks for joining in the conversation!

Another fountain code proposal is https://github.com/digitalbazaar/qram by @dlongley (cc: @msporny). One particular issue that they ran into was patents around certain fountain code techniques, and they believe that the one used in qram addresses them. Could you take a look at their proposal and compare your fountain algorithm with theirs? I really don't want to get into patent issues (which is why all of Blockchain Commons' work is BSD+Patent licensed).

I'd also love your thoughts on the multipart portion of the UR encoding proposal by @wolfmcnally, https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-0005-ur.md. Are there some useful ideas there? That proposal tries to be URL, QR code, and double-click safe.

Finally if you (or anyone else) are interested in being added to a Signal discussion group on these topics, DM or send me an email at ChristopherA@LifeWithAlacrity.com.

-- Christopher Allen

gre commented 4 years ago

As for the patent question, I'm not sure. qrloop is MIT licensed. I was heavily inspired by https://en.wikipedia.org/wiki/Luby_transform_code but didn't use any library or copy any source code; I wrote it from scratch. The principle is pretty simple: it's essentially XOR between frames. The important part is finding optimal parameters (amount of redundancy, data size, ...). The essential part is at https://github.com/gre/qrloop/blob/master/src/exporter.js#L71-L111 .

The qram library you linked appears similar, but it has an interesting part about using probability to optimize the parameters automatically, if I understand correctly the logic of https://en.wikipedia.org/wiki/Soliton_distribution .
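The "XOR between frames" principle can be illustrated with a deliberately simplified sketch: an extra frame carries the XOR of several data frames, so a receiver that is missing exactly one of them can rebuild it. This is not qrloop's or qram's actual algorithm; the frame layout and function names are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("frames must be padded to the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def combine(frames: List[bytes], indexes: List[int]) -> bytes:
    """Build a redundant frame as the XOR of the selected data frames."""
    combined = bytes(len(frames[indexes[0]]))
    for i in indexes:
        combined = xor_bytes(combined, frames[i])
    return combined


def recover(combined: bytes, indexes: List[int],
            known: Dict[int, bytes]) -> Optional[Tuple[int, bytes]]:
    """If exactly one source frame is missing, rebuild it from the redundant frame."""
    missing = [i for i in indexes if i not in known]
    if len(missing) != 1:
        return None  # nothing to do yet; keep the frame for later
    data = combined
    for i in indexes:
        if i in known:
            data = xor_bytes(data, known[i])
    return missing[0], data


frames = [b"AAAA", b"BBBB", b"CCCC"]
extra = combine(frames, [0, 2])
print(recover(extra, [0, 2], {0: frames[0]}))
```

A full Luby transform code additionally chooses how many frames to combine from a degree distribution, which is the parameter tuning the comment refers to.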

UR encoding proposal indeed looks interesting. I'll need to read more over the next days to digest a bit more the concepts at stake here :)

dlongley commented 4 years ago

Hi @ChristopherA -- I noticed you asked for some input via the CCG mailing list from Digital Bazaar regarding animated QR codes. I see you've linked to our qram library above (an old demo is here as well).

You mentioned:

We understand that Dave Longley of Digital Bazaar did some research on this in their qram proposal; we'd like to do something similar but not at the binary level like qram does, and would love Digital Bazaar's input.

I'm not sure what you're interested in doing differently, but if you have any specific questions that I can answer I will try my best to do so when I can. Just like @gre, our implementation was heavily inspired by papers written by Michael Luby.

Whatever does come out of your work here, I do hope it is largely compatible with what qram is doing -- or that we can get some agreement on a common format that could eventually become a patent-free, royalty-free, open standard.

aaronisme commented 4 years ago

In Cobo Vault, we use animated QR codes to transfer large amounts of data. Basically, each QR code image is a JSON object that indicates the total number of images and the index of the current one. Check it out at https://github.com/CoboVault/cobo-vault-docs

stepansnigirev commented 4 years ago

@aaronisme I looked into your QR code format, and I think it is very inefficient. Let's take an example from your documentation:

{
    "total": 2,
    "index": 1,
    "checkSum": "807271c36d6e275b0e89b023ccf8e3b6",
    "value": "H4sIAAAAAAAAAyVPu0oDURQ0UZYllWy51RKESGDdc88995VCJCaLjaIYbeW+tgoshAj+gxZ+gP6Dlb9h5f94F4eBmWaGmXxclJe966tH+7zdV3e7yz7E6nbX73vfb8uPcT4uLkghC+gUi2iRGx6EDjqQ4x2TTnsK1kt0gBC0AgHEQURhrTLo0QkTp7+jyeF6c1WcOB+pSx21IIM1yWhqG5mpSShuu4jSMCyPrxuiWSNh1gxssPr6/vl8PTstFm+jyRxe1sSApQQJF4WUWrfE2xZWJtISl1yvrW6VRl1kDAaUk38dMM0YlxpgfgDZ4jzPiqOH+9WmzNO8p8FNU3/6w1Rn0kuMUSC3iCATjSFhlGc8aM6iV9X7zR9wPHYkQAEAAA==",
    "compress": true,
    "valueType": "protobuf"
}

Here the total length of the QR code is 542 bytes. The useful data, the base64-encoded payload, is only 289 bytes after decoding to raw bytes. So you waste 46% of the QR code space...

I strongly suggest going with something else. JSON is very inefficient, especially for QR codes.
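The overhead figure above can be checked mechanically. The sketch below measures what fraction of a JSON frame is actual payload, assuming the frame keeps its data base64-encoded under a "value" key as in the example; the function name is illustrative.

```python
import base64
import json


def payload_efficiency(frame_text: str) -> float:
    """Fraction of the frame's bytes that are raw payload after base64 decoding."""
    frame = json.loads(frame_text)
    raw = base64.b64decode(frame["value"])
    return len(raw) / len(frame_text.encode("utf-8"))


# Using the sizes given in the comment: 289 payload bytes in a 542-byte frame.
print(round(1 - 289 / 542, 3))  # fraction of the QR code spent on overhead
```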

ChristopherA commented 4 years ago

@aaronisme One key thing we observed is that base64 forces QRs into binary mode, making them significantly less efficient. So many of us think that we should consider moving to bech32 encoding, resulting in something like https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-0005-ur.md (currently a draft, as we are looking into adding fountain codes). Even though bech32 is not as efficient as base64, using it in a QR code reduced the size of the QR by over 20%, even given random data.

We have a C library that implements the original bech32 encoding used by segwit, as well as the new bech32bis and bc32 (basically bech32bis but URL safe): https://github.com/blockchainCommons/bc-bech32 For an example of how this library is used, take a look at the develop branch of https://github.com/blockchainCommons/bc-seedtool-cli

Another problem with JSON in QRs is that mobile apps can't register handlers to leverage system QR services (for instance the Camera app in iOS). Using a URL-safe format allows us to leverage system handlers. For instance, UR:CRYPTO-SEED/2HL0J0EM8K8JC8NX0NSZ309J335LDLT2G compresses well in a QR, has error detection and, due to its short length, error identification/correction. The Camera app would send it to any app that accepts the UR: URL scheme, which could then handle the cryptographic seed it received.

Does anyone in the wallet community have any issues with considering moving to some common form of encoding leveraging bc32 for both single QR as well as multipart/animated QRs?

-- Christopher Allen

stepansnigirev commented 4 years ago

@ChristopherA I would like to move to this encoding in Specter. Do you mind making a few test vectors for the encoding of an xpub, wallet descriptors, a PSBT, and a PSBT split into multiple QR codes? I could then write a Python implementation of this encoding.

ChristopherA commented 4 years ago

Great! @wolfmcnally and I will put together an initial proposal, and once your team gives a +1 we'll move it to a more formal place for further deliberation. I know we have some open issues on some formats, such as my discomfort with xpub/tpub/zpub and maybe moving to some binary representation more like SLIP32. But we'll get in at least the basics first.

aaronisme commented 4 years ago

Thanks, and yes, we currently use JSON, but I also agree that the JSON format is inefficient. We are considering another format and are open to having a discussion.

aaronisme commented 4 years ago

Thanks, I will take a look at the related resources.

ChristopherA commented 4 years ago

@aaronisme @stepansnigirev @gre

Here is a first pass by @wolfmcnally and myself on some cryptographic objects encoded using this QR-optimized URL-Friendly bc32 encoded format.

https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-0006-urtypes.md

This is just research at this point. We are not ready to propose this as a standard without further buy-in from others, but we do have the prospect of adding support for these in our seedtool-cli and lethe-kit hardware for further prototyping.

What I'd like to see is some +1's on this general approach. Then we can discuss what improvements are required, how and where we should incubate and test them, and the long-term strategy of whether we move this into the BIP process, or even into a W3C or IETF process, since at least at the bottom level these are broadly applicable to multiple cryptographic technologies.

-- Christopher Allen

ChristopherA commented 4 years ago

Some commentary on the crypto-seed object. We feel this is a good fundamental cryptographic object, which has broad utility to be fed into other tools. In particular, it is the source of entropy into BIP39, into SLIP39, into Lightning Recovery Seeds and of course into BIP32 master keys for bitcoin and other HD formats, even HD keys for 25519.

We had considered having the crypto-bip39 be the base, but felt that having to return BIP39 back to binary to move into the other forms was suboptimal, especially when we look at a future where maybe these master seeds are also used by Signal and other tools. If all our tool chains all support the same base, then we can use airgapped master seeds in other places.

One of the things we learned at the two RWOT design workshops where a group of us proposed some improvements to SLIP39 was that some parties need to add optional metadata to seeds, but there was little agreement on what those were. For instance, for brevity of writing down Shamir shards, the SLIP39 standard does not allow metadata. The addition of the birthday data in the test vector is not as a requirement, but more to show that you can and should ignore extra data if you don't need it.

The final form (lower-cased, should be transmitted to QR upper-case) of seed plus optional birthday is:

ur:crypto-seed/5gq4p3cfskqpyh32kzvpy56x3vkmc5szmpjpj376py6zrs

-- Christopher Allen
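The note about transmitting the UR in upper case follows from QR alphanumeric mode accepting only upper-case letters, digits, and nine symbols. A minimal sketch of that conversion and check, with illustrative function names:

```python
# The 45 characters permitted in QR alphanumeric mode.
QR_ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def fits_alphanumeric_mode(text: str) -> bool:
    """True if every character can be encoded in QR alphanumeric mode."""
    return all(ch in QR_ALPHANUMERIC for ch in text)


def ur_for_qr(ur: str) -> str:
    """Upper-case a UR string for display as a QR code, rejecting unsupported characters."""
    upper = ur.upper()
    if not fits_alphanumeric_mode(upper):
        raise ValueError("string contains characters outside QR alphanumeric mode")
    return upper


def ur_from_qr(scanned: str) -> str:
    """Return the scanned string to its lower-case final form."""
    return scanned.lower()


example = "ur:crypto-seed/5gq4p3cfskqpyh32kzvpy56x3vkmc5szmpjpj376py6zrs"
print(ur_for_qr(example))
```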

aaronisme commented 4 years ago

Sounds cool. I have just walked through https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-0006-urtypes.md. Just wondering: instead of creating something new, why not use a currently widely used protocol definition like protocol buffers?

aaronisme commented 4 years ago

As for Cobo Vault, we are currently considering moving to these UR types, and we need to do some investigation into how to do it.

wolfmcnally commented 4 years ago

@aaronisme

Sounds cool. I have just walked through https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-0006-urtypes.md. Just wondering: instead of creating something new, why not use a currently widely used protocol definition like protocol buffers?

Assuming you're comparing protocol buffers to CBOR, we're not creating anything new. Protocol buffers lack some desirable attributes that CBOR possesses. For one thing, protobufs are not self-describing: they require a compilation step (protoc) just to be able to read them. This makes it more difficult to migrate to future versions, and makes adoption more complex in general. CBOR is also an IETF standard (RFC 7049). CBOR is seeing wide adoption, particularly in the IoT space, and has wide language support. The UR spec isn't a replacement for either CBOR or protobufs; it adds an encoding layer on top of CBOR for particular purposes.

msporny commented 4 years ago

The UR spec isn't a replacement for either CBOR or protobufs; it adds an encoding layer on top of CBOR for particular purposes.

For what it's worth... there are experiments in converting Verifiable Credentials and DID Documents to CBOR as well. Preliminary testing shows that CBOR has all of the qualities that we'd want to see and that we'd be heading in that direction. Sounds like we're converging with what @wolfmcnally is recommending.

wolfmcnally commented 4 years ago

Yes, and unless I miss my guess @msporny , one of the traits CBOR possesses that you'd be particularly interested in is a defined set of rules for canonicalization, which when applied should mean that a given document will always encode to the same byte sequence. This makes it possible to hash (or sign) a document and know that someone else who hashes a document they create from the same source material will produce the same result.
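To make the "same document, same bytes" property concrete, here is a toy encoder for a small subset of CBOR (unsigned integers, byte strings, text strings, arrays, and maps). It follows the canonical-form suggestions in RFC 7049 section 3.9: shortest-length integer encoding and map keys sorted by encoded length, then bytewise. It is a sketch for illustration, not a replacement for a real CBOR library.

```python
def _head(major: int, value: int) -> bytes:
    """Initial byte plus the shortest possible argument encoding."""
    if value < 24:
        return bytes([(major << 5) | value])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + value.to_bytes(size, "big")
    raise ValueError("value too large for CBOR")


def encode(obj) -> bytes:
    """Encode a supported Python value deterministically."""
    if isinstance(obj, bool):
        raise TypeError("booleans are outside this toy subset")
    if isinstance(obj, int) and obj >= 0:
        return _head(0, obj)
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(obj, list):
        return _head(4, len(obj)) + b"".join(encode(item) for item in obj)
    if isinstance(obj, dict):
        pairs = [(encode(k), encode(v)) for k, v in obj.items()]
        # Canonical key order: shorter encoded keys first, then bytewise.
        pairs.sort(key=lambda kv: (len(kv[0]), kv[0]))
        return _head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


print(encode({"b": 1, "a": 2}).hex())
print(encode({"a": 2, "b": 1}).hex())  # same bytes regardless of insertion order
```

Because the key order is fixed by the encoder, two parties building the same map independently get identical bytes, and therefore identical hashes.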

msporny commented 4 years ago

Yes, and unless I miss my guess @msporny , one of the traits CBOR possesses that you'd be particularly interested in is a defined set of rules for canonicalization, which when applied should mean that a given document will always encode to the same byte sequence.

Well, yes, kind of - but not as much as you'd think. :)

The main reason is that there is a powerful mapping from JSON-LD to CBOR-LD that uses XOFs to do high-performance compression for JSON-LD documents (800%+ compression rates) without having to re-sign the already canonicalized JSON-LD data (and signature).

Note that CBOR actually does not have a defined set of rules for canonicalization - yes, it's easier to do c14n in CBOR if you follow a distinct set of rules... but the same is true for JSON (see JCS). The problem is that it only works for very simple/flat data structures... it tends to fall apart when you get to the level of complexity in Verifiable Credentials.

In any case, I don't think those are points that need to be debated... different communities like CBOR for different reasons. The important thing is that we all like CBOR and are converging on it as a reasonable format to shove into a QRCode. :)

aaronisme commented 4 years ago

@ChristopherA I would like to move to this encoding in Specter. Do you mind making a few test vectors for the encoding of an xpub, wallet descriptors, a PSBT, and a PSBT split into multiple QR codes? I could then write a Python implementation of this encoding.

@stepansnigirev Have you decided to adopt this encoding in Specter? Is there any release plan for it?

ChristopherA commented 4 years ago

@aaronisme & @stepansnigirev

@wolfmcnally has been working this weekend on a reference implementation in C, with examples for crypto-seed and Shamir. It is coming in the next few days.

Still to do are some standards for derived keys, account maps (aka wallet descriptors), and a few other useful primitives that fit in a single QR. We are still working on defining how fountain codes should work in this approach for larger items like QR codes for PSBTs.

aaronisme commented 4 years ago

Cool. I am also working on some code to implement this encoding. As for the larger data, are there any concerns with the current approach?

FoundationKen commented 4 years ago

We're interested in implementing this as well. Looking forward to the C implementation.

secinthenet commented 3 years ago

@aaronisme

Sounds cool. I have just walked through https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-0006-urtypes.md. Just wondering: instead of creating something new, why not use a currently widely used protocol definition like protocol buffers?

Assuming you're comparing protocol buffers to CBOR, we're not creating anything new. Protocol buffers lack some desirable attributes that CBOR possesses. For one thing, protobufs are not self-describing: they require a compilation step (protoc) just to be able to read them. This makes it more difficult to migrate to future versions, and makes adoption more complex in general. CBOR is also an IETF standard (RFC 7049). CBOR is seeing wide adoption, particularly in the IoT space, and has wide language support. The UR spec isn't a replacement for either CBOR or protobufs; it adds an encoding layer on top of CBOR for particular purposes.

Protobufs don't require compilation to read them; both the binary and text formats are well defined and easy to parse without the message definition. What does require compilation is knowing the field names (the binary format only includes the field index) and generating client libraries. I don't know what you mean by "This makes it more difficult to migrate to future versions, and makes adoption more complex in general". Why is migration with CBOR easier than with protobufs? I do agree that having to use the protobuf compiler to generate up-to-date client code is annoying, but I think it's a fairly minor con in the grand scheme of things. Also, protobufs are far more popular than CBOR. Just to give some numbers, a quick GitHub search shows a more than 20x difference in the number of repos mentioning "protobuf" vs "cbor".

wolfmcnally commented 3 years ago

I added a Q&A to the UR paper on the choice of CBOR over Protobufs.

Rspigler commented 3 years ago

Can this be closed with BCR-2020-005? Is there a plan to submit it to the wider community on the bitcoin-dev mailing list as a proposed BIP?

wolfmcnally commented 3 years ago

Without objection, I'll close this. Please open a new topic to discuss advancing UR as a BIP.