Enhancement Proposal: GET linked invoice/ACK metadata

monkeypants commented 8 years ago

Further to conversation about #5, suggest we consider enhancement to 1.0 version of the protocol that allows access to data about the state of an invoice/receipt by utilising gateway-generated GUIDs and GET verbs.

As an Australian Business
I need the ability to automatically monitor the status of my invoices
so that I have timely, low cost information about my accounts receivable

For example, if a successful submission (response code 200) of POST /invoices/ returned data containing a gateway submission GUID, then GET /invoices/<GUID>/ (at the same gateway) could return data about the state of the submission. (Subsequent analysis required: what is the state transition model of a submission? Is it more complicated than "unacknowledged" or "acknowledged"?)
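A minimal sketch of that round trip, assuming an in-memory gateway. The class name `Gateway`, its method names, and the two state labels are illustrative assumptions, not anything defined by the spec.

```python
import uuid


class Gateway:
    """Toy in-memory stand-in for a gateway (hypothetical, not from the spec)."""

    def __init__(self):
        self._submissions = {}

    def post_invoice(self, payload: bytes) -> dict:
        # POST /invoices/ -> 200 with a gateway-generated submission GUID.
        guid = str(uuid.uuid4())
        self._submissions[guid] = {"state": "unacknowledged"}
        return {"status": 200, "guid": guid}

    def get_invoice(self, guid: str) -> dict:
        # GET /invoices/<GUID>/ -> state of the submission, or 404 if unknown.
        record = self._submissions.get(guid)
        if record is None:
            return {"status": 404}
        return {"status": 200, "state": record["state"]}


gateway = Gateway()
receipt = gateway.post_invoice(b"<Invoice>...</Invoice>")
status = gateway.get_invoice(receipt["guid"])
```

The sender only needs to keep the GUID from the receipt in order to poll the submission later.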

As an Australian Business
I need the ability to link to the status of my invoices
So that I can choose to share access to that information with Interested Parties

and also:

As an Interested Party
I need the ability to monitor the status of 3rd party invoices
So that I can inform the management of my counter-party risk

This implies that the data returned from POST /invoices/ has two properties:

Acknowledgement metadata (might be a link to the receipt acknowledgement URL)
Some kind of signature that can be used to verify claims about the invoice that was submitted

For example, an Australian Business might tell their Financier (Interested Party) "I sent this invoice, and here is the gateway endpoint for the submission", and the financier would be able to verify that claim (of submission - same date, amount, etc) and monitor its status (hmm, queried/disputed). Financiers making use of this information may be able to provide more price-competitive credit through a combination of lower operating costs (automation) and more precise risk management.

Another example might be a debt collection service that is sent verifiable invoices / status endpoints, and who uses them (in combination with payment data) to drive an automated debt recovery protocol.

As an Australian Business
I need the ability to prove an invoice payload corresponds to an invoice GUID
So that interested parties can verify invoice status information that I have shared with them

Is this all-or-nothing (one signature for the whole invoice), or would an Australian Business want to share some but not all data from a particular invoice with a third party who then needs to verify it?

Similarly, a successful POST /responses/ (or whatever the noun is, it's still not specified in the spec) should return data containing a GUID, such that GET /responses/<GUID>/ will describe the state of the acknowledgement. This may or may not be a duplication of the data available in the GET /invoices/<GUID> endpoint. My preference is that the /invoice/ endpoint contains a link to the relevant /responses/ endpoint (for better cache performance of GET /invoices/ subsequent to transitioning out of null-acknowledgement state), but either could work I think.

Again, modelling the state machine of acknowledgements would be an interesting exercise - are there different stages that could be mapped to standard accounting practice? For example, is an invoice first mechanically acknowledged (valid submission received by gateway, forwarded to business), then some kind of human acknowledgement (received by business but no comment about liability), then some kind of affirmation (or dispute - indicate intention to pay/not pay)? Perhaps even a "the cheque's in the mail" assertion...

Note: I'm assuming GUID is essentially random and that /invoice/ and /response/ are different values of GUID, even if the response corresponds to the invoice.

If an invoice has been acknowledged, then the GET /invoices/<GUID> should contain a URL for the GET /responses/<GUID>. In other words, an invoice should link to its response (if any) but not necessarily the other way round.

monkeypants commented 8 years ago

How should ack status get updated?

The financial system of the Australian Business must be authenticated/authorised to the Gateway where acks get posted. So that means we could accept PUT /responses/<GUID> too.

Alternatives depend on the design of the ACK state machine. If there is a reasonably linear, irreversible process where states can be skipped, then there are other options to PUT. For example, clobbering state with POST /responses/ could work, however this would create new GUIDs and require an update in two places (old_ack superseded-by new_ack, plus update the invoice acknowledged-by ack link). POST chain would preserve history but foul caches. PUT chain would protect caches (and save on writes) but not necessarily preserve history.

Suggest PUT updates to the ACK with an embedded state-change journal that is returned in the body of the GET response.
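A rough sketch of what a PUT with an embedded journal could look like, assuming a linear, irreversible state model in which states may be skipped. The state names in `STATES` and the function `put_response` are placeholders for illustration only; the real ACK state machine is still undecided.

```python
from datetime import datetime, timezone

# Assumed linear ordering of states; a later state may be reached by skipping earlier ones.
STATES = ["received", "acknowledged", "affirmed", "paid"]


def put_response(ack: dict, new_state: str) -> dict:
    """Apply PUT /responses/<GUID>: move forward only, and journal the change."""
    current = STATES.index(ack["state"])
    target = STATES.index(new_state)  # raises ValueError for unknown states
    if target <= current:
        raise ValueError(f"cannot move from {ack['state']} back to {new_state}")
    entry = {
        "from": ack["state"],
        "to": new_state,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    # The journal travels with the resource, so GET returns the full history.
    return {"state": new_state, "journal": ack["journal"] + [entry]}


ack = {"state": "received", "journal": []}
ack = put_response(ack, "acknowledged")
ack = put_response(ack, "paid")  # "affirmed" is skipped, which this model allows
```

The GUID (and so the URL) stays the same across updates, which is the cache-friendly property argued for above, while the journal keeps the history that a plain PUT would otherwise lose.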

monkeypants commented 8 years ago

About signature verification of invoice parameters: Maybe GET /invoices/<GUID>/ should return:

URL of the ack status endpoint (if present - as above)
signature of the ACK url (if present) by the gateway
some random value
SHA hash of every field in the invoice, salted by the random value
signature of (random value, collection of hashes) by the gateway.

This way, the invoice data is only POSTed once (by the seller) and PUT once (transition out of null-acknowledgement state). PUT is tightly constrained to the null fields; the signed hashes are immutable.

monkeypants commented 8 years ago

...Maybe GET /invoices/<GUID>/ should return:

...

some random value

SHA hash of every field in the invoice, salted by the random value

sorry, that's silly. Salt with the GUID.
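A small sketch of per-field hashes salted with the GUID. SHA-256, the `|` separator, and the function name `field_hashes` are assumptions made for the example; the thread only says "SHA hash", and the gateway signature over the collection is left out.

```python
import hashlib


def field_hashes(guid: str, fields: dict) -> dict:
    """Return the SHA-256 digest of each field value, salted with the invoice GUID."""
    hashes = {}
    for name, value in fields.items():
        material = f"{guid}|{name}|{value}".encode("utf-8")
        hashes[name] = hashlib.sha256(material).hexdigest()
    return hashes


guid = "3f2b8c1e-0000-4000-8000-000000000000"  # placeholder GUID
published = field_hashes(guid, {"amount": "110.00", "due": "2017-03-01"})

# A third party who was told only the amount can check that single claim.
claimed = field_hashes(guid, {"amount": "110.00"})
amount_matches = claimed["amount"] == published["amount"]
```

One thing to weigh: the GUID is visible to anyone holding the URL, so as a salt it does not stop someone from guessing low-entropy values such as round amounts.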

monkeypants commented 8 years ago

As an Australian Business
I need the ability to prove an invoice payload corresponds to an invoice GUID
So that interested parties can verify invoice status information that I have shared with them
Is this all-or-nothing (one signature for the whole invoice), or would an Australian Business want to share some but not all data from a particular invoice with a third party who then needs to verify it?

Maybe we should chunk it up in a way that is congruent with the UBL objects/collections. OrderReference, AccountingSupplierParty, AccountingCustomerParty etc.

What I mean is GET /invoices/<GUID> returns:

response URL
url + hash of OrderReference,
url + hash of AccountingSupplierParty
...etc, for each object/collection chunk submitted (POST /invoices/)

Where url of each object is optional (but not hash - if object present then must have hash), and if a url is present then presumably it requires authentication and appropriate authority to access. So, if an Australian Business provided the whole invoice to an interested party, they could verify each chunk in turn. Or the business might provide only relevant chunks to interested parties, who could verify those objects individually.
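To make the chunked idea concrete, here is a sketch assuming SHA-256 over the raw bytes of each chunk. The functions `describe_chunks` and `verify_chunk`, the XML placeholders, and the example URL are all hypothetical.

```python
import hashlib
import hmac


def describe_chunks(chunks: dict, urls: dict) -> dict:
    """Build the GET /invoices/<GUID> listing: hash always, url only if offered."""
    listing = {}
    for name, raw in chunks.items():
        entry = {"hash": hashlib.sha256(raw).hexdigest()}
        if name in urls:
            entry["url"] = urls[name]
        listing[name] = entry
    return listing


def verify_chunk(listing: dict, name: str, raw: bytes) -> bool:
    """Check one chunk that was supplied out of band against the published hash."""
    expected = listing[name]["hash"]
    actual = hashlib.sha256(raw).hexdigest()
    return hmac.compare_digest(expected, actual)


chunks = {
    "OrderReference": b"<OrderReference>...</OrderReference>",
    "AccountingSupplierParty": b"<AccountingSupplierParty>...</AccountingSupplierParty>",
}
listing = describe_chunks(chunks, {"OrderReference": "/invoices/GUID/OrderReference"})
```

An interested party who was handed only the OrderReference chunk can call `verify_chunk(listing, "OrderReference", raw)` without ever seeing the rest of the invoice.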

This might seem overcomplicated, but mapping to UBL semantics does not mean we have to map to their document granularity. HATEOAS style approach should be simpler and more versatile in the long run, and shouldn't add real difficulty to maintaining a UBL/REST adapter.

onthebreeze commented 8 years ago

Phew! What a lot of ideas. I like the idea of a HATEOAS style collection of URL actions in response to a GET /invoices/{GUID}. Good questions about the state lifecycle of invoice - and something that will for sure provoke a lot of community discussion. There are several possible states and so that implies there are logically several response messages for a given invoice. There are a few implementation options. All start with a POST /invoices to the receiver gateway and get a GUID response that is the key to that specific invoice. After that there are several ways we could manage responses.

1. GET /invoices/{GUID} to the receiver gateway returns a UBL response structure with a complete history of status updates.
2. GET /invoices/{GUID} to the receiver gateway returns a HATEOAS style list of URLs of the form GET /responses/{GUID}, each of which points to a separate response document.

Since I think we would also want to allow the recipient to POST responses back to the sender whenever there is a status change, option 2 fits more neatly with that model. So the recipient gateway (of an invoice) will POST responses back to the invoice sender (assuming the sender SMP specifies that capability) and it would also host the response for any third parties (eg debtor financiers) to GET /responses/{GUID}. This discrete responses model is probably also a bit more compatible with a future blockchain style shared state model.

If you make a pull request along these lines, I'll be pleased to accept.

asmith1024 commented 8 years ago

Can we get higher-entropy resource identifiers please? Even if there's no easy exploit, a recognizable UID structure may encourage hostile experimentation.

onthebreeze commented 8 years ago

Excellent point. I'm not an expert on algorithms for cryptographically strong GUIDs (as opposed to just unique but still maybe guessable GUIDs). Any suggestions?

asmith1024 commented 8 years ago

Yes. The bonus is it's based on GUIDs under-the-bonnet, so the generating access point can still work with those. I have to produce a "vanilla" implementation and get permission to publish. Won't take too long.

monkeypants commented 8 years ago

UUID4 has 122 bits of entropy. It didn't occur to me that this wasn't enough; it's a good point.

https://tools.ietf.org/html/rfc4122.html#section-6 actually makes it too:

Distributed applications generating UUIDs at a variety of hosts must
be willing to rely on the random number source at all hosts.

We should be using payload encryption where we need it. It's fair to assume anyone who wants a complete copy of the public data can have one. What would be the problem if someone could magically guess the IDs?

asmith1024 commented 8 years ago

1. Our users and their business partners need payload encryption and signing. Senders encrypt with the recipient's public key and sign using their own private key. No one reads the contents except the intended recipient. A business does not have to supply its own PKI. It can delegate this to its Access Point provider.
2. The real danger is not threats you can think of, but ones you can't. It is counter-productive to mandate a threat vector. Even if 99% of Access Point providers implement UIDs in a safe, non-enumerable manner, it's only a matter of time before the vulnerable implementation is exploited. Another reason for 1.

monkeypants commented 8 years ago

No argument about payload encryption and signing, we are on the same page there.

I'm not saying "I can't think of a specific threat, therefore there isn't one". I'm saying "any sufficiently determined/lucky attacker can get access to (some or all) invoice and acknowledgement URLs". The URLs are not guarded secrets, they are bandied about to whoever needs them. Some parties will be malevolent, foolish, and/or unlucky; it's unavoidable.

So, when (not if) an attacker gets access to invoice and acknowledgement URLs, what's the damage? If we make a concerted effort, then I think we can make it harmless by ensuring that knowledge of the URL (alone) only gives you access to safe public data (signatures, GUIDs, maybe cypher-text but I'm not even sure that's necessary).

In other words we might POST a signed and encrypted invoice/response, but we can only GET safe public data (e.g. the signature of the plaintext). And only store safe public data too.

So if I provide a plaintext invoice + URL to an interested party and assert that I sent it, the interested party can compare a locally computed hash of the plaintext with the signature retrieved by GET URL, and verify/falsify my assertion. Assuming we don't make a mistake with the crypto, I think that publishing the hash does not leak any commercially sensitive information. Is that right?

The difficult thing is to prevent commercially sensitive information leaking out through historic traffic analysis (assuming attacker has access to a high proportion of URLs, or even all of them). I think that means we need to ensure the public information is non-identifying. Assuming the attacker also has the entire NAPTR record, the only identifying information about the URL is the Access Point / Gateway (AP/GW, nomenclature?) that the URL belongs to, which translates to a collection of ABNs through the NAPTR DB. That's an argument for using a popular AP/GW: there will be a size of AP/GW that's too small (identity of URL could be guessed). A sufficiently large one will provide identity-safety-in-numbers.

AlistairSkippr commented 8 years ago

I am not technically experienced enough to comment here, but from a commercial perspective for debtor finance, this HATEOAS style collection of URLs to provide up-to-date statuses for invoices will be invaluable for a financier. We will be able to calibrate our API with the accounting platform that sends out the invoice to dynamically track these statuses, so if there were any red flag events, the appropriate actions can be pursued immediately.

asmith1024 commented 8 years ago

@AlistairSkippr the only argument we have with HATEOAS is the acronym itself: it sucks. What we're saying here is when we specify an identifier in a URL we don't use a database ID or a UID or any recognizable/enumerable data type, but rather as purely random (and therefore meaningless, except as an ID) a sequence of characters as possible.
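For comparison, a short sketch of a recognisable UUID next to an opaque random token, using only the Python standard library. The 32-byte length and the variable names are arbitrary choices for the example, not a proposal from the thread.

```python
import secrets
import uuid

# uuid4 carries 122 random bits; the remaining 6 bits are fixed version/variant
# markers, which is what makes the value recognisable as a UUID.
uuid_style = str(uuid.uuid4())

# 32 bytes from the OS CSPRNG gives 256 random bits with no recognisable structure,
# encoded with URL-safe base64 so it can be used directly as a path segment.
opaque_id = secrets.token_urlsafe(32)
```

An access point that still wants GUIDs internally could keep a private mapping from the opaque token to its own identifier, so nothing about its storage leaks into the URL.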

monkeypants commented 8 years ago

the only argument we have with HATEOAS is the acronym itself: it sucks.

Do you prefer "RESTafarian", or should we just say "no session-state, locking mechanisms or any of that rubbish".

asmith1024 commented 8 years ago

We're stuck with the acronym. Otherwise we'd have to link to an appropriate explanatory resource every time we used a less annoying term because people wouldn't be able to Google it.

asmith1024 commented 8 years ago

We would expect that URLs will contain information useful to observers without compromising the confidentiality of the messages or the parties involved. Just so long as no one can infer anything about an Access Point's implementation from components of the URL (so to repeat: no DB IDs, no UIDs). Please note @monkeypants your example of a plain text invoice won't happen, because we'll be end-to-end signed and encrypted. Another reason why it won't goes right to your point about needing to assure participants that senders are who they say they are. We get that for free with no additional complications with asymmetric crypto. Otherwise we're going to have to specify canonical forms, signing blocks, appending rules and hey presto: we've reinvented ws-security. You could adopt an existing RESTful signing implementation, but trust me, a good one will be just as much of a pain to implement as ws-security. Actually no, don't trust me. Check this out: http://docs.aws.amazon.com/general/latest/gr/sigv4_signing.html

monkeypants commented 8 years ago

Please note @monkeypants your example of a plain text invoice won't happen, because we'll be end-to-end signed and encrypted

Alice sends Bob an Invoice by posting it to his Gateway. The invoice is signed with Alice's private key and encrypted with Bob's public key. Nobody can inspect the invoice in flight without Bob's private key (including the Gateway). Bob is certain the invoice came from someone with access to Alice's private key (i.e. either Alice or someone she fully trusts).

There is some back and forth between the Gateways, resulting in URLs for the invoice (at Bob's Gateway, where Alice sent it) and Bob's response (at Alice's Gateway, where Bob sent it). The resource at the URL does not contain anything that identifies Bob or Alice, nor does it contain the contents of the invoice (encrypted or otherwise).

The invoice endpoint does contain a hash of the unencrypted payload. It also contains a link to the response endpoint.
The response endpoint does contain some data about the current status of the invoice, from Bob's perspective (e.g. acknowledge liability, query/dispute liability; I don't know the state model of invoice acknowledgements but obviously it has to be well known for this to work). This is not encrypted by Bob but it is signed by him.

Charles is an interested party in Alice's business. Perhaps a Debtor Financier or Auditor. Alice has access to the plaintext invoice (because she sent it). She sends it to Charles through another channel (good news Charles, I anticipate this income!). She also sends Charles the URL of the invoice endpoint. Charles generates a hash of the alleged plaintext, because he has the copy Alice gave him. Charles accesses the invoice URL and gets the published hash. Because they match, he knows that Alice did in fact send the exact invoice she provided him with. At that point it's all he knows for sure, not who it was actually sent to (or what they think of it).
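Charles's check is just a digest comparison. A minimal sketch, assuming SHA-256 over the exact bytes Alice submitted; the function names and sample bytes are made up for illustration.

```python
import hashlib
import hmac


def publish_hash(payload: bytes) -> str:
    """What the invoice endpoint would expose: a digest, never the payload."""
    return hashlib.sha256(payload).hexdigest()


def charles_verifies(copy_from_alice: bytes, published_hash: str) -> bool:
    """Recompute the digest locally and compare it with the published one."""
    local = hashlib.sha256(copy_from_alice).hexdigest()
    return hmac.compare_digest(local, published_hash)


invoice = b"invoice bytes exactly as Alice submitted them"
endpoint_hash = publish_hash(invoice)
```

A single changed byte in the copy Alice hands over makes `charles_verifies` return `False`, which is why the copy has to match what was submitted exactly.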

Charles follows the link on the invoice endpoint to access the response endpoint. This contains information about the status of the invoice, allegedly from Bob's point of view (although its origin is not trusted on face value). This information is valuable to Charles, because he knows the subject of the status (the invoice), although this context is not evident from the response endpoint. The status information is signed by Bob. Charles knows who should be signing it (because he has access to the invoice plaintext), so he finds Bob's public key and verifies the signature. This tells him two things: that the right person received and responded to the Invoice (not some shill of Alice's), and the state of the invoice processing by Bob.

If Alice had not provided Charles with the invoice document, the public data at the invoice and response endpoints would just be random-looking hashes at random-looking URLs.

There might also be a case for Dave, a party with an interest in Bob's liabilities...

Another reason why it won't goes right to your point about needing to assure participants that senders are who they say they are. We get that for free with no additional complications with asymmetric crypto.

Yes, but it's also not that simple. Alice and Bob are known to each other. The public data needs enough information to unlock value without leaking sensitive information.

I think about this in three layers of trust.

Business <--> Financial Software. Full trust, proprietary interfaces. The Financial Software manages key material on behalf of the Business.
Financial Software <--> Gateway. Low/moderate trust, standard private interfaces. The Gateway can not access private key material or modify payloads, but does need to know Alice and Bob's identity and have access to the cyphertext. It is also trusted to generate random ids, not delete or tamper with public data, etc.
Gateway <--> Gateway. Zero trust, standard public interfaces. Gateways store and publish harmless public data.

If we can make it work, the value of the REST design is that it factors trust and sophistication out of the gateways. It's not eliminating Gateways, it just reduces them to a commodity service that proxies the counter-party and exposes public data. The Gateway is still valuable because it:

Mediates (control-inversion) between Financial Software endpoints, which enables participation of unreliable financial software over unreliable links.
Enforces the standardised protocol, ensuring compatibility between financial softwares.
Creates a crisp defensive layer between interfaces and financial data.

The way I see it, concentrating trust in the Financial Software (by factoring it out of the gateway) is the fundamental value proposition of the REST standard. Although the protocol described above does not demand it, trust in the gateway not to delete/tamper with public data could be factored out even further by using a blockchain ledger. For example, about 10-40 minutes after the invoice endpoint is published, it becomes impossible for the Gateway to delete or tamper with it (neat!).

This would make it even more attractive for enterprises such as Charles to build systems on the protocol, perhaps enabled by partnerships with Financial Software that cultivate a differentiating service ecosystem. That would be ruined by complex trusted gateways dependent on perimeter security; the system is more liquid if trust and complexity can be pushed to the edge of the network.

asmith1024 commented 8 years ago

We agree with what you're trying to achieve. It's the mechanism that is the problem. Hashing is fraught without a canonical form. With plain text you have to specify a newline convention, a character encoding, and within that other little details that will break a hash such as whether a UTF-8 encoding must have a byte order mark. We can't guarantee that the invoice will always be plain text though. With JSON and XML the rules for canonical forms are considerably uglier. And what about future serialization mechanisms? It needs to be a binary format. In this case we could specify that documents are signed first and then encrypted. This way Alice can hash the signature block, include the hash in the URL (noting that it has to be passed between all Access Points in the chain from her to the recipient) and pass the signature block to Charles. Byte-for-byte it's exactly the stream that arrives at the recipient's Access Point. If you're good with that I think we're on to something.
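The canonical-form problem is easy to demonstrate. In this sketch the same two lines of text are serialised four ways and hashed with SHA-256; the sample text and variant labels are invented for the example.

```python
import hashlib

text = "Invoice 42\nTotal: 110.00\n"

# Same characters, four different byte streams.
variants = {
    "utf-8, LF": text.encode("utf-8"),
    "utf-8, CRLF": text.replace("\n", "\r\n").encode("utf-8"),
    "utf-8 with BOM": text.encode("utf-8-sig"),
    "utf-16": text.encode("utf-16"),
}

digests = {name: hashlib.sha256(raw).hexdigest() for name, raw in variants.items()}
distinct = len(set(digests.values()))  # every variant produces a different digest
```

Hashing a fixed byte stream, such as the signature block, sidesteps all of this because nobody re-serialises it along the way.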

monkeypants commented 8 years ago

When I said "plaintext", I just meant "not cyphertext". I didn't mean any particular encoding. Yes, it has to be byte-for-byte equivalent.

I'd be happy with a binary format that's well supported by open source libraries and not patent encumbered. Anything spring to mind?

monkeypants commented 8 years ago

I see why you think a binary invoice encoding will simplify reliable hash comparison.

Response status is probably a simple finite state machine with a timestamped journal of states. It might be easier for Charles and Dave if this was json/xml encoded. Do you think we need it to be in a binary format too (for simple reliable signature checking)?

asmith1024 commented 8 years ago

Yes there is an unencumbered binary encoding I have in mind: a stream of bytes, such as you'll get out the back of an RSA call. If you get the same byte stream you get the same hash. Even if you use Microsoft. Alice can share her signed block with Charles and he can read it with her public key. If he can't read it, it's not hers.
The same goes for Bob's signed response block. If Charles knows who Bob is, he can read this block if Bob signed it. The trick is, Charles is not Bob's business partner so he won't be able to decrypt the response block, only Alice can (because it's encrypted with her public key). Alice can't however alter the signed block, so she can still present it to Charles and he doesn't have to trust her that it's genuine.
We are not comfortable with making document responses publicly available as I think you're suggesting. We would have to advise business participants that their responses are available to anyone, signed but not encrypted, so essentially open. Our users are not going to agree to that!

markmuir87 commented 8 years ago

Sorry I'm a bit late to this discussion. There seem to be a number of issues being discussed here. Just to clarify/re-cap, we're trying to answer these questions?

How can gateways generate invoice UIDs (which forms part of the URL corresponding to that invoice) that are unlikely to clash or leak information?
Can we architect a secure system that allows some or all of the following:
1. Transacting parties can poll/monitor status of an invoice via a GET request
2. 'Authorised' third-parties can poll/monitor status of an invoice via a GET request
3. 'Authorised' third-parties can verify authenticity of an entire invoice via a GET request
4. 'Authorised' third-parties can verify authenticity of a specific claim regarding the value of an individual field on a specific invoice via a GET request
Is it possible for transacting parties to amend fields (outside of designated 'status fields') without a new GUID being generated?

And the above need to be achieved under the following constraints:

Encrypted payload in transit?
Payload digital signing?
Need invoice GUID to be as close to 'purely random' as possible?

My thoughts on the above constraints:

in-transit encryption: Although asymmetric crypto technically gives us in-transit encryption for free, doesn't this create a significant infosec risk? Specifically, if we're relying on fixed private keys for in transit encryption, if any of these keys are compromised it will be possible to decrypt a whole batch of transactions (assuming an attacker has been passively collecting and storing the cipher-text).
1. This would be particularly bad if a gateway was using their own private key to encrypt all of their customers' transactions, as 1 key compromise could result in many 1000s of decrypted transactions. TLS for in-transit encryption seems like a pretty neat solution, given its forward-secrecy/ephemeral key properties.
2. I'm assuming here that 'at rest' encryption is out of scope for this standard (and instead a concern for those implementing gateways). I guess it depends on how far we think the standard should 'reach in' to gateway businesses.
payload signing: This makes sense to me. Coupled with a long-standing (and public) OAuth authorisation, it should be fine for gateways to sign using their own private key. Presumably this requires the gateways on both ends of the transaction to timestamp and store the signed payload.
random GUIDs: I'm assuming we want to maximise GUID entropy to avoid (a) GUID clashes and (b) GUID predictability

So on the questions posed further up, just as a starting point for discussion, how about something along the lines of:

Split the invoice payload into two sections: a mutable 'status' section (with standard fields and a fairly restricted set of valid values) and an immutable 'transaction details' section. Digitally sign both sections with your private key and append.
1. Generate the GUID by concatenating the invoice timestamp, serialised immutable 'transaction details' section and business ABN/UID, and then push this through a hashing function with sufficient collision resistance. Even though the search space for generating a hash-collision would be smaller than for a randomly distributed input (given an attacker could make intelligent guesses about the UID, timestamp etc.), in practice they'd still have to brute their way through at least tens of millions of unique inputs for any half-decent hashing function (e.g. SHA-256) before finding a collision.
2. I'm assuming a gateway would cut them off once they made it to around 100 wrong guesses... In a sense this means the attacker is ultimately network i/o bound, so the hashing function doesn't need to be particularly computationally or memory intensive.
3. The hash output, or a shorter hex digest, would form the last part of the URL (e.g. /invoice/). An unauthenticated GET request to this endpoint would return the two digital signatures. This would allow a transacting party to submit the 'status' section and/or the 'transaction details' section to a third-party out-of-band, who could then verify authenticity of the claim via this GET request.
4. If the 'status section' fields have a limited and predictable set of values, a transacting party could (a) tell that something has changed when the sig. changes and (b) work back to the plain-text using the corresponding public-key (I think?)
As for third-party verification of claims about specific invoice field values, this is harder but maybe not impossible. What about:
1. transacting party: makes out-of-band claim to third-party regarding some field on a specific invoice (and provides the GUID & 'details signature')
2. third-party: GET /invoice/GUID/field_name
3. gateway: payload = hash ( concat ( details_sig, ABN, field_name, field_value ) ) + append_plaintext(receipt_timestamp)
4. gateway: RESPONSE final_payload = payload + append(private_key_sign(payload)) (signed so that third-party can verify using gateway's pub-key)
Amendments to the 'immutable transaction details' section would have to involve:
1. Lodging an amended invoice via a POST request (resulting in a new GUID being generated)
2. Updating one of the status fields of the 'old' invoice to 'amended/revoked', and another field pointing to the GUID of the amended invoice (via a PUT request)
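The GUID derivation and the per-field claim check from the scheme above can be sketched with standard-library hashing. Several things here are assumptions rather than part of the proposal: SHA-256 as the hash, the length-prefixing of each part (added so that different splits of the same bytes cannot collide), the function names, and the placeholder ABN and signature values. The gateway's private-key signature over the result (step 4) and the appended receipt timestamp are left out.

```python
import hashlib


def _hash_parts(parts) -> str:
    """Hash a sequence of byte strings, length-prefixing each one."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.hexdigest()


def derive_guid(timestamp: str, details: bytes, abn: str) -> str:
    """GUID = hash(invoice timestamp + serialised immutable details + ABN)."""
    return _hash_parts([timestamp.encode(), details, abn.encode()])


def field_claim(details_sig: bytes, abn: str, name: str, value: str) -> str:
    """hash(concat(details_sig, ABN, field_name, field_value))."""
    return _hash_parts([details_sig, abn.encode(), name.encode(), value.encode()])


abn = "00000000000"  # placeholder, not a real ABN
guid = derive_guid("2017-02-01T10:00:00Z", b"serialised details", abn)

# The gateway answers GET /invoice/GUID/field_name with this value; the third
# party recomputes it from the claim it was given out of band.
gateway_answer = field_claim(b"details-signature", abn, "total", "110.00")
third_party_check = field_claim(b"details-signature", abn, "total", "110.00")
```

If the transacting party misstates the field value, the third party's recomputed digest will not match what the gateway returns.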

I'm sure there are problems with the above scheme, but I figure it's a decent starting point for discussion of specifics. Commence hole-poking :)

EDIT: Just to get the ball rolling on 'hole-poking', a sensible alternative would be to define very granular OAuth scopes/attributes and then condition GET access to an invoice (or field value) on the request embedding a valid bearer token in the request header (or whichever other OAuth/OIDC auth flow is appropriate).

Personally, I think this is a neater solution, but would only be comfortable with it if the government were to develop an OAuth/OIDC identity provider assurance & audit framework (to allow private third-parties to offer competing alternatives to the government IDP). Otherwise the entire system will be reliant on a government identity assurance service monopoly.

It's also neater in the sense that:

We'd be using an established and ubiquitous open-standard, rather than our own custom one (which might be dangerous)
In theory, the scope granularity could scale from 'authorised to retrieve this field only once' to 'authorised to view all invoices of organisation x between the dates of blah and blah'
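As a sketch of what granular scopes might look like at the gateway, the check below assumes the bearer token has already been validated and its scopes extracted. The scope string format (`invoice:<id>:read`, `invoice:<id>:field:<name>:read`) and the function name `is_allowed` are invented for illustration; OAuth itself does not prescribe them.

```python
from typing import Optional


def is_allowed(token_scopes: set, invoice_id: str, field: Optional[str] = None) -> bool:
    """Decide whether a token's scopes cover a GET on an invoice or on one field."""
    # A whole-invoice scope covers every request about that invoice.
    wanted = [f"invoice:{invoice_id}:read"]
    if field is not None:
        # A field-level scope covers only that one field.
        wanted.append(f"invoice:{invoice_id}:field:{field}:read")
    return any(scope in token_scopes for scope in wanted)


financier_scopes = {"invoice:abc123:field:total:read"}
can_read_total = is_allowed(financier_scopes, "abc123", "total")
can_read_all = is_allowed(financier_scopes, "abc123")
```

Time-bounded or single-use grants would need extra state on top of this; the sketch covers only the per-invoice and per-field granularity.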

monkeypants commented 8 years ago

This thread is getting epic, thank you patient reader.

In this case we could specify that documents are signed first and then encrypted. This way Alice can hash the signature block, include the hash in the URL (noting that it has to be passed between all Access Points in the chain from her to the recipient) and pass the signature block to Charles.

Sorry, it took a while for the penny to drop on this.

You are right, the hash of the encrypted payload could be the pseudo-random ID used in the URL. So anyone in the chain from Alice to Bob could generate the ID. Since they can also lookup Bob's Gateway, they could derive the entire public invoice URL. This squeezes a bit more trust/value out of the gateway by removing its need to contribute entropy, so it seems like an improvement to me.

I had assumed encrypt-before-sign not sign-before-encrypt. Gateways need to know sender and recipient identity, so with encrypt-before-sign they could validate sender (perhaps return 409 Conflict response to clients that attempt to forward a payload with an invalid signature). It would be OK without this, but it just seems like good manners.

On the other hand, the hash of the response would not be a stable URL if the response changes over time (timestamped journal etc, e.g. 1. queried, then 2. acknowledged, then 3. cheque's in the mail). It could be a hash of the invoice identifier salted with an attribute of Bob (such as his unique SMP endpoint), rather than a hash of his response payload (which would obviously not be known ahead of time). Other schemes include pre-computing hashes for all possible responses, but I think that gets a bit messy.
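A sketch of the stable response identifier, assuming SHA-256 and a newline separator between the two inputs. The function name `response_id` and the example SMP endpoint URL are hypothetical.

```python
import hashlib


def response_id(invoice_id: str, smp_endpoint: str) -> str:
    """Stable identifier for the response URL, independent of the response content."""
    material = f"{smp_endpoint}\n{invoice_id}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


invoice_id = hashlib.sha256(b"signed invoice block").hexdigest()
rid_before = response_id(invoice_id, "https://smp.example/bob")
rid_after = response_id(invoice_id, "https://smp.example/bob")  # after a status change
```

Because neither input depends on Bob's response payload, the identifier can be computed before any response exists and does not move when the journal grows.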

Yes there is an unencumbered binary encoding I have in mind: a stream of bytes, such as you'll get out the back of an RSA call. If you get the same byte stream you get the same hash. Even if you use Microsoft. Alice can share her signed block with Charles and he can read it with her public key. If he can't read it, it's not hers.

That's good, I didn't think about protecting an out-of-band channel between Alice and Charles.

Can you explain a little more about how to generate the RSA bytestream, I don't see how that avoids canonical form. Imagine I start with a UBL/XML document of dubious encoding...

The same goes for Bob's signed response block. If Charles knows who Bob is, he can read this block if Bob signed it. The trick is, Charles is not Bob's business partner so he won't be able to decrypt the response block, only Alice can (because it's encrypted with her public key). Alice can't however alter the signed block, so she can still present it to Charles and he doesn't have to trust her that it's genuine.

Yes, that's exactly why I imagined encrypt-before-sign. I had assumed Charles fetched the response himself rather than Alice provided it to him, but otherwise the same.

Charles always knows who Bob is, because Alice gave him a copy of the invoice plaintext (binary encoded :)

We are not comfortable with making document responses publicly available as I think you're suggesting. We would have to advise business participants that their responses are available to anyone, signed but not encrypted, so essentially open. Our users are not going to agree to that!

Yes it's counterintuitive, but that's exactly what I am advocating. I'm pretty sure it's the simplest and most secure scheme with the best national productivity dividend (especially in conjunction with a public blockchain ledger that prevents future tampering/deletion).

But, I'm not going to argue with users about what they want, that's a very tricky business. Here's my argument about what's the best technical solution...

If we accept that there is value in Charles obtaining proof that Bob responded to Alice's invoice, then we have to consider two things: Should it be an optional or mandatory part of the protocol, and what is the best trust zone for that information to come from.

High trust zone; the protocol between Alice/Bob and their financial software (as you suggested)
Zero trust zone; open public data published on the gateway network (as I suggested).
Low trust zone; the protocol between Financial Software and Gateways (possible compromise)

I am advocating that it should be mandatory and public (zero trust). Apart from the false intuition of insecurity (a very real problem that can't be ignored), this is not dangerous because while the data is public, the information is not (unless you also possess a guarded secret, the content of the invoice).

Optional + Trustless is counterproductive because the presence/absence of response data exposes information about Alice's financial circumstances (by implication). As the number of derived services increases over time, the implication about Alice's financial arrangement would be diluted and there would also be more reasons to elect to publish response data. So eventually publishing would become de facto mandatory (the implication of not publishing is that you are an empty shell business, which is suspicious). So the option of not publishing would be an annoying quirk.

There is very little difference between optional and mandatory high-trust schemes. They both suffer from a systematic response bias (reducing the utility to Charles). For example if Charles is a creditor, Alice will be inclined to send positive responses eagerly but negative responses hesitantly (or not at all). This will increase the cost of Alice's credit, reducing the national productivity benefit of the whole scheme.

The possible compromise is a low-trust zone message, where mandatory and optional are indistinguishable. Both have the benefits of the no-trust schemes (eliminate Alice's response-bias) and high-trust schemes (obscure Alice's financial arrangements).

Two examples of low-trust schemes might be:

Alice's Gateway supports GET /responses/$invoice_hash, but only to clients authorised via OIDC (in possession of a JWT with some custom claim). The fewer custom claims the better, the fewer Relying Parties the better, and access control rules that need to inspect the resource are expensive. But it could be done.
Alice's Gateway relays the POST /invoice/ message to Bob's Gateway, after trimming an attached "CC list" of response recipient endpoints. Then when it receives responses, Alice's Gateway notifies Charles' gateway of the response. Another fine mess, but it could also be done.
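To show how much the second scheme pushes onto the gateway, here is a minimal sketch of the "CC list" handling. The `cc` field name, the message shape and the endpoint URL are all hypothetical; nothing in this thread defines a schema:

```python
import copy
from typing import Dict, List, Tuple


def split_cc_list(message: Dict) -> Tuple[Dict, List[str]]:
    """Return (message to relay to Bob's gateway, trimmed CC endpoints).

    The "cc" field name is hypothetical; no schema has been agreed.
    """
    relayed = copy.deepcopy(message)      # leave the caller's message untouched
    cc_endpoints = relayed.pop("cc", [])  # Bob's gateway never sees the CC list
    return relayed, cc_endpoints


def notifications_for(response: Dict, cc_endpoints: List[str]) -> List[Tuple[str, Dict]]:
    """Pair each CC endpoint with the response Alice's gateway received."""
    return [(endpoint, response) for endpoint in cc_endpoints]


incoming = {
    "invoice": "...encrypted payload...",
    "cc": ["https://charles-gateway.example/notifications/"],
}
to_bob, cc = split_cc_list(incoming)
outbox = notifications_for({"status": "acknowledged"}, cc)
```

Even in this toy form Alice's gateway has to remember the CC list per invoice until the responses arrive, which is the extra state and trust I'm uneasy about.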

So far, all the schemes in the low-trust zone that I have been able to think of push trust and sophistication back from the edge of the network towards the gateway. That seems like a heavy price to me, which has to be weighed against the cost of managing a false security intuition. Difficult problem...

asmith1024 commented 8 years ago

You can say that again @monkeypants . I have to submit some pseudo code and respond to Mr Muir's "asymmetric crypto is an infosec risk" thing before I get to your latest essay. A few things you can expect from us:

No one is talking about custom anything, except establishing a URL convention that allows interested parties to safely interact with the system.
Claims in JWT are cool because all you have to trust is the keys used to sign them. We have to very carefully restrict these however so the tokens themselves don't become bloated. The moment anything we do starts to look like OASIS-level complexity, we've failed.

monkeypants commented 8 years ago

How can gateways generate invoice UIDs (which forms part of the URL corresponding to that invoice) that are unlikely to clash or leak information?

@asmith1024 solved that to my satisfaction. Hash of the payload.

This would need to be salted to prevent a known-cyphertext attack, cycled through a predetermined number of iterations of something like pbkdf2 to prevent rainbow table attack etc.
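As a sketch of what I mean, using PBKDF2-HMAC-SHA256 from the Python standard library. The iteration count is a placeholder, and using Bob's SMP endpoint as the salt is only the earlier suggestion from this thread, not a decision:

```python
import base64
import hashlib


def salted_invoice_id(payload: bytes, salt: bytes, iterations: int = 100_000) -> str:
    """Stretch the payload through PBKDF2 and return a URL-safe ID.

    The iteration count must be fixed network-wide so that every party
    derives the same ID; 100_000 is a placeholder, not a recommendation.
    """
    derived = hashlib.pbkdf2_hmac("sha256", payload, salt, iterations)
    return base64.urlsafe_b64encode(derived).rstrip(b"=").decode("ascii")


# Hypothetical salt: an attribute of the recipient, such as his SMP endpoint.
salt = b"https://smp.example/bob"
uid = salted_invoice_id(b"...encrypted invoice bytes...", salt)
```

The salt has to be something every party in the chain can look up, otherwise they can no longer derive the URL independently.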

asmith1024 commented 8 years ago

@monkeypants pseudocode that satisfies your requirements is coming. We also need to establish a lingua franca for code snippets.

markmuir87 commented 8 years ago

@asmith1024 Absolutely agreed on the OASIS complexity thing. The easier to implement the better.

monkeypants commented 8 years ago

OASIS seems economically inverted to me. It maximises the value of the traffic to the gateway network, rather than maximising the value of the gateway network to the traffic. It's a bridge-troll by design.

markmuir87 commented 8 years ago

@monkeypants Yeah I actually made a few comments on OASIS in the DTO API design guide issues (and an un-actioned pull request). I think it's just a variation on the 'embrace, extend, extinguish' standards lock-in strategy, with the variation being 'create' rather than 'embrace'. Just my humble opinion, but they create standards designed to create vendor monopolies.

Btw, have we got a basic example of an e-invoice somewhere?

I've got the Hydra OAuth/OIDC system running on my laptop (see: https://github.com/ory-am/hydra) and wouldn't mind trying to create a bunch of granular, customised scopes to secure endpoints and enable selective info disclosure.

onthebreeze commented 8 years ago

If you follow the links on the read me in this repo you'll be taken to the swaggerhub spec which includes a sample structure

markmuir87 commented 8 years ago

Cheers Steve. I'm heading home now but should be back up and running in a bit. Maybe we should think about setting up a gitter or slack chatroom at some point. Then again, maybe not. It would make some of this discussion 'not so public'.

asmith1024 commented 8 years ago

@markmuir87 first the crypto thing:

1. TLS encryption is still mandatory (minimum 1.2 no fallback). I am referring to end-to-end encryption of payloads. Intermediate access points, proxies or whatever do not see the contents.
2. Two sets of keys are involved. The payload is first encrypted using the private key of the sender. This establishes the signature. If you know the identity of the sender you can look up their public key and verify they sent it. The payload is then encrypted using the public key of the recipient.
3. The signature block is hashed to establish the path component that will be used to look up network interactions involving the encrypted document. This hash is not tractable to attack.
4. A different hash will be generated for every document in an exchange, so we also need high-entropy, unique "transaction" IDs that allow all the documents in a workflow to be associated with each other. Next post.

All of this (except item 4), in fact the entire Internet, is predicated on certificates being properly secured by their owners. If a hostile party in possession of a compromised cert also controls an access point, router or proxy, every payload encrypted by a compromised cert that passes through is vulnerable (so in general for a transaction this will be only in one direction). Until the cert is replaced. I'll guarantee you that this will happen. TLS is underpinned by a cert. Compromise that and your session keys can be as ephemeral as they like, your server and all its comms are still P0wNx0rd.

Our risk profile includes a section that tries to understand the impact on the security of our clients' data in the presence of a hostile insider. No matter how clever you are this is bad, but you can limit the damage. Relevant here is not encrypting all your clients' data with the same cert, and not allowing any single admin account access to all of the certs. Of course someone will turn up with an access point that doesn't adhere to proper policies, that eventually gets compromised and dumps its clients' data. Happens all the time.

asmith1024 commented 8 years ago

OK so we need a transaction ID, potentially covering a whole paper trail of interactions in a business scenario. Every time a new document is created we have a new signing hash and it's going to get messy chaining them all together. (See what I did there? Version 2). The one good thing that ebMS3 did was introduce the notion of a "Conversation ID", but we need our equivalent to be part of URLs, so it can't be enumerable, etc. The following method is UID-based and very fast (although it only has to be computed once in the lifetime of a transaction). It also produces a minimum of 86% entropy (varies with your platform's UID generation algorithm). This is good.

Generate 3 UIDs: u, k and i
Using 128 bit AES, encrypt u using key k and initialization vector i
Base64Url encode the cipher text - there's your path component
Discard k and i. The generating system can make use of u if desired, but must not share it.
Profit

The symmetric encryption preserves the uniqueness of the underlying UID without exposing its structure. No hash collisions.
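A minimal sketch of the steps above. It assumes the third-party `cryptography` package, random (version 4) UUIDs for u, k and i, and CBC mode with no padding (a UUID is exactly one 16-byte AES block); none of those details are pinned down by the steps themselves:

```python
import base64
import uuid
from typing import Tuple

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def new_transaction_id() -> Tuple[str, uuid.UUID]:
    """Return (path component, u). u stays private to the generating system."""
    u = uuid.uuid4()
    k = uuid.uuid4().bytes  # 16 bytes, used as the AES-128 key
    i = uuid.uuid4().bytes  # 16 bytes, used as the initialization vector

    # Assumption: CBC mode, no padding, since u is exactly one AES block.
    encryptor = Cipher(algorithms.AES(k), modes.CBC(i)).encryptor()
    ciphertext = encryptor.update(u.bytes) + encryptor.finalize()

    # k and i go out of scope here and are never stored or shared.
    path = base64.urlsafe_b64encode(ciphertext).rstrip(b"=").decode("ascii")
    return path, u


tx_id, private_u = new_transaction_id()
```

With these choices the path component is 22 URL-safe characters (16 bytes of cipher text, Base64Url with the padding stripped).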

An algorithm quite similar to this was developed during the course of penetration testing an API I am involved with. I would not hesitate to recommend the testing resource. Let me know if you need some severe punishment and I'll give you his number.

monkeypants commented 8 years ago

That does look like a reliable way to produce an unguessable, unique transaction id. I'm not qualified to scrutinise it, but happy to accept it as a black box that may or may not get updated later in the development process.

I also see why it would be useful to have a transaction id linking the invoice and responses, especially for the inevitable hairy corner cases we haven't discussed yet (key revocation, software provider changes, SMP updates, etc), that can't be allowed to disrupt business of course.

But are you sure it belongs in the URL, not the payload? It seems to function a lot like an invoice number (Alice's reference code). I might be misunderstanding your intention here.

If it was in a URL, is that maybe a new interface that links invoice versions?

asmith1024 commented 8 years ago

@monkeypants yes it is an easy and fast algorithm to start with and it can easily be replaced. Since it doesn't signify anything except an unguessable unique identifier, multiple generator versions can coexist. I'm only including it because now we are juggling two essayists with contributions that we would consider taking on board. A transaction ID is one of them.

onthebreeze commented 8 years ago

Isn't the restful equivalent of a transactionID to be found in that dreaded acronym .. HATEOAS ?

asmith1024 commented 8 years ago

The use case for this is partly your fault @onthebreeze, and partly @markmuir87 . Alice presents Charles with Bob's acknowledgement signature block, but when Bob later puts a stop on the invoice she forgets to inform Charles. The TxID allows Charles to query the workflow (not its contents but its existence). He is then able to confirm that there is more to the story than Alice has told him and can request the missing signature blocks. HATEOAS does not specify the form this TxID takes. We will require a string of random nastiness for this.
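A toy sketch of that check from Charles' side. The in-memory index, the TxID value and the hash strings are all placeholders; the only idea taken from the above is that a TxID lookup reveals which signature blocks exist, not what they contain:

```python
from typing import Dict, List, Set

# Hypothetical gateway-side index: TxID -> hashes of signature blocks in the workflow.
WORKFLOWS: Dict[str, List[str]] = {
    "tx-placeholder": ["hash-invoice", "hash-ack", "hash-stop"],
}


def workflow_hashes(tx_id: str) -> List[str]:
    """What a query on the TxID might expose: existence of documents, not contents."""
    return list(WORKFLOWS.get(tx_id, []))


def missing_blocks(tx_id: str, presented: Set[str]) -> List[str]:
    """Hashes present in the workflow that Alice has not shown Charles."""
    return [h for h in workflow_hashes(tx_id) if h not in presented]


# Alice showed Charles the invoice and Bob's acknowledgement, but not the stop.
to_request = missing_blocks("tx-placeholder", {"hash-invoice", "hash-ack"})
# to_request == ["hash-stop"]
```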

monkeypants commented 8 years ago

That depends. If I understand correctly, transaction is a new concept (relative to invoice and response) that links one or more invoices. Bob queries Alice's invoice "I thought we agreed you weren't going to charge me for those peanuts". Alice says "oh sorry, parlay that, here's a new one" (new URL, same transaction id). Bob acknowledges the new one, Charles nods approvingly.

But I'm far from certain that's what @asmith1024 had in mind.

asmith1024 commented 8 years ago

I am trying to accommodate @markmuir87's observations here, so yes a workflow link. Later down the track we could imagine this ID linking all the documents in a business process (tender, quote, invoice, delivery, blah blah), including updates, addenda and so on.

onthebreeze commented 8 years ago

Yes I see.

What we are really talking about is an un-guessable key that is shared with interested parties and is used to group changes to the state of a "thing" of interest. For example an invoice. I suspect that it wouldn't be the same ID for RFQ, contracts, order, invoice etc because there is no clean single process grouping (one contract might trigger 100 invoices and maybe only 20 of them are for debtor financing). So it's probably a fresh GUID for each thing about which one or more parties are interested in state changes. For example an RFT. Many respondents and potentially many updates to the RFT and a fairly clear lifecycle of the RFT. A GUID would group all that. Different GUID altogether to group the conversation that changes the state of an invoice.

onthebreeze commented 8 years ago

I think I might be missing something on this epic thread. If the hash of the encrypted payload is the “GUID” of the invoice, created by Alice when she first creates the invoice, then isn't that also the same GUID that is used by any interested party to discover any state changes about that invoice? Why do we need another GUID?

Charles uses the GUID to GET from Bob's gateway and, HATEOAS style, can find links to latest status.

So the GUID first generated by the creator of the “thing” that has an interesting state lifecycle (ie invoice) IS the so-called “transactionID”. No?

As I mentioned before, I don't think there is any point in a kind of "mega process" ID (eg from contract to pay) - too many complex many : many links in that stuff. Doesn't mean we can't link them if we want to - for example to link an invoice to an order - but that would be a HATEOAS style link in the GET response to the invoice that points to the order (with a different GUID).
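Roughly what I have in mind for the GET response, as a sketch only. The `_links` layout borrows the HAL convention, and the paths, link names and GUID values are made up for illustration:

```python
import json
from typing import Dict, Optional


def invoice_resource(guid: str, status: str, order_guid: Optional[str] = None) -> Dict:
    """Build a HATEOAS-style body for GET /invoices/<guid>/ (hypothetical layout)."""
    links = {
        "self": {"href": f"/invoices/{guid}/"},
        "latest-status": {"href": f"/invoices/{guid}/status/"},
    }
    if order_guid is not None:
        # The order is a separate thing with its own GUID, reached by a link.
        links["order"] = {"href": f"/orders/{order_guid}/"}
    return {"status": status, "_links": links}


body = invoice_resource("invoice-guid", "acknowledged", order_guid="order-guid")
print(json.dumps(body, indent=2))
```

Charles only ever needs the invoice GUID; everything else is discovered by following links.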

monkeypants commented 8 years ago

My last post was simultaneous to Andrew's. Workflow link is a good description of what I thought you meant. Another URL, a different thing. Maybe a new ticket?

Subject of this ticket is a proposal to extend the public gateway GET interfaces with linked invoice/ACK metadata. Assuming we pull all the interesting side conversations into new discrete tickets, what do we need to do to resolve that proposal?

asmith1024 commented 8 years ago

shrugs I think we've reached the point where we've got far too many words and nowhere near enough code. This is a discussion about transport and metadata, and can safely be continued independently of the BPL->JSON-or-whatever guff. Let me run something up we can play with, or that I can at least demonstrate in a Hangout or something. Then we can alter the routes and mock responses until everyone's singing off the same song sheet.

asmith1024 commented 8 years ago

@monkeypants the canonical form I was thinking of was simply the conventional byte encoding for UTF-8 (so if you're in .NET you use a new UTF8Encoding(false) and then you're playing nicely with all the other kids). Send it as an application/octet-stream or Base64 encode it and send as text/plain (the receiver knows what to do with it based on the MIME type). This way we separate the cryptographic properties of the document from the semantics.
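For anyone not in .NET, a minimal Python sketch of the same idea. The SHA-256 hash and the helper names are only illustrative; the one firm point is that the bytes are UTF-8 with no byte order mark, whichever MIME type carries them:

```python
import base64
import hashlib


def canonical_bytes(document: str) -> bytes:
    """UTF-8 without a byte order mark (Python's "utf-8" codec never adds one)."""
    return document.encode("utf-8")


def as_text_plain(document: str) -> str:
    """Body for text/plain transport: Base64 of the canonical bytes."""
    return base64.b64encode(canonical_bytes(document)).decode("ascii")


def from_text_plain(body: str) -> bytes:
    """Receiver side: recover the canonical bytes from a text/plain body."""
    return base64.b64decode(body)


doc = '<Invoice><Note>caf\u00e9</Note></Invoice>'  # placeholder document
sent_as_octets = canonical_bytes(doc)              # application/octet-stream
sent_as_text = as_text_plain(doc)                  # text/plain

# Either transport yields the same bytes, hence the same hash.
same_hash = (
    hashlib.sha256(sent_as_octets).hexdigest()
    == hashlib.sha256(from_text_plain(sent_as_text)).hexdigest()
)
```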

monkeypants commented 7 years ago

For binary format (octet-stream or Base64 encoded), http://msgpack.org/ seems extremely well supported by different language bindings. Getting UTF-8 encoded JSON (or native types) through a msgpack codec seems like a few lines of code in any language I care about.

Does it look like a good fit to you @asmith1024?

asmith1024 commented 7 years ago

Shiny!

monkeypants commented 7 years ago

nb: GovHack project http://slay-the-bridge-trolls.readthedocs.io/

ausdigital / RESTful-framework

Enhancement Proposal: GET linked invoice/ACK metadata #6