Referencing a document: documentID vs local ID

GSMA-CPAS / BWRP-common-adapter

The "Layer 2.5" with all Common functionality APIs

Apache License 2.0


Referencing a document: documentID vs local ID #9

Closed sschulz-t closed 3 years ago

sschulz-t commented 3 years ago

The current common-adapter uses a local mongodbID on each partner application and a name to refer to a document. This can cause name collisions. Using name + timestamp as a common ID does not help much: two parties might not look at the timestamp and talk only about the name. This was necessary because the chaincode does not yet allow documentID reservation. Besides this, is there an additional reason for using the mongodbID locally?

Using the chaincode documentID as the one and only reference to a document, be it at chaincode level, in the blockchain adapter, in the common-adapter, or in the application, would prevent misunderstandings. The current chaincode generates a unique ID exactly for this purpose. Having multiple local IDs just makes things worse as the system gets more and more complex.

One of the key features of the bwrp application is to ensure that two parties negotiate and talk about the same contract and that this can be proven in case of a dispute.

The blockchain adapter can be enhanced to provide a documentID registration/reservation feature in order to allow the new approach of creating drafts. The workflow could then be adapted to:

documentID = bca.reserveDocumentID();
use this id as the reference id everywhere, also in the document header
use this id when uploading the private document etc.
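A minimal sketch of the proposed workflow, assuming a hypothetical `reserveDocumentID()` on the blockchain adapter (the feature is only proposed here and does not exist yet). `FakeBlockchainAdapter`, `uploadPrivateDocument` and the draft layout are illustrative names, not the real adapter API.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory stand-in for the blockchain adapter ("bca").
class FakeBlockchainAdapter {
  constructor() {
    this.reserved = new Set();   // IDs handed out but not yet used
    this.documents = new Map();  // documentID -> uploaded payload
  }

  // Draw random 32-byte IDs until one is unused, and reserve it in the same step.
  reserveDocumentID() {
    let id;
    do {
      id = crypto.randomBytes(32).toString('hex');
    } while (this.reserved.has(id) || this.documents.has(id));
    this.reserved.add(id);
    return id;
  }

  // An upload only succeeds for an ID that was reserved beforehand.
  uploadPrivateDocument(documentID, payload) {
    if (!this.reserved.has(documentID)) {
      throw new Error(`documentID ${documentID} was not reserved`);
    }
    this.reserved.delete(documentID);
    this.documents.set(documentID, payload);
  }
}

const bca = new FakeBlockchainAdapter();
const documentID = bca.reserveDocumentID();

// The reserved ID is used as the reference everywhere, including the document header.
const draft = { header: { documentID, name: 'Contract with XYZ' }, body: {} };
bca.uploadPrivateDocument(documentID, JSON.stringify(draft));
```

Checking for uniqueness and reserving in one step is the design choice that matters in this sketch: the ID cannot be taken by someone else between generation and first use.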

We can take care of the chaincode and blockchain adapter modifications and create a PR for these modules.

zkong-gsma commented 3 years ago

As requested, documentId is now "returned" as a key once the document has been "sent"/"received".

{
  "contractId": "5fd09d8aba375b001ded183b13b7",
  "header": {
    "name": "Contract with XYZ test 2111222222",
    "type": "contract",
    "version": "1.0",
    "fromMsp": {
      "mspId": "DTAG",
      "signatures": [
        {
          "id": "id",
          "name": "name",
          "role": "role"
        }
      ]
    },
    "toMsp": {
      "mspId": "TMUS",
      "signatures": [
        {
          "id": "id",
          "name": "name",
          "role": "role"
        }
      ]
    }
  },
  "body": {
    "key": "{}"
  },
  "state": "SENT",
  "documentId": "a248d3a74c66fad90ec230a7a35342e24df459040d7c68800d63826a753b043e",
  "creationDate": "2020-12-09T09:48:58.833Z",
  "lastModificationDate": "2020-12-09T10:38:11.008Z"
}

zkong-gsma commented 3 years ago

[image: draw01 (1)]

zkong-gsma commented 3 years ago

Why do we need to "unify" the "selector" on both sides?

We can technically provide a "filter" option such as

GET /api/v1/contracts/?filter=documentID:like:XXYYZZZ

to find the same documentId, or

GET /api/v1/contracts/?filter=name:like:contract_with_tmus

We do not need to mandate how an "exchanged" document is stored locally.
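A rough sketch of how such a `field:operator:value` filter could be evaluated against locally stored contracts. The parser, the supported operators (`like`, `eq`) and the field-to-property mapping are assumptions for illustration; the thread only shows the query-string shape.

```javascript
// Parse a "field:operator:value" filter string (the value may itself contain ':').
function parseFilter(filter) {
  const [field, operator, ...rest] = filter.split(':');
  const value = rest.join(':');
  if (!field || !operator || value === '') {
    throw new Error(`malformed filter: ${filter}`);
  }
  return { field, operator, value };
}

// Hypothetical mapping from filter field names to stored properties.
const accessors = {
  documentID: (c) => c.documentId,
  name: (c) => c.header.name,
};

// Apply the filter; only "like" (substring match) and "eq" are sketched here.
function applyFilter(contracts, filter) {
  const { field, operator, value } = parseFilter(filter);
  const accessor = accessors[field];
  if (!accessor) throw new Error(`unsupported filter field: ${field}`);
  return contracts.filter((c) => {
    const actual = String(accessor(c) ?? '');
    if (operator === 'like') return actual.includes(value);
    if (operator === 'eq') return actual === value;
    throw new Error(`unsupported operator: ${operator}`);
  });
}

// Each org keeps its own local contractId; a draft has no documentId yet.
const contracts = [
  { contractId: 'local-1', documentId: 'a248d3a7', header: { name: 'contract_with_tmus' } },
  { contractId: 'local-2', documentId: undefined, header: { name: 'draft_only' } },
];

const byDocumentId = applyFilter(contracts, 'documentID:like:a248');
const byName = applyFilter(contracts, 'name:like:contract_with_tmus');
```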

zkong-gsma commented 3 years ago

This is also how we intend to support "additional" local-only name "tagging".

E.g. on "Org A", additional local_name="system_id 1234"

on "Org B", additional local_name="XYZ id abcd1234"

Note: if we use the "same" documentId per se, logically the whole "payload" should be the same. In this case it is not, because of how they are individually created/stored locally.

zkong-gsma commented 3 years ago

[image: draw01-Page-2]

zkong-gsma commented 3 years ago

The integrity of the "exchanged payload" + "documentId" will never be compromised, as they can be "re-validated" via "the network" thanks to the "offchain communications":

Read the hash from the storage key: GetState(sha256("MSP" + "documentId")) *something we should consider from the chaincode/blockchain-adapter, to query the blocks.
Compute the local hash sha256("exchanged payload") and compare it with the value above.

The hash MUST always match, and it will (or else someone has tampered with the exchanged payload).
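The two steps above can be sketched as follows. The `ledger` map is a hypothetical in-memory stand-in for the chaincode state (`GetState`), and hashing the payload as a UTF-8 string is an assumption; the real system may hash a different byte representation.

```javascript
const crypto = require('crypto');

const sha256Hex = (input) => crypto.createHash('sha256').update(input).digest('hex');

// Storage key as described above: sha256(MSP + documentId).
function storageKey(msp, documentId) {
  return sha256Hex(msp + documentId);
}

// Hypothetical stand-in for the ledger: storage key -> hash of the exchanged payload.
const ledger = new Map();

function storeHash(msp, documentId, payload) {
  ledger.set(storageKey(msp, documentId), sha256Hex(payload));
}

// Re-validate a locally held payload against the hash stored under the storage key.
function verifyPayload(msp, documentId, payload) {
  const onChainHash = ledger.get(storageKey(msp, documentId));
  return onChainHash !== undefined && onChainHash === sha256Hex(payload);
}

const documentId = 'a248d3a74c66fad90ec230a7a35342e24df459040d7c68800d63826a753b043e';
const payload = '{"type":"contract","version":"1.0"}';
storeHash('DTAG', documentId, payload);

const intact = verifyPayload('DTAG', documentId, payload);
const tampered = verifyPayload('DTAG', documentId, payload + ' ');
```

Note that the check only works for a party that knows the documentId, since the storage key cannot be derived without it.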

smeyerzu commented 3 years ago

So we have two proposed solutions:

a) having contractID and documentID:
Clients and common-adapter reference by contractID.
The common-adapter holds/creates the mapping between contractID and documentID.
Users reference documents by documentID.

b) only having a documentID:
Everything references by a single globally unique documentID.

Pros of each solution: a)

simplicity with only one ID
easy to use by additional components

Cons of each solution: a)

adds overhead
might confuse developers (when to use contractID and when documentID?)
hard to debug different IDs

Please add more points that can be reflected as simple bullet-points. With this list we can have an easier discussion and go for a decision.

zkong-gsma commented 3 years ago

I think, again, we are debating the wrong thing here. Anyway, you are again seeing both "Org A" and "Org B" as the same entity.

Pros of each solution: a) Simplicity. Each org only needs to deal with 1 locally generated "contractId". When a "contractId" is generated, it is automatically mapped to a "documentId" once a "payload" has been exchanged live, e.g. /contracts/{contractId}

In the eventual need of exchanging "usage" as well as settlements or "discrepancy" (any additional) data,

the common adapter will perform all the internal "links" of the subsequent "documentId"s created.

E.g. /contracts/{contractId}/usages/{usageId}: in the backend, the "contract object" will contain { contractId: XXX, documentId: YYYY }

A "usage" will contain { usageId: XXXX, contractId: (of above), documentId: (the documentId for the usage exchange) } // plus we are exchanging the contract's documentId within the usage "exchange"

(The same applies to other types of document exchanges.)

All the remapping/translation is done automatically by the adapter. The "client" will never need to know what the "documentId" is.
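A minimal sketch of the mapping described above, assuming a hypothetical in-memory `ContractStore` inside the common adapter. The class, its method names and the generated ID format are illustrative only; the actual adapter persists these objects in MongoDB.

```javascript
// Hypothetical store: the client only ever sees contractId / usageId,
// while the adapter keeps the link to the exchanged documentId internally.
class ContractStore {
  constructor() {
    this.contracts = new Map(); // contractId -> { contractId, documentId, usages }
    this.nextId = 1;
  }

  createContract() {
    const contractId = `contract-${this.nextId++}`;
    this.contracts.set(contractId, { contractId, documentId: null, usages: new Map() });
    return contractId;
  }

  get(contractId) {
    const contract = this.contracts.get(contractId);
    if (!contract) throw new Error(`unknown contractId ${contractId}`);
    return contract;
  }

  // Called by the adapter once the payload has been exchanged.
  linkDocument(contractId, documentId) {
    this.get(contractId).documentId = documentId;
  }

  // A usage carries its own documentId plus the link to its parent contract.
  addUsage(contractId, usageDocumentId) {
    const usageId = `usage-${this.nextId++}`;
    this.get(contractId).usages.set(usageId, { usageId, contractId, documentId: usageDocumentId });
    return usageId;
  }

  // Northbound view: no documentId is exposed to the client.
  toClientView(contractId) {
    const { usages } = this.get(contractId);
    return { contractId, usageIds: [...usages.keys()] };
  }
}

const store = new ContractStore();
const contractId = store.createContract();
store.linkDocument(contractId, 'a248d3a74c66fad90ec230a7a35342e24df459040d7c68800d63826a753b043e');
const usageId = store.addUsage(contractId, '0c85bf2582b89430d5fe6b346f3e254093d94521daa5ee9c8f18f6d9460ad2e3');
const view = store.toClientView(contractId);
```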

And to be honest, I think if "identification" of a "contract" is required, the "author" should be allowed to define/generate it,

e.g. DTTMUS001/REF/AX, and it is part of the "exchanged contract".

463303f91493169a2b0642b26417b002

E.g. something like an "invoice number" or "PO number" should not be defined by this layer, but by the "front end layer". documentID is what we use to reference internally, and the user does not need to know or understand it.

simplicity with only one ID [K> this remains the same. From the Org A perspective, you are always working with 1 contractID only. At no point in time do you need to worry or wonder what "Org B"'s contractID is. So we are still working with 1 ID.]
easy to use by additional components [K> what difference does it make? An ID is an ID, be it GET /api/v1/contracts/{documentId} or GET /api/v1/contracts/{contractId}]

Cons of each solution: a)

adds overhead [K> what overhead? this is an abstracted layer]
might confuse developers (when to use contractID and when documentID?) [K> developers will never need to use "documentID". They will only use "contractID"; that is why it is hidden in the first place. The "documentID" is used automatically behind the scenes without the developer ever needing to do anything, as from day 1 the contractID is mapped to a documentId internally]
hard to debug different IDs [K> you will only debug independently within your "Org", so why do "different" IDs matter?]

b) From a "RESTful" design perspective, the "same ID" should represent the same "payload". The fact that "Org A" GET /api/v1/blockchain/getPrivateDocument/{documentID} can be different from "Org B" GET /api/v1/blockchain/getPrivateDocument/{documentID} does not sound logical (especially where there is a need to associate local names and values with them).

Horizon-Developer commented 3 years ago

A lot has already been discussed, but I would like to add an extra pro for option one (that is, a separate ID locally vs. the network-wide ID), and that is the argument of leaky abstractions. When we use the same ID everywhere (also for things like drafts), the whole project has a hard dependency on this single ID (how and where it is generated). The common adapter is there in the first place to draw a clear boundary between network/blockchain interaction and the rest of the organisation-level applications. If the common adapter then sent the generic ID to the rest of the application world, you would effectively get a dependency from these applications on a blockchain-specific detail, and instead of one dependency (between common adapter and blockchain) you would potentially get a lot of dependencies between all the different applications in the organisation layer.

So by that argument I would support having a separate technical ID.

If you want to identify and know whether 2 documents are exactly the same, the common way is to generate a LOCAL hash and present this as a number or as an identicon (for easy human checks) instead of relying on a system-generated ID. I do think the generated ID is a sound technical solution, but only to be there in the background.

smeyerzu commented 3 years ago

In general, I agree that decoupling is something good. However, in this BWRP application we rely on blockchain and, associated with this, on a distributed/common ledger. If we now decouple our documents and IDs from the common ledger, we lose one of the advantages of our system.

Nevertheless, if most of you prefer this decoupling I would argue for a clearer structure of the JSON. Names of the IDs that make clear that one belongs to the common ledger and one is only used as reference in the local system. This is important for users and even more for developers. Also, it might make sense to restructure the document in a way that makes clear what the "primary key" of that document is and what the "primary key" of the part is that is shared between multiple organizations over the common ledger.

I am also fine with using hashes for comparing documents. However we need to define which part of the JSON is used for computing the hash. I would suggest implementing Merkle Trees to create hashes over parts of the document and compare them. It might make sense to create an individual ticket for that.
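A small sketch of what hashing over parts of the document could look like, using a simple Merkle tree over the top-level JSON parts. This is only one possible construction: which parts form the leaves, the `key:JSON` leaf encoding and the handling of an odd node are all assumptions, and a real implementation would need a canonical JSON serialization for nested objects.

```javascript
const crypto = require('crypto');

const sha256Hex = (s) => crypto.createHash('sha256').update(s).digest('hex');

// One leaf hash per top-level part, in sorted key order so both parties agree.
function leafHashes(doc) {
  return Object.keys(doc)
    .sort()
    .map((key) => sha256Hex(key + ':' + JSON.stringify(doc[key])));
}

// Fold the leaves pairwise up to a single root; an odd node is carried up unchanged.
function merkleRoot(hashes) {
  if (hashes.length === 0) return sha256Hex('');
  let level = hashes;
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(i + 1 < level.length ? sha256Hex(level[i] + level[i + 1]) : level[i]);
    }
    level = next;
  }
  return level[0];
}

const docA = { header: { name: 'Contract' }, body: { key: '{}' } };
const docB = { body: { key: '{}' }, header: { name: 'Contract' } };      // same parts, other order
const docC = { header: { name: 'Contract' }, body: { key: '{"x":1}' } }; // body differs

const rootA = merkleRoot(leafHashes(docA));
const rootB = merkleRoot(leafHashes(docB));
const rootC = merkleRoot(leafHashes(docC));
```

Comparing roots answers "same document?", while comparing individual leaf hashes shows which part differs (here the `body` leaf of `docC`).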

zkong-gsma commented 3 years ago

We are following general RESTful API design best practices.

https://docs.microsoft.com/en-us/azure/architecture/best-practices/api-design

The ID (primary key) of each "resource" is based on the "resource" name + "Id" in camelCase, hence contractId, usageId, signatureId, settlementId, etc.

Again, I would like to emphasize that what is presented on the NorthBound API does not need to represent the backend 1-to-1.

As for document signing/hashes, we introduced this to provide the exact "exchange" payload that is used for exchanging.

http://tmus.poc.com.local:3030/api/v1/contracts/5fd7699003b22c001dec9986194b?format=RAW

With format=RAW it will return

{
  "contractId": "5fd7699003b22c001dec9986194b",
  "state": "SENT",
  "documentId": "0c85bf2582b89430d5fe6b346f3e254093d94521daa5ee9c8f18f6d9460ad2e3",
  "raw": "eyJ0eXBlIjoiY29udHJhY3QiLCJ2ZXJzaW9uIjoiMS4wIiwibmFtZSI6IlRlc3Rpbmcgd2l0aCBHdWlsbGFtZSBhbmQgS3JpcyIsImZyb21Nc3AiOnsic2lnbmF0dXJlcyI6W3siaWQiOiJzaWduYXR1cmUtMCIsIm5hbWUiOiJrb25nIiwicm9sZSI6ImFhYSJ9XSwibXNwSWQiOiJEVEFHIn0sInRvTXNwIjp7InNpZ25hdHVyZXMiOlt7ImlkIjoic2lnbmF0dXJlLTAiLCJuYW1lIjoiZ3VpbGxhbWUiLCJyb2xlIjoiYWFhIn0seyJpZCI6InNpZ25hdHVyZS0xIiwibmFtZSI6ImNocmlzdG9waGUiLCJyb2xlIjoiYmJiIn1dLCJtc3BJZCI6IlRNVVMifSwiYm9keSI6eyJnZW5lcmFsSW5mb3JtYXRpb24iOnsibmFtZSI6IlRlc3Rpbmcgd2l0aCBHdWlsbGFtZSBhbmQgS3JpcyIsInR5cGUiOm51bGwsInN0YXJ0RGF0ZSI6bnVsbCwiZW5kRGF0ZSI6bnVsbCwicHJvbG9uZ2F0aW9uTGVuZ3RoIjpudWxsLCJ0YXhlc0luY2x1ZGVkIjpmYWxzZSwiYXV0aG9ycyI6bnVsbCwiVE1VUyI6eyJjdXJyZW5jeUZvckFsbERpc2NvdW50cyI6bnVsbCwidGFkaWdDb2RlcyI6eyJjb2RlcyI6bnVsbCwiaW5jbHVkZUNvbnRyYWN0UGFydHkiOmZhbHNlfX0sIkRUQUciOnsiY3VycmVuY3lGb3JBbGxEaXNjb3VudHMiOm51bGwsInRhZGlnQ29kZXMiOnsiY29kZXMiOm51bGwsImluY2x1ZGVDb250cmFjdFBhcnR5IjpmYWxzZX19fSwiVE1VUyI6eyJzaWduYXR1cmVzIjpbeyJpZCI6InNpZ25hdHVyZS0wIiwibmFtZSI6Imd1aWxsYW1lIiwicm9sZSI6ImFhYSJ9LHsiaWQiOiJzaWduYXR1cmUtMSIsIm5hbWUiOiJjaHJpc3RvcGhlIiwicm9sZSI6ImJiYiJ9XSwiZGlzY291bnRNb2RlbHMiOnsiY29uZGl0aW9uIjpudWxsfX0sIkRUQUciOnsic2lnbmF0dXJlcyI6W3siaWQiOiJzaWduYXR1cmUtMCIsIm5hbWUiOiJrb25nIiwicm9sZSI6ImFhYSJ9XSwiZGlzY291bnRNb2RlbHMiOnsiY29uZGl0aW9uIjpudWxsfX19fQ==",
  "creationDate": "2020-12-14T13:33:04.173Z",
  "lastModificationDate": "2020-12-14T13:44:09.219Z"
}

Here the "raw" data is the same "exchanged" data, which you can re-use for document signing or verification.
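A short sketch of how a client might handle such a "raw" value, assuming it is base64-encoded UTF-8 JSON (which is what the sample above decodes to). The sample payload here is a made-up stand-in, and `decodeRaw` is a hypothetical helper.

```javascript
// Small stand-in for the "raw" field: base64 of the exchanged JSON text.
const exchanged = { type: 'contract', version: '1.0', name: 'Example' };
const raw = Buffer.from(JSON.stringify(exchanged), 'utf8').toString('base64');

// Decode the exchanged payload, e.g. for display.
function decodeRaw(rawBase64) {
  return JSON.parse(Buffer.from(rawBase64, 'base64').toString('utf8'));
}

const decoded = decodeRaw(raw);

// Re-serializing the decoded object is NOT guaranteed to reproduce the exchanged bytes:
// a different key order already yields a different string.
const reordered = { name: decoded.name, version: decoded.version, type: decoded.type };
const reEncoded = Buffer.from(JSON.stringify(reordered), 'utf8').toString('base64');
```

This is why signing or hash verification should operate on the "raw" value exactly as exchanged rather than on a re-serialized copy of the decoded object.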

In the above case, for example:

hash = fd89f11b54964dd1fc6f75edda08ef57b487b1cd6082d8d5fc8f128c4e5ad3e8

and the value of sha256( DTAG0c85bf2582b89430d5fe6b346f3e254093d94521daa5ee9c8f18f6d9460ad2e3 ) is the

storageKey 4b27216cf2d8e43555e5b7d781fac8d4d772aaea5889467c27c56b3992af81db

Logs from Blockchain-adapter.

> got storage key 4b27216cf2d8e43555e5b7d781fac8d4d772aaea5889467c27c56b3992af81db for MSP DTAG
> will store signature at key 4b27216cf2d8e43555e5b7d781fac8d4d772aaea5889467c27c56b3992af81db
> INCOMING EVENT: [DTAG] <STORE:DOCUMENTHASH> --> { "msp" : "DTAG", "eventName" : "STORE:DOCUMENTHASH", "timestamp" : "2020-12-14T13:33:04Z", "data" : { "storageKey" : "4b27216cf2d8e43555e5b7d781fac8d4d772aaea5889467c27c56b3992af81db" } }
> both parties stored data with ID 0c85bf2582b89430d5fe6b346f3e254093d94521daa5ee9c8f18f6d9460ad2e3

> reply: FetchPrivateDocument(#0c85bf2582b89430d5fe6b346f3e254093d94521daa5ee9c8f18f6d9460ad2e3) = {"fromMSP":"DTAG","toMSP":"TMUS","data":"eyJ0eXBlIjoiY29udHJhY3QiLCJ2ZXJzaW9uIjoiMS4wIiwibmFtZSI6IlRlc3Rpbmcgd2l0aCBHdWlsbGFtZSBhbmQgS3JpcyIsImZyb21Nc3AiOnsic2lnbmF0dXJlcy
I6W3siaWQiOiJzaWduYXR1cmUtMCIsIm5hbWUiOiJrb25nIiwicm9sZSI6ImFhYSJ9XSwibXNwSWQiOiJEVEFHIn0sInRvTXNwIjp7InNpZ25hdHVyZXMiOlt7ImlkIjoic2lnbmF0dXJlLTAiLCJuYW1lIjoiZ3VpbGxhbWUiLCJyb2xlIjoiYWFhIn0seyJpZCI6InNpZ25hdHVyZS0xIiwibmFtZSI6ImNocmlzdG9waGUiLCJyb2xlIjoiYmJiIn1dLCJtc3BJ
ZCI6IlRNVVMifSwiYm9keSI6eyJnZW5lcmFsSW5mb3JtYXRpb24iOnsibmFtZSI6IlRlc3Rpbmcgd2l0aCBHdWlsbGFtZSBhbmQgS3JpcyIsInR5cGUiOm51bGwsInN0YXJ0RGF0ZSI6bnVsbCwiZW5kRGF0ZSI6bnVsbCwicHJvbG9uZ2F0aW9uTGVuZ3RoIjpudWxsLCJ0YXhlc0luY2x1ZGVkIjpmYWxzZSwiYXV0aG9ycyI6bnVsbCwiVE1VUyI6eyJjdXJyZW
5jeUZvckFsbERpc2NvdW50cyI6bnVsbCwidGFkaWdDb2RlcyI6eyJjb2RlcyI6bnVsbCwiaW5jbHVkZUNvbnRyYWN0UGFydHkiOmZhbHNlfX0sIkRUQUciOnsiY3VycmVuY3lGb3JBbGxEaXNjb3VudHMiOm51bGwsInRhZGlnQ29kZXMiOnsiY29kZXMiOm51bGwsImluY2x1ZGVDb250cmFjdFBhcnR5IjpmYWxzZX19fSwiVE1VUyI6eyJzaWduYXR1cmVzIjpb
eyJpZCI6InNpZ25hdHVyZS0wIiwibmFtZSI6Imd1aWxsYW1lIiwicm9sZSI6ImFhYSJ9LHsiaWQiOiJzaWduYXR1cmUtMSIsIm5hbWUiOiJjaHJpc3RvcGhlIiwicm9sZSI6ImJiYiJ9XSwiZGlzY291bnRNb2RlbHMiOnsiY29uZGl0aW9uIjpudWxsfX0sIkRUQUciOnsic2lnbmF0dXJlcyI6W3siaWQiOiJzaWduYXR1cmUtMCIsIm5hbWUiOiJrb25nIiwicm
9sZSI6ImFhYSJ9XSwiZGlzY291bnRNb2RlbHMiOnsiY29uZGl0aW9uIjpudWxsfX19fQ==","dataHash":"fd89f11b54964dd1fc6f75edda08ef57b487b1cd6082d8d5fc8f128c4e5ad3e8","timeStamp":"1607952784306378603","id":"0c85bf2582b89430d5fe6b346f3e254093d94521daa5ee9c8f18f6d9460ad2e3"}

where

got storage key 4b27216cf2d8e43555e5b7d781fac8d4d772aaea5889467c27c56b3992af81db "dataHash":"fd89f11b54964dd1fc6f75edda08ef57b487b1cd6082d8d5fc8f128c4e5ad3e8"

Horizon-Developer commented 3 years ago

If we look closely at what the DocumentID currently is, it functions more as a salt for the invoking MSP, because Key = H(MSP + DocumentID), where the DocumentID is a random 32-byte slice. It also serves to hold together the document hash and signatures.

It is there as a commitment between the parties, in the sense that you can only confirm it with your own signatures and hash if you know the random salt (i.e. the documentID).

From my point of view this is a system detail for checking the correctness and validity of the claimed signatures. So when we put this behind an abstraction we do not lose the benefits of a shared ledger.

So maybe to make things a bit clearer (at least for me): @sschulz-t, can you point out what kind of functionality we lose by not showing, or by abstracting away, the generated documentID? When I want to validate by hand that a document is on-chain, I need to get the documentID (salt) from the common API and then do all the verification by hand. If we used the ID directly, we would still need to do all the verification by hand.

Also, I think there is a little bug in the code as it is implemented now: when a new docid is generated, it is checked that there is no ID with the same content, BUT the ID is not used or reserved immediately (as far as I can see). In a very theoretical situation it could be used in some other place before it is used in the current place.

zkong-gsma commented 3 years ago

Currently, not all "resources" are meant to be "exchanged".
(Resources being "Contract", "Signature", "Usage", "Settlement", "Discrepancy".)

For the "resources" that we are going to "exchange", yes, we will have a "documentId" per se (e.g. Contract, Settlement).

But for "Signature" as a "resource", for which we do not have a "documentId", we have to "abstract" one.

Based on what I understand from Pascal, we are not going to "exchange" the "Usage". The Usage is going to be "local" and only the "Settlement" is exchanged.

So, for a "clean", "uniform" API design, it is better/easier for each "resource" to have its "own" ID.

If we use "documentId" for "Contract" and "Settlement", what "ID" are we going to use for a "local"-only "Usage" or "Discrepancy" report?

This is why, for API uniformity, each "resource" will have its own ID and the "link" to its parent,

based on RESTful API design best practices.

sschulz-t commented 3 years ago

As said before, I would argue for using the documentID for resources shared between partners, where it is important to be able to verify that both are talking about the same thing. For data that is only local, there is a local ID. For a clean API this does not matter: you have different REST endpoints anyway (/documents/, /usage/). There is (and should be) no such thing as /generic/ which gives you all kinds of different data for any ID. As discussed, I will compile the pro and con list later and post it here.

Additionally I would like to collect some ideas on how to present IDs to the user in #14

zkong-gsma commented 3 years ago

As I mentioned before, the documentId is always kept and will be used for "verification" in a different way.

A different way of putting it: a248d3a74c66fad90ec230a7a35342e24df459040d7c68800d63826a753b043e.txt

Why and how is it important to name a file that we share? E.g. here, I have uploaded a "file" "a248d3a74c66fad90ec230a7a35342e24df459040d7c68800d63826a753b043e.txt".

Why is it so important that all parties MUST store and name this file "a248d3a74c66fad90ec230a7a35342e24df459040d7c68800d63826a753b043e.txt"?

I think each recipient should be allowed to store it under a different name.

What "real" benefit is there in both of us having a "hard" to read name? E.g. the same file, renamed to "contractId.txt":

$ md5sum a248d3a74c66fad90ec230a7a35342e24df459040d7c68800d63826a753b043e.txt contractId.txt
1e4d899258fb7dab62f1d15fd3f7a2c7  a248d3a74c66fad90ec230a7a35342e24df459040d7c68800d63826a753b043e.txt
1e4d899258fb7dab62f1d15fd3f7a2c7  contractId.txt

documentId to me is technically the same as an md5sum.

So when we exchange files with parties, the documentId is unique and never changes (and is based on the payload: if we change the payload, the value will change).

This is how we "identify"/"sync" from both parties that they are the same file.

We do not need to give them the same name.

As long as we have the same file, we can find it by anything it contains, e.g.

$ grep "Test File" ./*
./a248d3a74c66fad90ec230a7a35342e24df459040d7c68800d63826a753b043e.txt:This is a Test File.
./contractId.txt:This is a Test File.
$
$ grep "a248d3a74c66fad90ec230a7a35342e24df459040d7c68800d63826a753b043e" ./*
./a248d3a74c66fad90ec230a7a35342e24df459040d7c68800d63826a753b043e.txt:my documentId is a248d3a74c66fad90ec230a7a35342e24df459040d7c68800d63826a753b043e
./contractId.txt:my documentId is a248d3a74c66fad90ec230a7a35342e24df459040d7c68800d63826a753b043e

And lastly, once the "file" is found, verification can be done by comparing the "documentId" per se, which is the "md5sum" part shown above.

sschulz-t commented 3 years ago

We have two possible solutions:

Option A: having a local contractID and a documentID
Option B: only having a documentID

Option A (2 IDs)
PRO:
- decoupling of chaincode and application
- clean and consistent RESTful structured API
- support for structured local documents that will not be exchanged (where you do not have a documentId)
CON:
- adds overhead
- can potentially confuse developers
- harder to debug (manual lookup of mapping)

Option B (1 ID)
PRO:
- clean and consistent API
- documents can be clearly and verifiably identified by both parties (phone etc.)
CON:
- modification to the blockchain to reserve the documentId adds unnecessary overhead to the network
- unnecessary "additional" work
- leaky abstraction of an ID that belongs to the chain (GW)

I tried to sum up all points in the table. It's hard to extract one-line pro/cons from this long discussion. @zkong-gsma please summarize your points and add them to the table.

@all: If you want to add points, please click on [...] and choose edit on THIS message and modify it right in place (there is a history, don't worry to break stuff ;) )

zkong-gsma commented 3 years ago

I would like to question what kind of "Overhead" do you see that the Option A is adding?

Also,

can potentially confuse developers
harder to debug (manual lookup of mapping)

These are hypothetical problems that we do not generally "see"/"face". We are only confusing the "developer" when we call it "documentId" per se. This is technically not really an "id", and if we stop calling it a "documentId", there will be no confusion.

As you mentioned. "its harder", not "impossible". to debug. and debugging can still happen eventually when something unexpected happen. We do not expect a "production" deployment to face this all the time.

Also, there is currently no difference in the way we store the "document" per se:

             'CREATE TABLE IF NOT EXISTS documents (' +
              '`id` INT AUTO_INCREMENT, ' +
              '`documentId` VARCHAR(128) NOT NULL, ' +
              '`fromMSP` VARCHAR(64) NOT NULL, ' +
              '`toMSP` VARCHAR(64) NOT NULL, ' +
              '`data` json NOT NULL, ' +
              '`state` VARCHAR(64) NOT NULL, ' +
              '`ts` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, ' +
              '`fromStorageKey` VARCHAR(64) AS (SHA2(CONCAT(fromMSP, documentId), 256)) STORED NOT NULL, ' +
              '`toStorageKey` VARCHAR(64) AS (SHA2(CONCAT(toMSP, documentId), 256)) STORED NOT NULL, ' +
              'PRIMARY KEY (id), ' +
              'UNIQUE INDEX documentId (documentId))');

Your "LocalStorageAdapter" stores the "document" exactly the same way as we do. We index our "ID" and "documentID" exactly as you are currently doing. If you believe in the "documentId", why is it not the primary key? Why do you still need an auto-increment "id"?

> db.contracts.getIndexes();
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_"
        },
        {
                "v" : 2,
                "unique" : true,
                "key" : {
                        "id" : 1
                },
                "name" : "id_1"
        },
        {
                "v" : 2,
                "unique" : true,
                "key" : {
                        "documentId" : 1
                },
                "name" : "documentId_1",
                "partialFilterExpression" : {
                        "documentId" : {
                                "$type" : "string"
                        }
                }
        }
]

And your API, getDocument,

async getDocument(documentId) {
    let rows;
    try {
      rows = await this.getDatabase().query('SELECT * FROM documents WHERE documentId=?', documentId);

is also returning an "abstracted" ID that cannot be synced.

In other words, they will go "off sync" once you have more "contracts" with different partners. This ID is currently displayed in the "Contract" detail view under the "Header".

sschulz-t commented 3 years ago

I would like to question what kind of "Overhead" do you see that the Option A is adding?

Keeping track of two IDs, thinking where and when to use which one, translation between ID1 and ID2,...

can potentially confuse developers

harder to debug (manual lookup of mapping) These are hypothetical problems that we do not generally "see"/"face".

That's not hypothetical. I can assure you things that can potentially go wrong will go wrong. That's what you learn from years of developing software.

As you mentioned. "its harder", not "impossible". to debug. and debugging can still happen eventually when something unexpected happen. We do not expect a "production" deployment to face this all the time.

We have the possibility to avoid potential pitfalls at no cost, why shouldn't we do it? Making things harder or more complicated will cost a lot of time in the future. That's what I would like to avoid.

I would not count "needs to be implemented" as a con point and "it is already implemented that way" as a pro. We will need this documentID reservation feature anyway as this will fix the bug Gerben mentioned above.

As said before, I am fine with using local IDs for non-exchanged documents. So both systems can support structured local documents that will not be exchanged.

zkong-gsma commented 3 years ago

I would like to question what kind of "Overhead" do you see that the Option A is adding?

Keeping track of two IDs, thinking where and when to use which one, translation between ID1 and ID2,...

We didn't want to reveal the "documentId" to start with, as mentioned. That is also how you were storing the documents in your "LocalStorageAdapter".

Also, where is the "overhead" per se? Who is taking the "impact"? From the client perspective, in this case the webui, they do not need to keep track of the two. For the webui it is all just 1 ID. There is no "overhead" between webui <-> common-adapter.

And it is the common-adapter's job to link/arrange multiple items in a RESTful way. You cannot avoid "linking" objects anyway, as we are talking about multiple other resources.

We have a contract.
A contract then has usages.
It then has a "settlement", which also links to "usages".

So, if we are going with your original "documentId" design:

GET http://{host}:{port}/api/v1/blockchain/documents?type=contract
GET http://{host}:{port}/api/v1/blockchain/documents?type=usage
GET http://{host}:{port}/api/v1/blockchain/documents?type=settlement

How do we then find/select a usage/settlement? It now suddenly becomes

GET http://{host}:{port}/api/v1/blockchain/documents?type=usage&documentId={documentId for "contract"}
// gives a list of "documentId"s related to the contract's documentId.
GET http://{host}:{port}/api/v1/blockchain/documents/{usage's documentId}

So we now have a mixture of 2 documentIds to "select" from? Doesn't this become more confusing? documentId for contract = "abcd1234"; documentId for usage = "efgh5678", but related to documentId abcd1234.

At this point, which documentId is which documentId? And if we say we are going to exchange "usages" as well as settlements, we now have 3 documentIds. Isn't it even more confusing that you now need to tag them: documentId?type=contract, documentId?type=usages where contract's documentId, documentId?type=settlement where contract's documentId and usage documentId is

Isn't this more confusing to the developer? This is why we hide and avoid using the term "documentId" in the first place.

can potentially confuse developers

harder to debug (manual lookup of mapping) These are hypothetical problems that we do not generally "see"/"face".

That's not hypothetical. I can assure you things that can potentially go wrong will go wrong. That's what you learn from years of developing software.

Developing "software" is not the same as developing an API. We are defining the RESTful API here. All major companies design their APIs in the same RESTful way.

As you mentioned. "its harder", not "impossible". to debug. and debugging can still happen eventually when something unexpected happen. We do not expect a "production" deployment to face this all the time.

We have the possibility to avoid potential pitfalls at no cost, why shouldn't we do it? Making things harder or more complicated will cost a lot of time in the future. That's what I would like to avoid.

I would not count "needs to be implemented" as a con point and "it is already implemented that way" as a pro. We will need this documentID reservation feature anyway as this will fix the bug Gerben mentioned above.

As said before, I am fine with using local IDs for non-exchanged documents. So both systems can support structured local documents that will not be exchanged.

And a fresh view of the picture: the Common Adapter is a "local" instance. The "Blockchain-Adapter" is a "global" system.

[image: Untitled Diagram]

It is not logical to have what is perceived as the same document but is not the same document from a "technical" perspective, e.g.

GET http://dtag.poc.com.local:3030/api/v1/contracts/0b9097ecc45af99ec7dff7a1d8fb9b267d72fea9940051ee4e0a9c4bb0a6205c
and
GET http://tmus.poc.com.local:3040/api/v1/contracts/0b9097ecc45af99ec7dff7a1d8fb9b267d72fea9940051ee4e0a9c4bb0a6205c
do not result in the same "object" as a "whole",

whereas we expect
http://dtag.poc.com.local:8081/private-documents/0b9097ecc45af99ec7dff7a1d8fb9b267d72fea9940051ee4e0a9c4bb0a6205c
and
http://tmus.poc.com.local:8082/private-documents/0b9097ecc45af99ec7dff7a1d8fb9b267d72fea9940051ee4e0a9c4bb0a6205c
to return the "exact" same response/result, because these 2 systems are in sync.

This technical difference extends to other items that are local to the org as well, such as "local metadata", a "dispute report", or even a "signature", which technically has a "txId" instead of a "documentId". For example, the requests below should technically produce the "same" list, but the lists can differ, as we can and will support local draft copies:
GET http://dtag.poc.com.local:3030/api/v1/contracts/0b9097ecc45af99ec7dff7a1d8fb9b267d72fea9940051ee4e0a9c4bb0a6205c/usages/
and
GET http://tmus.poc.com.local:3040/api/v1/contracts/0b9097ecc45af99ec7dff7a1d8fb9b267d72fea9940051ee4e0a9c4bb0a6205c/usages/

Horizon-Developer commented 3 years ago

One thing I want to add is that, in my opinion, the name 'documentID' and the fact that it is seen as a global identifier that always points to the same content are not correct. That is because, as said in a previous comment, the documentID is just a random seed and is not itself cryptographically tied to the content of the document. It only serves as a common (peer-to-peer) agreement on the storage key.

So even if we have the same documentID, it doesn't universally hold that we are talking about the same content (for that we need to go on-chain and verify the hash in the corresponding storage key of the other MSP). So if our goal is to identify a document, a much more user-friendly way is to have a global name + version + date. This has the same property as a 'documentID' in the sense that we need to go on-chain to see the proof of correctness.

On the other hand, if we want to make really sure that on the phone we are talking about the same content, then there is only one way, and that is to share the hash value with each other.

So in my opinion these kinds of IDs should never be shown to the user anyway.

Then the argument about developers and possible confusion: the common-adapter is an abstraction boundary. That means when I am busy developing something that consumes the common-adapter, I only need to worry about the local ID, because the rest is abstracted away, so no confusion there. When I am working on something that consumes the blockchain (adapter), for instance the common-adapter itself, then I only need to worry about and check the documentID, which is a detail of the blockchain implementation.

sschulz-t commented 3 years ago

Gerben, you are right, the current documentID is a PSK that allows parties that know it to calculate the storage keys. The idea of this design was to have an on-chain way to link all documents together. This linkage can be proven by revealing the documentID to third parties. Regarding using the hash: We could (theoretically) have the same document uploaded twice, a document hash alone would not help in this situation.

So maybe documentID is the wrong name here, maybe caseNumber is a better fit? What I was talking about is that I want to make sure that both parties talk about the same "case". In my opinion, having such a caseNumber, that links all data (document, settlement, signatures, ...) together is mandatory.

If human readability is important, one can think of allocating such a world-wide unique case identifier differently, e.g. something in the form of < origin msp id >-< year >-< month >-< day >-< counter >. This could be reserved at chaincode level and can be used to identify one case throughout all APIs. The disadvantage is that this approach requires passing the psk AND the new identifier to some operations in order to be able to retrieve some contents. Maybe something in the form of < origin msp id >-< year >-< month >-< day >-< counter >-< psk > could also be used when the chaincode ensures that < origin msp id >-< year >-< month >-< day >-< counter > is already unique and the psk just adds enough entropy for the hidden communication.
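A rough sketch of such a human-readable identifier, assuming a hypothetical per-day counter kept by whoever allocates the IDs (in the proposal this would be the chaincode). The zero-padded counter width, the UTC date, the hex psk suffix and the restriction to MSP IDs without hyphens are all illustrative assumptions.

```javascript
// Hypothetical allocator for IDs of the form <msp>-<year>-<month>-<day>-<counter>.
class CaseNumberAllocator {
  constructor() {
    this.counters = new Map(); // "<msp>-<YYYY-MM-DD>" -> last counter used
  }

  allocate(mspId, date) {
    const day = date.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
    const prefix = `${mspId}-${day}`;
    const counter = (this.counters.get(prefix) || 0) + 1;
    this.counters.set(prefix, counter);
    return `${prefix}-${String(counter).padStart(4, '0')}`;
  }
}

// Parse either form: with or without the trailing psk.
function parseCaseNumber(caseNumber) {
  const match = /^([A-Za-z0-9]+)-(\d{4})-(\d{2})-(\d{2})-(\d+)(?:-([0-9a-f]+))?$/.exec(caseNumber);
  if (!match) throw new Error(`malformed case number: ${caseNumber}`);
  const [, mspId, year, month, day, counter, psk] = match;
  return {
    mspId,
    year: Number(year),
    month: Number(month),
    day: Number(day),
    counter: Number(counter),
    psk: psk || null,
  };
}

const allocator = new CaseNumberAllocator();
const when = new Date('2020-12-14T13:33:04Z');
const first = allocator.allocate('DTAG', when);
const second = allocator.allocate('DTAG', when);
const parsed = parseCaseNumber(`${first}-0c85bf25`);
```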

So maybe we should rephrase the question in this thread to whether we need a world-wide unique case identifier that refers to a specific caseNumber and that this number is the way to identify a specific case on both ends. How this case number is generated (random psk, human readable, ...) should be the second decision.

@Kong: If we redesign the case handling in a way that the object allows to differentiate between local and shared data I do not see a problem with the same url reporting different data on both msps. Having a clear distinction between local and shared data would be a nice side effect.

zkong-gsma commented 3 years ago

I think the discussion is getting out of topic again.

The original question is still: should we be using the same "ID" (currently referred to as documentId) as the identifier, or should we use a new "abstracted" local ID, which we currently refer to as "contractId"?

Renaming the "documentId" to anything else has no impact at all. I keep saying that this "documentId" is to some degree an "extended" md5sum of the "document" per se, and we should not be using it as the "Id" (selector).

This goes back to my point: even if we rename it, when we come to its nested documents, how are we going to handle them,

now that we have 2 IDs of the same kind per se?

sschulz-t commented 3 years ago

As discussed yesterday, we will rename documentID to referenceID. I will close this issue now.

Please see these three tickets for details/progress:
https://github.com/GSMA-CPAS/BWRP-common-adapter/issues/15
https://github.com/GSMA-CPAS/BWRP-blockchain-adapter/issues/16
https://github.com/GSMA-CPAS/BWRP-chaincode/issues/26