Open StephenCzarnecki opened 3 years ago
I believe I can answer all questions with one clarification. Right now, the way the GraphQL API is designed, with a GraphQL query, you cannot ask for specific values within the resource holding the actual data (other than the values that are "mirrored" like nearnormalHemisphericalTransmittances; but these values are not meant to make approving institutions select specific values to approve). In other words, an approval is for all data and could merely exclude some meta information about the data like the description field (you could exclude that from the GraphQL query). The resource itself is indirectly included in an approval through the resource's SHA256 hash value (this one should be part of every query used in an approval).
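To illustrate how a resource is bound to an approval through its hash value, here is a minimal Python sketch. It assumes the hash is the hex-encoded SHA256 digest of the raw response body; the function names are made up for this illustration and are not part of any schema.

```python
import hashlib


def resource_hash(body: bytes) -> str:
    """Return the lowercase hex SHA256 digest of a resource body."""
    return hashlib.sha256(body).hexdigest()


def matches_approved_hash(body: bytes, approved_hash: str) -> bool:
    """Check a freshly fetched body against the hashValue stored in an approval."""
    return resource_hash(body) == approved_hash.lower()


# Stand-in body; in practice this would be the bytes returned by the locator URL.
body = b"abc"
approved = resource_hash(body)
print(matches_approved_hash(body, approved))         # same bytes, same hash
print(matches_approved_hash(body + b" ", approved))  # any byte change breaks the match
```

Because the digest covers every byte of the body, an approval that includes hashValue covers the whole resource rather than selected values in it.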
If we need to be able to approve only some values within the resource, then the current approach to approvals does not work and we need to think of something else.
Also what is missing from approvals right now is a statement. So what does an approval by, for example, NFRC mean? This could be implicit but maybe it is better to make such a statement explicit.
If you feel that one of the questions is still open, then please let me know and I'll give it another try.
@simon-wacker Thank you, I think your clarification helps immensely. Let me try to create an example running through the data approval process to make sure. Again please correct any of the following.
Say that the NFRC wishes to approve the data returned by this url:
https://igsdb.lbl.gov/api/v1/products/363
Side note: currently that url returns product data in the existing IGSDB json format. But in the future the IGSDB may add something like a format option in the query string to allow for returning optical data that conforms with the ICON opticalData.json schema like
https://igsdb.lbl.gov/api/v1/products/363?format=ICON
Or potentially implement an entire GraphQL API for the data. Since this is uncertain and does not exist yet, for this example I would like to just use the existing url as the resource.
End side note.
In any case for this example assume that there is data including measured wavelength data behind https://igsdb.lbl.gov/api/v1/products/363
Data Approval step 1 (An institution adds data to a database):
Data approval step 2 (Some institution (may be the same) queries the data with a GraphQL query.):
query {
  data(
    id: "12ba0b75-a6a5-424b-a01f-7aae665482ac" # IGSDB UUID for the record to be approved
    timestamp: "2021-02-21T12:00:00-08:00" # Timestamp for the time NFRC wishes the approval to begin
    locale: "en-GB"
  ) {
    name
    warnings
    resources {
      hashValue
      locator
    }
  }
}
Data approval step 3 (The latter institution reviews the data and, if correct, signs it with one of its GnuPG signing keys.):
{
  "data": {
    "name": "Generic Clear Glass",
    "warnings": [],
    "resources": [
      {
        "hashValue": "bca45733b010c3b0b8f940dd7f878ce9a679210d449768fa4dd55692684b64db", # SHA256 hash of the body of the response from the locator url
        "locator": "https://igsdb.lbl.gov/api/v1/products/363/"
      }
    ]
  }
}
Data approval step 4 (The institution adds its approval of the data to the database.)
After step 4 is complete the IGSDB is able to generate the following DataApproval response for UUID 12ba0b75-a6a5-424b-a01f-7aae665482ac when requested:
{
  "timestamp": "2021-02-21T12:00:00-08:00",
  "signature": "iHUEABEIAB0WIQQg8IURMSKBL/r7WqqDBHuHYt2u2AUCYDlItAAKCRCDBHuHYt2u2H3PAP9D+JCzwHdCfKqRX9n0zm1qwiqWNwfTEE5xVJz2aJff2gEAtpSU0YBrSXmRwWuAhwb9iSxzGkacFac4D7hy7q2PQ0E==fDo4",
  "keyFingerprint": "15E4544A88EEB81EAF65229038CEC5E499AE24A9", # Assume that this is the fingerprint for the NFRC signing key.
  "query": "query{data(id:\"12ba0b75-a6a5-424b-a01f-7aae665482ac\",timestamp:\"2021-02-21T12:00:00-08:00\",locale:\"en-GB\"){name,warnings,resources{hashValue,locator}}}",
  "response": "{\"data\":{\"name\":\"Generic Clear Glass\",\"warnings\":[],\"resources\":[{\"hashValue\":\"bca45733b010c3b0b8f940dd7f878ce9a679210d449768fa4dd55692684b64db\",\"locator\":\"https://igsdb.lbl.gov/api/v1/products/363/\"}]}}"
}
Regarding the question
Is this the same timestamp as the timestamp in the query in step 2?
in step 4: Yes, it is there just for convenience, so that it need not be extracted from the query property of an approval. The timestamp is needed, for example, when a person wants to query the data format from the metabase (IKDB) as it was at the time the approval process took place.
I would require the data id and timestamp to be included in the query in step 2. Otherwise, the GnuPG signature does not associate the resource data with the unique data identifier and a specific time (note that the meta information about the approval, that is, the JSON you posted in step 4, is itself not signed). So, not including id and timestamp could be problematic because in that case the signature of the approval could also be used for another data record with a different id and another timestamp.
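A database could check this requirement before accepting an approval. The following Python sketch is only an illustration: it uses a naive regular expression rather than a real GraphQL parser (so it would be confused by comments or nested parentheses inside the arguments), and the Approval class here is a hypothetical stand-in with just the two properties needed.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Approval:
    """Hypothetical stand-in holding only the properties used below."""
    timestamp: str
    query: str


def bound_arguments(query: str) -> dict:
    """Extract the string-valued arguments of the top-level data(...) field."""
    match = re.search(r'\bdata\s*\(([^)]*)\)', query)
    if match is None:
        return {}
    return dict(re.findall(r'(\w+)\s*:\s*"([^"]*)"', match.group(1)))


def binds_id_and_timestamp(approval: Approval) -> bool:
    """True if the signed query names an id and the same timestamp as the approval."""
    args = bound_arguments(approval.query)
    return bool(args.get("id")) and args.get("timestamp") == approval.timestamp


approval = Approval(
    timestamp="2021-02-21T12:00:00-08:00",
    query='query{data(id:"12ba0b75-a6a5-424b-a01f-7aae665482ac",'
          'timestamp:"2021-02-21T12:00:00-08:00",locale:"en-GB")'
          '{name,warnings,resources{hashValue,locator}}}',
)
print(binds_id_and_timestamp(approval))
```

The second condition also covers the convenience copy of the timestamp: if the top-level timestamp differs from the one inside the signed query, the approval is rejected.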
In the comment on Approval#query it actually says
It does neither include other data approvals by third parties nor the response approval by the database. All other fields and sub-fields of this GraphQL schema at the time given by timestamp are included. Despite these restrictions specifying the query explicitly is necessary because approvals shall not become invalid when the GraphQL schema changes.
By changes here I meant non-breaking changes like adding additional fields or renaming a field by marking the old one as deprecated and adding a new one with the new name. Similar to what I said above, the rationale behind that requirement was that not only the data itself is signed but also how it came about, by including appliedMethod; who measured or simulated the data, by including creatorId; which exact format it is in, by including formatId; and so forth. This makes sure that the same signature cannot be used maliciously as approval of the resource data with reference to a formatId other than the one in the original approval, which could change the whole meaning of the signed data.
Again, I hope these somewhat confused explanations make sense (my mind is rather unfocused these days).
Oh, and what I forgot: The example is exactly how I envisioned approvals to be created. There are also some explanations on other aspects in the comment on Approval, in particular, some best practices for databases on checking approvals before adding them. And in the fields of the interface Approval there are further explanations, for example, on how to compare responses.
And, I'm happy if you see any shortcomings, vulnerabilities, and what not. No other computer scientist has taken a thorough look at those ideas so far.
@simon-wacker Thank you for your additional clarifications, they are again very helpful. I will not have time to put together an updated example before the meeting tomorrow but have one (hopefully quick) question that came up during a discussion of your posts. And on reflection maybe this deserves its own issue, not sure.
How might a change of version of the optical data impact the approvals? For example, let's assume that the IGSDB resource used by ICON is returning data based on the ICON opticalData.json schema, and that response is what is used in the DataApproval.
Then at some point in the future the opticalData.json schema changes. Presumably IGSDB would implement those changes to remain compliant. But would that then invalidate the existing approvals and require new ones to be created?
To put this in some context, we recently ran into a similar issue that caused some extra work. We have some THERM files that are used by Radiance to create some BSDF results. Calculating those BSDF results takes hours per file. To avoid recalculating them, those results are signed with a hash of the THERM files used to create them. Without going into too much detail, there was a recent change in THERM that did not affect any results but ended up requiring the recalculation of all of the BSDF results. We fixed it by rolling back part of the change so that existing files no longer need recalculating, but it did leave us with a renewed appreciation of the issues around signing and versioning data.
Another reason for asking this is the existing IGSDB REST API at least nominally includes a version in the url. It is the “v1” in the example https://igsdb.lbl.gov/api/v1/products/363
We have briefly discussed a couple of different approaches for serving optical data based on the ICON opticalData.json schema. Either adding a format flag to the existing API like
https://igsdb.lbl.gov/api/v1/products/363?format=ICON
Or creating a new API path like
https://igsdb.lbl.gov/api/ICON/products/363 or (with version) https://igsdb.lbl.gov/api/ICON/v1/products/363
But if any of those are the locator in the resource returned by the DataApproval query created in step 2 like
"resources": [
  {
    "hashValue": "bca45733b010c3b0b8f940dd7f878ce9a679210d449768fa4dd55692684b64db", # SHA256 hash of the body of the response from the locator url
    "locator": "https://igsdb.lbl.gov/api/v1/products/363?format=ICON"
  }
]
How will those approvals be handled if the opticalData.json schema changes? Because if the opticalData.json schema changes then the response from the locator query may no longer match the stored hashValue.
In terms of the two approaches we have discussed so far for returning data in the ICON format (format query string parameter vs different url), the aim was to provide data conforming to the ICON schemas without having to implement a completely parallel GraphQL API alongside the existing REST API. The thought was that it might be easier to meet the timeframe if IGSDB could "simply" have an additional serializer that conformed to the ICON schema.
Then the IGSDB GraphQL implementation could be limited to what is described in database.graphql. It may be the case in the future that the IGSDB moves to a full GraphQL implementation. But the thinking is that it may be easier in the present to have a limited GraphQL implementation plus the ability to serialize data in the ICON format than to maintain two complete parallel APIs.
However it initially seems that, regardless of the approach, the DataApproval process depends on the version used at the time of approval which may become deprecated and potentially removed in the future. And we are wondering how that may be handled.
Yes, if the schema changes in a non-backwards compatible way and the data is transformed to conform to the new schema, then the hash value changes and all approvals would need to be recreated to still be valid.
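A small Python sketch of why this happens, under the assumption that the hash is taken over the serialized response body. The property names below are invented for the illustration and are not taken from opticalData.json.

```python
import hashlib
import json


def body_hash(document: dict) -> str:
    """SHA256 over a deterministic JSON serialization of the document."""
    body = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


# Same measured value, serialized under an old and a new (hypothetical) schema
# in which one property was renamed.
old_document = {"wavelength": 0.3, "transmittance": 0.5}
new_document = {"wavelengthInMicrometers": 0.3, "transmittance": 0.5}

print(body_hash(old_document) == body_hash(new_document))  # the rename changes the hash
```

Even though the measured numbers are identical, the renamed key changes the serialized bytes, so the stored hashValue no longer matches and the signature no longer covers what the locator returns.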
Scenario 1: The schema changes in a backwards compatible way, for example, some new non-required property is added or some property is deprecated but still left in the schema. In that case nothing needs to be done, and only the minor or patch version of the schema would change. I would version the schema according to semantic versioning.
Scenario 2: The schema changes in a non-backwards compatible way, for example, a property is removed or renamed or made required or ... In that case, I would give the schema a new major version by changing its $id. Note to myself: I need to add the major version to the $id. Something like https://www.buildingenvelopedata.org/schemas/v1/opticalData.json.
formatId in the resource meta information returned by GraphQL. The URL of the resource should still return the data conforming to the old schema. All approvals are then still valid because the hash value did not change. This requires the format query parameter to be versioned (or an additional parameter version) and the backend to be able to return data conforming to an old major schema version. For example, https://igsdb.lbl.gov/api/v1/products/363?format=ICON&version=1.
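The two scenarios and the versioned locator can be sketched in Python as follows. This is a rough illustration, assuming schema versions are plain MAJOR.MINOR.PATCH strings and that the backend accepts format and version query parameters as in the example URL; the function names are made up.

```python
from urllib.parse import urlencode


def major_version(version: str) -> int:
    """Return the MAJOR part of a MAJOR.MINOR.PATCH version string."""
    return int(version.split(".")[0])


def is_breaking_change(old_version: str, new_version: str) -> bool:
    """Scenario 2 applies only when the major version differs."""
    return major_version(old_version) != major_version(new_version)


def versioned_locator(base_url: str, schema_major: int) -> str:
    """Build a locator pinned to one major schema version."""
    return f"{base_url}?{urlencode({'format': 'ICON', 'version': schema_major})}"


print(is_breaking_change("1.2.0", "1.3.0"))  # scenario 1: nothing to do
print(is_breaking_change("1.3.0", "2.0.0"))  # scenario 2: new major version
print(versioned_locator("https://igsdb.lbl.gov/api/v1/products/363", 1))
```

Pinning the major version in the locator means the bytes behind an approved URL never change, so existing hash values and signatures stay valid while newer versions are served under a different URL.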
We have some questions about the DataApproval type that seem like they deserve their own issue. This is mainly discussing the DataApproval in particular but should hopefully extend to the Approval type as well since the DataApproval is an implementation of Approval. Currently focusing on just the DataApproval section however since the IGSDB has the use case of offering NFRC and AERC approvals.
These initial questions revolve around the query and response fields. First to quote from the database.graphql document about the process for institutions to approve data:
Next a concrete example from the IGSDB. Please correct any of the following if it is mistaken in terms of the DataApproval process.
Currently the IGSDB has two potential institutions that can be sources of approval for data in the IGSDB: NFRC and AERC. Each institution is free to potentially define which pieces of data are used as part of their approval. So if some record is approved by both the NFRC and the AERC the fields used by each institution may not be the same.
And, more granularly, each is potentially free to define which data are used for different product types. So AERC may potentially require some data for say woven shades and different data for perforated screens.
These specific fields that each institution requires for each product are stored in the query field for that specific approval for that specific record. So even if an institution changes the data used in its approval process it can still be verified that previous approvals were valid because the fields that were used at the time are recorded in the query.
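As a sketch of that idea in Python: each institution keeps its own field selection per product type, and the query built from it is stored with the approval at the time of signing. The product type keys and the particular field selections are placeholders chosen for illustration, not requirements stated by NFRC or AERC.

```python
from dataclasses import dataclass

# Hypothetical field selections per (institution, product type).
FIELDS = {
    ("NFRC", "glazing"): ["name", "warnings", "resources{hashValue,locator}"],
    ("AERC", "wovenShade"): ["name", "resources{hashValue,locator}"],
    ("AERC", "perforatedScreen"): ["name", "warnings", "resources{hashValue,locator}"],
}


@dataclass(frozen=True)
class StoredApproval:
    institution: str
    query: str  # recorded verbatim when the approval is created


def build_query(institution: str, product_type: str, data_id: str, timestamp: str) -> str:
    """Build the approval query from the institution's current field selection."""
    fields = ",".join(FIELDS[(institution, product_type)])
    return f'query{{data(id:"{data_id}",timestamp:"{timestamp}"){{{fields}}}}}'


approval = StoredApproval(
    institution="AERC",
    query=build_query("AERC", "wovenShade", "12ba0b75-a6a5-424b-a01f-7aae665482ac",
                      "2021-02-21T12:00:00-08:00"),
)

# AERC later changes what it requires; the stored approval is unaffected.
FIELDS[("AERC", "wovenShade")] = ["name", "warnings", "resources{hashValue,locator}"]
print(approval.query)
```

Because the query is stored as a string on the approval rather than looked up again, later changes to an institution's requirements do not alter what earlier approvals attest to.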
Next some questions, assuming that the above example is at least mostly correct.
Question 1: It seems the query mentioned in step 2 of the approval process will include data that is not used by the ICON metabase itself. Because step 3 is the institution’s review process and that may involve other data. For example the NFRC approval process involves the measured wavelength data. That is data present in the IGSDB and can be retrieved by using the locator field in the resources section of the OpticalData response type. But that measured data is not actually used by the ICON metabase. Is this correct?
Question 2 (if question 1 is correct): Since the query that is stored for the review process is a GraphQL query, and that may contain data that is not used by the ICON metabase, does this mean that the client databases must implement a GraphQL API for at least the data required by the approving institutions?
If the previous example is correct this seems to be the case. Even though, as @simon-wacker noted in issue #186, the ICON metabase will never actually execute the query itself, IGSDB still needs to implement it in order to support this DataApproval functionality. For example, if the AERC approval process involves the BSDF data, then it seems the IGSDB will need to provide GraphQL access for the BSDF data.
Question 3 (if question 1 is correct): Is the data stored in the response field the json returned by executing the query? If the query does involve measured data then this response is potentially quite large and seems to be the sort of data that the ICON metabase did not want to deal with. To continue the above example if the AERC approval process involves the BSDF data then would that be contained in the response field as well?
Finally, an attempt to create an example approvals field in an OpticalData object returned by the IGSDB for the most general case currently possible: a record that is approved by both the NFRC and the AERC (even if such a case does not currently exist). Depending on how much of the above is mistaken, please feel free to disregard.