Closed dakbhavesh closed 4 months ago
If we to support this kind of feature officially, ignoredFields should be added to hash url
i.e.
ni:///sha-256;b0c8cf9e9a733fb18a0134bfb0f64e701432c6bebda06124a1e62f087d71a333?ver=CBV2.0&ignoreFields=captureID,hash
Otherwise it's not possible to regenerate hash for verification purposes. This request doesn't seem to be completely thought through. Maybe it's better to not allow any additional fields in events returned witth EPCISQueryDocument Only fields that are excluded from hash generation by default should be allowed to be added in query (eventID, recordTime and error declaration related one).
@RalphTro what's your take on this?
Hi @dakbhavesh and @sboeckelmann,
When Bhavesh initially suggested to tackle this issue through an additional parameter (which I think is a an appropriate way!), I was thinking of a command line parameter (similar to e.g. "-j '\n'" if you want our module to return all key-vale pairs as part of the pre-hash string on a new line for better readability). Note that an eventID
field can also be populated by a UUID, or even be empty. So, with that in mind, what is your take on this alternative solution variant?
Having this parameter is not necessarily wrong. But the root cause, which led @dakbhavesh to making this feature request (repo specific fields) must be taken care of by the repo itself and not resolved by adding a feature to the hash generator. ;-) Because the repo in this case, simply doesn't return the exact event that was captured and used for generating the hash id. I will pick this up in a separate discussion with @dakbhavesh
Dear @RalphTro & @sboeckelmann ,
Posting here the proposed idea we brainstormed for reference:
{
"@context": [
"https://ref.gs1.org/standards/epcis/2.0.0/epcis-context.jsonld",
{
"example": "http://openepcis.io/schema.jsonld", // vendor-specific schema
"hashGen": "http://some.well.known.domain/schema.jsonld"
}
],
"hashGen:ignoreFields": ["hashGen:hash", "example:captureID"],
"example:captureID": "",
"hashGen:hash": "",
.....
}
The above structure ensures that EPCISQueryDocument remains JSON-LD compliant.
Incorporating the above will ensure consistency of hash between captured and query document which is the goal for raising this issue.
Dear @dakbhavesh ,
Many thanks! Based on your ideas, I just compiled a concrete example we could use for illustration purposes. Voilà:
{
"@context": [
"https://ref.gs1.org/standards/epcis/2.0.0/epcis-context.jsonld",
{
"example": "https://ns.example.com/epcis/",
"hashGen": "https://some.well.known.domain/schema.jsonld"
}
],
"type": "EPCISQueryDocument",
"schemaVersion": "2.0",
"creationDate": "2005-07-11T11:30:47.0Z",
"epcisBody": {
"queryResults": {
"subscriptionID": "32d2aec1-a6d2-46d9-900a-24124288cce1",
"queryName": "SimpleEventQuery",
"hashGen:ignoreFields": ["hashGen:hash", "example:captureID"],
"resultsBody": {
"eventList": [
{
"type": "ObjectEvent",
"eventTime": "2019-10-21T14:45:00Z",
"recordTime": "2023-06-02T07:03:16.073Z",
"eventTimeZoneOffset": "+01:00",
"eventID": "ni:///sha-256;b3d481c5590757ab8361a6cb0e924820feb3b61eb568e7d7703f21a96f76f51d?ver=CBV2.0",
"epcList": [
"https://id.gs1.org/00/040123451111111127"
],
"action": "OBSERVE",
"bizStep": "inspecting",
"disposition": "in_progress",
"readPoint": {
"id": "https://id.gs1.org/414/4012345000245/254/1"
},
"hashGen:hash": "ni:///sha-256;b3d481c5590757ab8361a6cb0e924820feb3b61eb568e7d7703f21a96f76f51d?ver=CBV2.0",
"example:captureID": "cc90bbe4-7f3a-4abd-b150-d4371c72a64f"
}
]
}
}
}
}
Does this resonate with you?
Dear @RalphTro , In your example above, shall we keep the values of "eventID" & "hashGen:hash" the same to avoid confusion?
Just a few thoughts:
And yes, @dakbhavesh - I also think, if hash is used for eventID, values should match. Unless we want to add some really funky edge-case example with eventTime, recordTime being ignored, which can be a valid use case for identifying certain duplication issues. But that's fetching too far and would cause a lot of confusion 😄
Hi @sboeckelmann,
I agree with your idea to ignore hashGen:* fields by default.
For your second point, I am curious, what benefit we will gain by providing an array of event hashes? Wouldn't businesses agree beforehand on supported algorithms?
@dakbhavesh : I think I now finally understand the purpose of the field hashGen:hash
: I am correct in assuming that this conveys the event's hash value e.g. to cater for cases where an event does not have an eventID
field or the latter is populated with a UUID? (Yesterday, I thouht that this hash also takes fields into account that should be ignored; that's why the values were different :-))
Happy to adjust it.
@sboeckelmann: As to "let's have the hash generator(s) always ignore hashGen related attributes. Which means there would be no need to include them in hashGen:ignoreFields, they shall always be ignored by default.": how should an accessing application know about that? As it is not officially specified anywhere, wouldn't you agree that it was better to make this more explicit?
As to "what do you think about being able to provide an array of event hashes": I agree with Bhavesh: what is the benefit of this? If there is a need for going for another sha, a company/industry consortium can choose to use it in the eventID
field today.
@RalphTro : always ignore hashGen related attributes The reason why I was suggesting not to add hashGen extension related fields explicitly is, because hashGen will only be active if
which means these fields don't show up automagically. A client will always be aware of hashGen related implementation details. I do understand that calling them out explicitly may be helpful and I wouldn't be opposing against it. However, there are already a couple of fields being implicitly irgnored by the hash-generator implementation(s) (eventId, recordTime and error declaration). If we want to be 100% explicit about ignored fields, these fields should also be taken into consideration. My personal opinion is, that fields that are anyway always or implicitly ignored by the hash generator may not have to be included in ignoreFields, because clients will be aware of the hashGen extention's implementation details.
@RalphTro , @dakbhavesh array of event hashes For the initial vanilla use cases it may seem correct that static setting of hash algorithm may be sufficient. Especially when having in mind, that most commonly we will see usage of sha256. I think we are safe to support single hash value only if a client can request a specific algorithm by declaring it in a specific HTTP Request Header. I am not convinced that EPCIS query request being made will only be targeting for a single company, industry or consortium. Just think of a transport agency that wants to provide EPCIS services to their customers. Their customers may be coming from a wide variety of industries, having different hash algorithm requirements. In the future we may have even more EPCIS services, that are not bound to a single industry. If EPCIS 2.0 Semantic Web and Linked Data is being picked up and adopted by the industries in such a way that I am hoping for, there shouldn't be a built-in limitation.
As to hashGen fields: I got your point. If the client specifically requests these fields, I tend to agree with you. (+ considering to add "automagically" to my vocabulary ;-))
As to event hashes array. TBD/need more time to think this through. How do you like the idea to wait at least for a first implementation in industry where multiple hash algorithms are required?
Re: multi hash array We can keep it as a single hash for now. But we have to add some functionality for the client to set desired algorithm.
Dear @dakbhavesh and @sboeckelmann , Suggestion for tomorrow's call: let's try and implement support for this feature. :-) Here is how it should work from my POV:
ignoreFields
field, e.g. under the example
namespace in this/our reference implementation - though we should point out that this should be configurableignoreFields
. Report back to the Visibility SMG about this matter once we have implemented it.
WDYT?
Report back to the Visibility SMG about this matter once we have implemented it.
WDYT?
I like the idea of reporting back to the SMG, let's discuss with Craig tomorrow. The EPCIS Event Hash get more attention and awareness from the SMG.
I just got back in the context of the topic :)
As we agree with the approach, I believe I can pick this up for adjusting logic to ignore the fields specified in ignoreFields
.
Working logic may not be present by tomorrow however I will try to get it working as soon as possible, most likely in a week's time. Would that be okay?
Dear @dakbhavesh , This would be fantastic! Many thanks for your kind offer to advance this. I am happy to support through testing. Looking forward to speaking with you tomorrow!
Dear @RalphTro , @sboeckelmann ,
Summarising our discussion here:
"hashGen:ignoreFields": ["example:captureID"]
. By default, all hashGen fields will be excluded.hashGen
is provided differently in context. e.g. "hg:ignoreFields": ["example:captureID"]
.Please feel free to add any missing points on top of the above. Also, feel free to ask in case of any queries :)
Dear @dakbhavesh ,
This sounds like a great summary. Agreed + thanks!
To enable better understanding ('hashGen' may be confusing to some people ;-)), we may could use a different URI prefix than 'hashGen', e.g. 'repository-x:ignoreFields' (tied to e.g. 'repository-x.example.com'). This might convey of what we have in mind in a better way, as we could explain that fields such as 'captureID' were added by a fictitious EPCIS repository solution named 'repository-x'.
What is your take on this?
Kind regards,
Ralph
Hi @RalphTro, I agree with your suggestion to keep the prefix and URL as you suggested. It is a better alternative than some technical name like hashGen
.
Dear @RalphTro, I am almost there with adjusting logic. Thought to keep you posted on progress. I am going to create a PR soon once sufficient testing and test cases are in place. Thanks for your patience :)
Dear @RalphTro, I have adjusted the logic to incorporate support for ignoring fields from EPCIS XML document and query documents. Please have a look at PR and feel free to give it a try.
Regarding @Echsecutor proposal to add the ignored fields into the NI. I think we should take it up separately as it may make the generated hash look complex.
For example, we need to use a fully qualified name in NI for the ignored fields <example:testField1/> & <example:testField1/>
ni:///sha-256;b3d481c5590757ab8361a6cb0e924820feb3b61eb568e7d7703f21a96f76f51d?ver=CBV2.0&ignoredFields={https://ns.example.com/epcis/}testField1,{https://ns.example.com/epcis/}testField2
Please share your thoughts.
FYI: @sboeckelmann , @Echsecutor
I suggest that we keep this issue open until we have reached consensus in the course of our next alignment call.
For now, reached consensus. Additional thought: if EPCIS events need to be processed stand-alone afterwards, the repository-specific fields (indicated in ignore-fields
) need to be redacted/stripped off before passing them further.
Notwithstanding the latter, in an EPCIS query doc, it makes more sense to actually convey that information once, i.e. in the document (no in each and every event).
While adding support for generating hash from EPCIS Query Document we observed that query document may be enriched with repository-specific fields like "hash", "transactionId", "captureID", etc. To ensure the consistency of generated hash must ignore such fields. Such fields are hard to know in advance as each EPCIS repository may behave uniquely and have its own nomenclature. Thus, it would be nice to add a parameter that can be used to provide fields to ignore externally.
The parameter may look like below:
Name: ignoreFields Values: comma separated name of the fields to ignore
--ignoreFields hash,captureID