RalphTro / epcis-event-hash-generator

ALGORITHM and SOFTWARE PROTOTYPE to uniquely identify/validate the integrity of any EPCIS event through a common, syntax-agnostic approach based on hashing. Takes an EPCIS Document (formatted in either XML or JSON-LD) and returns the corresponding hash value(s).
MIT License

ID length normalization #112

Closed Echsecutor closed 7 months ago

Echsecutor commented 8 months ago

Currently the DL normalization of GS1 IDs does not take length differences (leading zeros) into account.

I added an example for an event which uses some id, e.g.

<epcClass>https://id.gs1.org/01/4064074123453/10/245</epcClass>

and the same event writing the same id as

<epcClass>https://id.gs1.org/01/04064074123453/10/245</epcClass>

This currently leads to different hashes, but in my opinion it should not.

See the failing test in https://github.com/RalphTro/epcis-event-hash-generator/pull/113

RalphTro commented 8 months ago

Dear @Echsecutor , Thanks for bringing this to the table. Understood your point. Indeed, the canonical GS1 DL URI embedding a GTIN MUST represent the GTIN in its 14-digit format. Curiously, the normaliser module should already take care of that: https://github.com/RalphTro/epcis-event-hash-generator/blob/master/epcis_event_hash_generator/dl_normaliser.py

I just made some tests, and added the following sample epcs to https://github.com/RalphTro/epcis-event-hash-generator/blob/master/tests/examples/epclist_normalisation.jsonld (see branch 'issue112'):

                    "https://id.gs1.org/01/4064074123453/21/245",
                    "https://id.gs1.org/01/04064074123453/21/246",
                    "https://id.example.com/01/12345670/21/123",
                    "https://id.example.com/01/061414112345/21/123",
                    "https://id.example.com/01/4012345123456/21/123"

I then executed the code, and it correctly transformed the above into:

epc=https://id.gs1.org/01/04064074123453/21/245
epc=https://id.gs1.org/01/04064074123453/21/246
epc=https://id.gs1.org/01/00000012345670/21/123
epc=https://id.gs1.org/01/00061414112345/21/123
epc=https://id.gs1.org/01/04012345123456/21/123

So, from my POV, it actually works. Any idea why it doesn't work for you? E.g., do you think it makes sense to test it specifically with an XML file?
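For readers following along, the length normalisation visible in the output above can be sketched roughly as follows. This is a minimal illustration, not the code in dl_normaliser.py: the function name `pad_gtin_in_dl_uri` and the regular expression are assumptions made for this example, and it only handles URIs whose path starts with `/01/`.

```python
import re

# Hypothetical helper for illustration only; NOT the project's dl_normaliser API.
# Matches a GTIN-8, GTIN-12, GTIN-13 or GTIN-14 directly after the /01/ key.
_GTIN_IN_DL = re.compile(r"^https?://[^/]+/01/(\d{8}|\d{12,14})(?=/|$)")


def pad_gtin_in_dl_uri(uri: str) -> str:
    """Zero-pad the GTIN after /01/ to 14 digits and use the id.gs1.org domain."""
    match = _GTIN_IN_DL.match(uri)
    if match is None:
        return uri  # not a GTIN-based GS1 Digital Link URI; leave it untouched
    gtin14 = match.group(1).zfill(14)  # leading zeros up to 14 digits
    rest = uri[match.end():]  # remaining qualifiers, e.g. "/21/245"
    return f"https://id.gs1.org/01/{gtin14}{rest}"


for sample in (
    "https://id.gs1.org/01/4064074123453/21/245",
    "https://id.example.com/01/12345670/21/123",
    "https://id.example.com/01/061414112345/21/123",
):
    print(pad_gtin_in_dl_uri(sample))
```

Note that padding alone does not validate the check digit; a GTIN with a wrong check digit is padded and passed through unchanged.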

Echsecutor commented 8 months ago

I have added an XML example here: https://github.com/RalphTro/epcis-event-hash-generator/pull/113/files. To me it looks like it doesn't work, i.e. those 3 events lead to 2 different hashes.

RalphTro commented 7 months ago

Thanks, @Echsecutor ,

I think I found the reason for this behaviour in your XML file: In the first and second event, the inputQuantityList looks as follows in the pre-hash string:
inputQuantityListquantityElementepcClass=https://id.gs1.org/01/04064074123453/10/245

But in the third one, the GTIN has another GTIN indicator digit (9 instead of 0):
inputQuantityListquantityElementepcClass=https://id.gs1.org/01/94064074123453/10/245

...and this MUST of course lead to a different hash value.

I just noticed that the GTIN in the first GS1 DL URI has an incorrect check digit (it must be 04064074123450); the second one is correct, though. When canonicalising EPC URNs to GS1 DL URIs, our tool calculates the check digit.

And THIS is the reason why the hash value is still different even if you correct the EPC URN. The pre-hash string has the correct check digit (0) after canonicalising the EPC URN, while events 1 and 2 still have the incorrect one. So, again, it is correct that our implementation returns different hash values.
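For reference, the standard GS1 check digit calculation for a GTIN-14 can be sketched as below. The function names are made up for this example and are not taken from the tool's code; the sketch only illustrates why 04064074123453 is rejected while 04064074123450 and 94064074123453 pass.

```python
def gs1_check_digit(first_13_digits: str) -> int:
    """Compute the GS1 check digit for the first 13 digits of a GTIN-14."""
    if len(first_13_digits) != 13 or not (first_13_digits.isascii() and first_13_digits.isdigit()):
        raise ValueError("expected exactly 13 digits")
    # Weights alternate 3, 1, 3, 1, ... starting with 3 at the leftmost digit.
    total = sum(
        int(digit) * (3 if index % 2 == 0 else 1)
        for index, digit in enumerate(first_13_digits)
    )
    return (10 - total % 10) % 10


def has_valid_check_digit(gtin14: str) -> bool:
    """Return True if the last digit of a 14-digit GTIN matches its check digit."""
    if len(gtin14) != 14 or not (gtin14.isascii() and gtin14.isdigit()):
        return False
    return gs1_check_digit(gtin14[:13]) == int(gtin14[13])


print(gs1_check_digit("0406407412345"))          # indicator digit 0
print(gs1_check_digit("9406407412345"))          # indicator digit 9
print(has_valid_check_digit("04064074123453"))   # GTIN from events 1 and 2
```

With indicator digit 0 the weighted sum is 80, so the check digit is 0; with indicator digit 9 the sum is 107, so the check digit is 3.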

Now, the interesting question is: does our implementation need to check each and every identifier populating the epc/quantityLists? This would go beyond what is specified in the CBV. What is your view on this?

Hope this helps and clarifies your question.

Kind regards, Ralph

Echsecutor commented 7 months ago

Ahh... so in the end this is a typo in my test data, and the normalization is actually already in place. Big sorry for wasting your time on this one, @RalphTro !

I do not think that validating inputs (such as check digits) is within the scope of this hash generator reference implementation. Though it would help with stupid user errors ... ;)

RalphTro commented 7 months ago

Dear @Echsecutor , No worries. Glad I could help you for a change. ;-) See you soon; Ralph