RalphTro / epcis-event-hash-generator

ALGORITHM and SOFTWARE PROTOTYPE to uniquely identify/validate the integrity of any EPCIS event through a common, syntax-agnostic approach based on hashing. Takes an EPCIS Document (formatted in either XML or JSON-LD) and returns the corresponding hash value(s).
MIT License

SSCC canonicalisation #87

Closed RalphTro closed 1 year ago

RalphTro commented 1 year ago

Dear All, I just noticed that when executing the current master branch deployment against the first reference example (ReferenceEventHashAlgorithm), the SSCCs are canonicalised as follows (in both XML and JSON/JSON-LD):

epc=https://id.gs1.org/00/34012345001118
epc=https://id.gs1.org/00/34012345002221
epc=https://id.gs1.org/00/34012345003334

This is not how it should be (see also https://github.com/RalphTro/epcis-event-hash-generator/blob/master/tests/examples/ReferenceEventHashAlgorithm.prehashes):

epc=https://id.gs1.org/00/040123450000001112
epc=https://id.gs1.org/00/040123450000002225
epc=https://id.gs1.org/00/040123450000003338

Can any of you replicate this issue on your machine, or do you have an idea why this happens (e.g. why the SSCC now has 14 instead of 18 digits, loses some zeros, and starts with a '3')?

Kind regards, Ralph

Echsecutor commented 1 year ago

$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

$ ./epcis_event_hash_generator/__main__.py -p -j '\n' tests/examples/ReferenceEventHashAlgorithm.xml 

Hashes of the events contained in 'tests/examples/ReferenceEventHashAlgorithm.xml':
ni:///sha-256;3605b0f24a19125edeec1ade2e575bd6ce6d7b55f387ef71285621498ff1b4fb?ver=CBV2.0

Pre-hash strings:
eventType=ObjectEvent
eventTime=2020-03-04T10:00:30.000Z
eventTimeZoneOffset=+01:00
epcListepc=https://id.gs1.org/00/0401234500000001118
epc=https://id.gs1.org/00/0401234500000002221
epc=https://id.gs1.org/00/0401234500000003334
action=OBSERVE
bizStep=https://ref.gs1.org/cbv/BizStep-departing
readPointid=https://id.gs1.org/414/4012345000115/254/987
{https://ns.example.com/epcis/}myField1{https://ns.example.com/epcis/}mySubField1=2
{https://ns.example.com/epcis/}mySubField2=5
{https://ns.example.com/epcis/}myField2=0
{https://ns.example.com/epcis/}myField3{https://ns.example.com/epcis/}mySubField3=1
{https://ns.example.com/epcis/}mySubField3=3

works on my machine? ;)

RalphTro commented 1 year ago

Thanks, @Echsecutor !

Update: It actually also works fine on my Mac, but the error remains on my Windows machine. (Before someone asks - I double-checked that both machines use the current master branch and that the latter is up to date. :-))

I think all of you work on either Linux or Mac - anyone able to try it out in a Windows environment and/or have an idea why this happens?

RalphTro commented 1 year ago

And another update, @Echsecutor : The canonical SSCCs, when executing it on my Mac, have 19 digits (and should only have 18).

I.e., instead of https://id.gs1.org/00/0401234500000001118, it should be https://id.gs1.org/00/040123450000001112

RalphTro commented 1 year ago

Idea from @dakbhavesh : this is possibly caused by a library (the GTIN library). Bhavesh will kindly have a look at this.

dakbhavesh commented 1 year ago

Checked with the latest version and raised an issue in the GTIN repository, as there seems to be either a bug or an intentional change in the algorithm. Reference: https://github.com/enorganic/gtin/issues/9

dakbhavesh commented 1 year ago

Dear @RalphTro,

I am not really familiar with the business logic of GTIN conversion. Therefore, I am posting the internal state of the GTIN object so that you can tell me whether the various values are initialised correctly.

URN: urn:epc:id:sscc:040123450000000111
input to GTIN api: GTIN(raw=040123450000000111)

The internal state of GTIN Object:

checkDigit = 8
gcp = '4012345'
indicator_digit = ''
length = 19
raw = 040123450000000111

We are expecting https://id.gs1.org/00/040123450000001112 as output, so do you think the checkDigit is calculated wrongly?

I found the following documentation in the GTIN package (function: main.py -> calculate_check_digit) for the check-digit calculation:

A check digit is calculated from the preceding digits by
    multiplying the sum of every 2nd digit *from right to left* by 3,
    adding that to the sum of all the other digits (1st, 3rd, etc.),
    modulating the result by 10 (find the remainder after dividing by 10),
    and subtracting *that* result *from* 10.

RalphTro commented 1 year ago

Dear @dakbhavesh , As to 'Do you think the checkDigit is wrongly calculated?': yes. The check digit for this SSCC (i.e. without check digit, '04012345000000011') should be '5' (not '1'), see e.g. https://gs1-germany.github.io/checkDigitCalculator/ or https://www.gs1.org/services/check-digit-calculator

Another great tool is the EPC translator, see https://www.gs1.org/services/epc-encoderdecoder

So, taking the example of "your" SSCC EPC URI, it should look as follows:
(1) urn:epc:id:sscc:4012345.0000000011
(2) Corresponding SSCC: 040123450000000115
(3) (Which later on can be embedded into the canonical GS1 DL URI, i.e. https://id.gs1.org/00/040123450000000115)
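The three steps above can be sketched in Python as follows. This is only an illustration, not the code used in this repository: the function names `gs1_check_digit` and `sscc_urn_to_digital_link` are hypothetical. It assumes the usual SSCC EPC URI layout, in which the first digit after the dot is the extension digit and the company prefix plus serial part add up to 17 digits.

```python
import re


def gs1_check_digit(payload: str) -> int:
    """GS1 mod-10 check digit for a digit string that does NOT yet contain one."""
    total = 0
    # Walk from right to left: the rightmost digit is weighted 3, the next 1, and so on.
    for position, char in enumerate(reversed(payload)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def sscc_urn_to_digital_link(urn: str) -> str:
    """Hypothetical helper: SSCC EPC URI -> canonical GS1 Digital Link URI."""
    match = re.fullmatch(r"urn:epc:id:sscc:([0-9]+)\.([0-9]+)", urn)
    if match is None:
        raise ValueError(f"not an SSCC EPC URI: {urn!r}")
    company_prefix, serial = match.groups()
    if len(company_prefix) + len(serial) != 17:
        raise ValueError("company prefix and serial part must total 17 digits")
    # The extension digit (first digit after the dot) moves to the front.
    payload = serial[0] + company_prefix + serial[1:]
    return "https://id.gs1.org/00/" + payload + str(gs1_check_digit(payload))


print(sscc_urn_to_digital_link("urn:epc:id:sscc:4012345.0000000011"))
```

For the URN in step (1) this prints the URI given in step (3). The check digit is always computed from the 17-digit string, never from a string that already has a check digit appended.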

The first GitHub project (https://github.com/gs1-germany/checkDigitCalculator) to which I contributed is "unfortunately" not written in Python, but in JavaScript. Otherwise, it would have been an option to import it into our project. :-/

Does that answer your question?

dakbhavesh commented 1 year ago

Thanks @RalphTro for pointing me towards the useful tools for verifying check digits. There is one gap in the way we are providing input to the GTIN package.

For the URN urn:epc:id:sscc:040123450000000111, the hash generator passes 040123450000000111 as input to the GTIN package, which produces 8 as the check digit. However, in your last comment, you verified 04012345000000011 as opposed to 040123450000000111.

Note: the GTIN package also produces 5 as the check digit if I pass 04012345000000011 as input.
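The difference between the two inputs can be reproduced with a small sketch. It is not the GTIN package's code and not the fix that was eventually merged; `gs1_check_digit` and `normalise_sscc` are hypothetical names. The assumption is that a 17-digit string is an SSCC without its check digit, while an 18-digit string already contains one and should be validated rather than extended.

```python
def gs1_check_digit(payload: str) -> int:
    """GS1 mod-10 check digit for a digit string that does NOT yet contain one."""
    total = 0
    # Rightmost digit is weighted 3, then weights alternate 1, 3, 1, ...
    for position, char in enumerate(reversed(payload)):
        total += int(char) * (3 if position % 2 == 0 else 1)
    return (10 - total % 10) % 10


def normalise_sscc(digits: str) -> str:
    """Hypothetical guard: return an 18-digit SSCC or raise ValueError."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("SSCC must consist of digits only")
    if len(digits) == 17:
        # No check digit yet: append it.
        return digits + str(gs1_check_digit(digits))
    if len(digits) == 18:
        # Check digit already present: verify it instead of adding another one.
        expected = gs1_check_digit(digits[:17])
        if int(digits[17]) != expected:
            raise ValueError(f"invalid check digit, expected {expected}")
        return digits
    raise ValueError("SSCC must have 17 digits (without) or 18 digits (with check digit)")


print(gs1_check_digit("04012345000000011"))   # 17 digits
print(gs1_check_digit("040123450000000111"))  # 18 digits treated as payload
```

The two calls print 5 and 8, matching the values reported in this thread. Appending a check digit to an 18-digit input is what yields a 19-digit result, and `normalise_sscc("040123450000000111")` raises because the trailing 1 is not the check digit of the first 17 digits.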

I am now wondering whether we are passing the wrong input to the GTIN package. If so, I would like to understand why. Do we need to omit the last digit from 040123450000000111 for some reason?

RalphTro commented 1 year ago

Dear @dakbhavesh , This is just to let you know that I will try to replace the gtin package with our own code. Stay tuned...

RalphTro commented 1 year ago

Update: when executing the code for ReferenceEventHashAlgorithm.jsonld in the replace gtin module branch on my Windows machine, we still get an SSCC which is not correctly canonicalised (it has 19 instead of 18 digits):

https://id.gs1.org/00/0401234500000001118
https://id.gs1.org/00/0401234500000002221

GLNs or GTINs work fine, e.g.:
https://id.gs1.org/414/4012345000115/254/9
https://id.gs1.org/01/04012345111118/21/9876

Let's discuss this matter in our next call...

RalphTro commented 1 year ago

Adjusted as of latest PR