RalphTro / epcis-event-hash-generator

ALGORITHM and SOFTWARE PROTOTYPE to uniquely identify/validate the integrity of any EPCIS event through a common, syntax-agnostic approach based on hashing. Takes an EPCIS Document (formatted in either XML or JSON-LD) and returns the corresponding hash value(s).
MIT License

Confirmation on the order for User Extensions in the pre-hash string while creating the Hash-ID #71

Closed Aravinda93 closed 1 year ago

Aravinda93 commented 2 years ago

Dear @RalphTro

Hope you are doing great and having a nice summer :)

We have a quick question about the order of user extension elements when creating the pre-hash string.

As an example, we have created the following XML EPCIS document, in which we have added the different types of extensions that an event can contain:

<epcis:EPCISDocument
    xmlns:epcis="urn:epcglobal:epcis:xsd:2"
    xmlns:cbvmda="urn:epcglobal:cbv:mda" schemaVersion="2.0" creationDate="2013-06-04T14:59:02.099+02:00"
    xmlns:example="http://example.com/">
    <EPCISBody>
        <EventList>
            <ObjectEvent>
                <eventTime>2005-04-04T20:33:31.116-06:00</eventTime>
                <eventTimeZoneOffset>-06:00</eventTimeZoneOffset>
                <errorDeclaration>
                    <example:errorExtension>Error Extension</example:errorExtension>
                    <declarationTime>2020-01-15T00:00:00+01:00</declarationTime>
                    <reason>urn:epcglobal:cbv:er:incorrect_data</reason>
                </errorDeclaration>
                <epcList>
                    <epc>urn:epc:id:sgtin:0614141.107346.2018</epc>
                </epcList>
                <action>OBSERVE</action>
                <bizStep>urn:epcglobal:cbv:bizstep:receiving</bizStep>
                <disposition>urn:epcglobal:cbv:disp:in_progress</disposition>
                <readPoint>
                    <example:readPointExtension>ReadPoint Extension</example:readPointExtension>
                    <id>urn:epc:id:sgln:0012345.11111.400</id>
                </readPoint>
                <ilmd>
                    <example:ilmdExtension>Ilmd Extension</example:ilmdExtension>
                </ilmd>
                <example:userExtensions>User Extensions-1</example:userExtensions>
                <example:userExtensions2>User Extensions-2</example:userExtensions2>
            </ObjectEvent>
        </EventList>
    </EPCISBody>
</epcis:EPCISDocument>

Following is the pre-hash string:

type=ObjectEvent

eventTime=2005-04-05T02:33:31.116Z

eventTimeZoneOffset=-06:00

errorDeclaration
declarationTime=2020-01-14T23:00:00.000Z

reason=https://ns.gs1.org/voc/ER-incorrect_data

epcList
epc=https://id.gs1.org/01/10614141073464/21/2018

action=OBSERVE

bizStep=https://ns.gs1.org/voc/Bizstep-receiving

disposition=https://ns.gs1.org/voc/Disp-in_progress

readPoint
id=https://id.gs1.org/414/0012345111112/254/400

ilmd
{http://example.com/}ilmdExtension=Ilmd Extension

errorDeclaration
{http://example.com/}errorExtension=Error Extension

readPoint
{http://example.com/}readPointExtension=ReadPoint Extension

{http://example.com/}userExtensions=User Extensions-1
{http://example.com/}userExtensions2=User Extensions-2

We just want to confirm the ordering of the user extension elements and the way the pre-hash string is created. Is this the correct pre-hash string for the above event? Here we group the user extensions by their parent element and sort them within each group.

When we tried the tool, the ILMD extensions appeared after the errorDeclaration extensions and before the readPoint extensions, so we got a bit confused and wanted to confirm.

Please note:

  1. The above event is only an example to get the ordering logic right.
  2. Line breaks have been added to the pre-hash string for readability only.
  3. As you mentioned earlier, the tool is a bit out of date, so we just wanted to confirm the order.
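
To illustrate note 2: the line breaks shown above are not part of the pre-hash string. A minimal sketch, assuming the fragments are simply concatenated without separators and then hashed with SHA-256 (the hash function, the hex output format, and the helper names are assumptions for illustration, not something stated in this thread):

```python
import hashlib

# A subset of the fragments listed above, used for illustration only.
fragments = [
    "eventTime=2005-04-05T02:33:31.116Z",
    "eventTimeZoneOffset=-06:00",
    "action=OBSERVE",
]


def pre_hash_string(parts):
    """Join the fragments with no line breaks or other separators."""
    return "".join(parts)


def sha256_hex(text):
    """Hash the UTF-8 bytes of the pre-hash string (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


pre_hash = pre_hash_string(fragments)
print(pre_hash)
print(sha256_hex(pre_hash))
```

Any whitespace added for display would change the digest, which is why the readable listing and the hashed string have to be kept apart.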

Wish you a great weekend ahead :)

Thanks and Best regards, Aravinda

Echsecutor commented 2 years ago

Hi @Aravinda93 !

The (lexicographical) ordering for user extensions is indeed done level by level, i.e. we first sort the highest level and then each sub-level in turn.
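
As a rough sketch of what level-by-level sorting could look like, the following treats user extensions as nested (key, value) pairs and sorts every level on its own. The data layout, the function names, and the way a nested element is serialized (its key followed by its sorted children) are assumptions made for this illustration, not the tool's actual code:

```python
def serialize(key, value):
    """Serialize one extension; a list value means nested child extensions."""
    if isinstance(value, list):
        # Sort the children of this level independently of other levels.
        children = sorted(serialize(k, v) for k, v in value)
        return key + "".join(children)
    return f"{key}={value}"


def sort_level_wise(extensions):
    """Sort the serialized substrings of the top level."""
    return sorted(serialize(k, v) for k, v in extensions)


# Hypothetical nested user extensions in the example namespace.
NS = "{http://example.com/}"
extensions = [
    (NS + "b", "2"),
    (NS + "a", [(NS + "z", "1"), (NS + "y", "0")]),
]

for line in sort_level_wise(extensions):
    print(line)
```

In this sketch the children `y` and `z` are ordered within `a`, and `a` and `b` are ordered against each other, so no element ever moves out of its parent.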

@RalphTro I think we have not properly specified whether user extensions within canonical elements should appear

RalphTro commented 2 years ago

Dear @Aravinda93 , Thanks for bringing this up. @Echsecutor is correct (i.e. that sorting must always happen level-wise).

As to your remark, Sebastian: we specified the following (see step 20): "If an EPCIS event comprises user extension elements as part of an EPCIS standard field with an extension point (...) , they SHALL be added to the pre-hash string similarly as specified in the previous step". And in the previous step (19), we stated that they have to be "...appended to the pre-hash string". So they are indeed placed at the end of the pre-hash string, but without repeating the name of the parent.

RalphTro commented 2 years ago

@Aravinda93: just a short (though unrelated) comment. I guess you are aware of it, but just to be sure: the domain name of the Web URIs identifying the CBV standard values was changed slightly in the last weeks before eBallot. For example, it is not bizStep=https://ns.gs1.org/voc/Bizstep-receiving, but bizStep=https://ref.gs1.org/voc/Bizstep-receiving (see also https://github.com/RalphTro/epcis-event-hash-generator/issues/69). eBallot is now closed, which means we have a stable standard to build upon.

Aravinda93 commented 2 years ago

@Echsecutor @RalphTro Thanks a lot for the update on the order of extensions.

Also, thanks for pointing out the changes related to https://ns.gs1.org/. I must have missed that part and will update our applications accordingly.

RalphTro commented 1 year ago

Update: @Aravinda93's sample provided above results in the following pre-hash string when run against the current master branch:

eventType=ObjectEvent
eventTime=2005-04-05T02:33:31.116Z
eventTimeZoneOffset=-06:00
errorDeclarationdeclarationTime=2020-01-14T23:00:00.000Z
reason=https://ref.gs1.org/cbv/ER-incorrect_data
epcListepc=https://id.gs1.org/01/10614141073464/21/2018
action=OBSERVE
bizStep=https://ref.gs1.org/cbv/BizStep-receiving
disposition=https://ref.gs1.org/cbv/Disp-in_progress
readPointid=https://id.gs1.org/414/0012345111112/254/400
errorDeclaration{http://example.com/}errorExtension=Error Extension
ilmd{http://example.com/}ilmdExtension=Ilmd Extension
readPoint{http://example.com/}readPointExtension=ReadPoint Extension
{http://example.com/}userExtensions2=User Extensions-2
{http://example.com/}userExtensions=User Extensions-1

“userExtensions” should appear before “userExtensions2”. As the current implementation (master branch) appends the latter first, I think a slight adjustment is needed.

ShaikDayan commented 1 year ago

Dear @RalphTro ,

We generated the pre-hash string for the above example with the Python, Java, and JavaScript applications, and all three produce the same output for the user extensions.

Python implementation pre-hash for "userExtensions"

eventType=ObjectEvent
eventTime=2005-04-05T02:33:31.116Z
eventTimeZoneOffset=-06:00
errorDeclarationdeclarationTime=2020-01-14T23:00:00.000Z
reason=https://ref.gs1.org/cbv/ER-incorrect_data
epcListepc=https://id.gs1.org/01/10614141073464/21/2018
action=OBSERVE
bizStep=https://ref.gs1.org/cbv/BizStep-receiving
disposition=https://ref.gs1.org/cbv/Disp-in_progress
readPointid=https://id.gs1.org/414/0012345111112/254/400
errorDeclaration{http://example.com/}errorExtension=Error Extension
ilmd{http://example.com/}ilmdExtension=Ilmd Extension
readPoint{http://example.com/}readPointExtension=ReadPoint Extension
{http://example.com/}userExtensions2=User Extensions-2
{http://example.com/}userExtensions=User Extensions-1

Java application pre-hash for "UserExtensions"

eventType=ObjectEvent
eventTime=2005-04-05T02:33:31.116Z
eventTimeZoneOffset=-06:00
epcListepc=https://id.gs1.org/01/10614141073464/21/2018
action=OBSERVE
bizStep=https://ref.gs1.org/cbv/BizStep-receiving
disposition=https://ref.gs1.org/cbv/Disp-in_progress
readPointid=https://id.gs1.org/414/0012345111112/254/400
ilmd{http://example.com/}ilmdExtension=Ilmd Extension
readPoint{http://example.com/}readPointExtension=ReadPoint Extension
{http://example.com/}userExtensions2=User Extensions-2
{http://example.com/}userExtensions=User Extensions-1

Another example: renaming "userExtensions" to "userExtensions1"

"example:userExtensions1": "User Extensions-1",
"example:userExtensions2": "User Extensions-2",

Output for the above JSON data using the Python and Java applications

{http://example.com/}userExtensions1=User Extensions-1
{http://example.com/}userExtensions2=User Extensions-2

Here "userExtensions1" appears before "userExtensions2".

The sort is applied to the entire substring (key and value together), which is why "userExtensions2" appears before "userExtensions". We would like your opinion on whether the sort should be applied to the key first and then to the value, or whether it is acceptable to sort the entire substring.

RalphTro commented 1 year ago

Dear @ShaikDayan , Many thanks for looking into this! Although I initially thought this behaviour looked incorrect, I double-checked our description, and I think the current implementations do not actually have to be changed. Rationale:

In the algorithm description, we state that any given user extension, consisting of its:

key names (full namespace embraced by curly brackets ('{' and '}') and the respective local name), as well as, if present, the contained value (...)

...is to be handled as follows:

The resulting substrings SHALL be sorted according to their case-sensitive lexical ordering, considering UTF-8/ASCII code values of each successive character when they are appended to the pre-hash string.

Therefore, the lexicographical ordering is based on the entire substring, thus the behaviour is correct in all three implementations. My apologies for my comment last week!
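
A small sketch of why sorting the whole substring puts "userExtensions2" first: after the shared prefix, the comparison is between `2` (code 0x32) and `=` (code 0x3D), and `2` is smaller. The helper name `substring` is hypothetical; the two extensions are the ones from the example event:

```python
NAMESPACE = "http://example.com/"

# (local name, value) pairs from the example event.
extensions = [
    ("userExtensions", "User Extensions-1"),
    ("userExtensions2", "User Extensions-2"),
]


def substring(local_name, value):
    """Build '{namespace}localName=value' for one user extension."""
    return f"{{{NAMESPACE}}}{local_name}={value}"


# Sorting the complete substrings by their UTF-8 byte values.
whole_sorted = sorted(
    (substring(name, value) for name, value in extensions),
    key=lambda s: s.encode("utf-8"),
)

# For contrast: sorting by key only, then building the substrings.
key_sorted = [substring(name, value) for name, value in sorted(extensions)]

print(whole_sorted)
print(key_sorted)
print(hex(ord("2")), hex(ord("=")))
```

The two orderings differ only when one key is a prefix of another, which is exactly the case raised in this issue. With "userExtensions1" and "userExtensions2" both approaches agree, because the deciding characters are `1` and `2`.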

So, from your POV (also @Aravinda93's as the initiator of this issue) - can we close this one?

Kind regards and have a great start into the new week; Ralph

Aravinda93 commented 1 year ago

Dear @RalphTro & @ShaikDayan

Thank you for your time to address this issue and for providing confirmation.

Upon initial inspection, it seemed that the Python implementation was not sorting correctly, which prompted me to raise this issue due to misalignment with the Java implementation. Previously, our Java implementation was not taking into account the complete namespaces & values, which resulted in differences in its functionality in comparison to Python implementation. We have recently made several modifications in accordance with the standards, ensuring that it now operates as expected.

However, after being confirmed by @ShaikDayan and @RalphTro, it has been determined that the ordering is indeed correct and in line with the standard. Please accept my apologies for any confusion caused. I kindly request that this issue be closed.

Thanks and Best regards, Aravinda

RalphTro commented 1 year ago

Dear @Aravinda93 , Absolutely no need for you to apologise (if at all, it should be me! ;-)) - quite the contrary! This discussion helped a lot from my POV. Many thanks for raising this issue. Kind regards, Ralph