buildingSMART / IDS

Computer interpretable (XML) standard to define Information Delivery Specifications for BIM (mainly used for IFC)
https://www.buildingsmart.org/standards/bsi-standards/information-delivery-specification-ids/

Multiple regex patterns in the same requirement #285

Open LoamsDw opened 3 months ago

LoamsDw commented 3 months ago

Hi, I'm new to using IDS and am experimenting with a series of validations. I wanted to ask whether it is possible in IDS to define a property requirement that has multiple regex patterns, where the rule is satisfied if at least one of the patterns matches, like an OR condition. Here is an example:

                   <propertySet>
                        <simpleValue>Pset_exemple</simpleValue>
                    </propertySet>
                    <name>
                        <simpleValue>WBS Code</simpleValue>
                    </name>
                    <value>
                        <xs:restriction base="xs:string">
                            <xs:pattern value="^.{3}(CA)(.{4}(MT))(.{4}(DTM)).*" />
                            <xs:pattern value="^.{3}(CN)(.{4}(AC))(.{4}(CNT)).*" />
                            <xs:pattern value="^.{3}(CS)(.{4}(AS))(.{4}(ASS)).*" />
                            <xs:pattern value="^.{3}(CS)(.{4}(SR))(.{4}(ARG|BAE|BIN|CCU|FLE|FSC|GEO|GRA|MUS|PAV|RIL|SCO|TVE)).*" />
                            <xs:pattern value="^.{3}(CS)(.{4}(ST))(.{4}(BAE|BIN|CCU|FLE|FSC|GEO|MUS|PAV|SCO|TRN|TVE)).*" />
                            <xs:pattern value="^.{3}(CV)(.{4}(FP))(.{4}(PMD)).*" />
                            <xs:pattern value="^.{3}(CV)(.{4}(FS))(.{4}(PZF)).*" />
                            <xs:pattern value="^.{3}(CV)(.{4}(IA))(.{4}(BIN|VEL)).*" />
                            <xs:pattern value="^.{3}(CV)(.{4}(IC))(.{4}(BIN|CCP|CCU|GDI|MUS|PAV|PRE|SCA|TCC)).*" />
                            <xs:pattern value="^.{3}(CV)(.{4}(PI))(.{4}(BAG|ISA|PUL|SEL)).*" />
                            <xs:pattern value="^.{3}(CV)(.{4}(SP))(.{4}(BAG|ISA|MAN|MFP|RIE|SFL)).*" />
                        </xs:restriction>
                    </value>
                </property>

Each pattern checks that the characters at specific positions contain specific combinations. When I run the validation tests with Solibri, the rule is always reported as not satisfied.

CBenghi commented 3 months ago

Indeed, it seems to be possible; see https://www.w3.org/TR/2011/CR-xmlschema11-2-20110721/datatypes.html#rf-pattern. I will add a test case to the repository to document this.

CBenghi commented 3 months ago

@LoamsDw,

please note that a pattern must match the whole string, so you need to remove the initial ^ from your values.
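
To illustrate the full-match point, here is a minimal Python sketch. It assumes that Python's `re.fullmatch` is a close enough stand-in for XSD pattern matching on these simple patterns, and the sample value is made up for illustration. As far as I can tell, `^` is not an anchor in XSD regular expressions and is treated as a literal character, which the sketch emulates by escaping it.

```python
import re

# Hypothetical WBS code, invented for illustration:
# 3 chars, "CA", 4 chars, "MT", 4 chars, "DTM".
value = "XXXCA0000MT0000DTM"

# XSD patterns must match the whole value; re.fullmatch approximates that.
corrected = r".{3}(CA)(.{4}(MT))(.{4}(DTM)).*"

# Assumption: XSD reads a leading ^ as a literal character, so it is
# escaped here to stop Python from treating it as an anchor.
original_as_xsd = r"\^" + corrected

matches_corrected = re.fullmatch(corrected, value) is not None
matches_original = re.fullmatch(original_as_xsd, value) is not None

print(matches_corrected)  # True
print(matches_original)   # False
```

Under that assumption, the original patterns could only match values that literally start with a `^` character, which would explain the failing validation.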

andyward commented 3 months ago

Presumably, as a workaround, you can use the built-in OR operator in a regex, e.g. using | to separate the patterns:

<xs:pattern value=".{3}(CA)(.{4}(MT))(.{4}(DTM)).*|.{3}(CN)(.{4}(AC))(.{4}(CNT)).*| etc " />

But obviously it makes it pretty unwieldy.

CBenghi commented 3 months ago

@andyward,

Apologies, I was not explicit enough earlier.

The proposed case is valid, with the following minor correction:

<propertySet>
    <simpleValue>Pset_exemple</simpleValue>
</propertySet>
<name>
    <simpleValue>WBS Code</simpleValue>
</name>
<value>
    <xs:restriction base="xs:string">
        <xs:pattern value=".{3}(CA)(.{4}(MT))(.{4}(DTM)).*" />
        <xs:pattern value=".{3}(CN)(.{4}(AC))(.{4}(CNT)).*" />
        <xs:pattern value=".{3}(CS)(.{4}(AS))(.{4}(ASS)).*" />
        <xs:pattern value=".{3}(CS)(.{4}(SR))(.{4}(ARG|BAE|BIN|CCU|FLE|FSC|GEO|GRA|MUS|PAV|RIL|SCO|TVE)).*" />
        <xs:pattern value=".{3}(CS)(.{4}(ST))(.{4}(BAE|BIN|CCU|FLE|FSC|GEO|MUS|PAV|SCO|TRN|TVE)).*" />
        <xs:pattern value=".{3}(CV)(.{4}(FP))(.{4}(PMD)).*" />
        <xs:pattern value=".{3}(CV)(.{4}(FS))(.{4}(PZF)).*" />
        <xs:pattern value=".{3}(CV)(.{4}(IA))(.{4}(BIN|VEL)).*" />
        <xs:pattern value=".{3}(CV)(.{4}(IC))(.{4}(BIN|CCP|CCU|GDI|MUS|PAV|PRE|SCA|TCC)).*" />
        <xs:pattern value=".{3}(CV)(.{4}(PI))(.{4}(BAG|ISA|PUL|SEL)).*" />
        <xs:pattern value=".{3}(CV)(.{4}(SP))(.{4}(BAG|ISA|MAN|MFP|RIE|SFL)).*" />
    </xs:restriction>
</value>

The XML Schema documentation at https://www.w3.org/TR/2011/CR-xmlschema11-2-20110721/datatypes.html#rf-pattern states that:

An XML <restriction> containing more than one <pattern> element gives rise to a single ·regular expression· in the set; this ·regular expression· is an "or" of the ·regular expressions· that are the content of the <pattern> elements.

Therefore the restriction passes if any of the provided patterns matches the value.

I don't see the need for the workaround you are proposing. Am I missing something?
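
As a rough illustration of the "or" semantics quoted above, here is a minimal Python sketch. It uses three of the corrected patterns; the function name `satisfies_any_pattern` and the sample values are hypothetical, and Python's `re.fullmatch` is only assumed to approximate XSD matching for these patterns.

```python
import re

# Three of the patterns from the corrected example.
PATTERNS = [
    r".{3}(CA)(.{4}(MT))(.{4}(DTM)).*",
    r".{3}(CN)(.{4}(AC))(.{4}(CNT)).*",
    r".{3}(CV)(.{4}(IA))(.{4}(BIN|VEL)).*",
]


def satisfies_any_pattern(value: str, patterns: list[str]) -> bool:
    """Return True if at least one pattern matches the whole value."""
    return any(re.fullmatch(p, value) for p in patterns)


# Hypothetical values, invented for illustration.
print(satisfies_any_pattern("001CV0000IA0000VEL-A", PATTERNS))  # True
print(satisfies_any_pattern("001CV0000IA0000XYZ", PATTERNS))    # False
```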

andyward commented 3 months ago

> I don't see the need for the workaround you are proposing. Am I missing something?

I was just assuming this might need additional support/testing in the relevant implementations. E.g. I just checked XIDS and I don't think it's correctly de-serialising multiple patterns in this way, and it may not be the only implementation to have overlooked this.

CBenghi commented 3 months ago

> > I don't see the need for the workaround you are proposing. Am I missing something?

> I was just assuming this might need additional support/testing in the relevant implementations. E.g. I just checked XIDS and I don't think it's correctly de-serialising multiple patterns in this way, and it may not be the only implementation to have overlooked this.

Are you suggesting that we add an implementer agreement to constrain this?

andyward commented 3 months ago

> Are you suggesting that we add an implementer agreement to constrain this?

I'm not sure how best to do it, but it feels like an edge case that may cause discrepancies if we don't formalise it. Can we cover it in the 'Complex restrictions' markdown somehow? And ideally have some test cases.

Is it just pattern that can be repeated like this in a restriction? The XSD docs don't appear to indicate that any of the other Constraining Facets can be duplicated, so I guess pattern is an exception. E.g. an enumeration's 'or' operates at the next level down, so there's only ever one enumeration in a restriction.

LoamsDw commented 3 months ago

> Presumably, as a workaround, you can use the built-in OR operator in a regex, e.g. using | to separate the patterns:

> <xs:pattern value=".{3}(CA)(.{4}(MT))(.{4}(DTM)).*|.{3}(CN)(.{4}(AC))(.{4}(CNT)).*| etc " />

> But obviously it makes it pretty unwieldy.

I don't know if this is useful, but I did some tests and I get the result I want just by concatenating the patterns with the "|" separator. It doesn't seem to be the most elegant solution, but it worked.
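
If the concatenated form is used, the long pattern does not have to be written by hand. The following is a minimal Python sketch that builds it from a list; the helper name `combine_patterns` and the sample values are hypothetical, only two of the patterns are shown, and Python's `re` is assumed to behave like XSD matching for these patterns.

```python
import re
from xml.sax.saxutils import quoteattr


def combine_patterns(patterns: list[str]) -> str:
    """Join patterns into one alternation, wrapping each in a group."""
    return "|".join(f"({p})" for p in patterns)


patterns = [
    r".{3}(CA)(.{4}(MT))(.{4}(DTM)).*",
    r".{3}(CN)(.{4}(AC))(.{4}(CNT)).*",
]
combined = combine_patterns(patterns)

# quoteattr escapes XML-special characters and adds the surrounding quotes.
facet = f"<xs:pattern value={quoteattr(combined)} />"
print(facet)

# Sanity check on hypothetical values: the joined pattern should agree
# with "any of the separate patterns matches".
for value in ["001CA0000MT0000DTM", "001CN0000AC0000CNT", "001CA0000AC0000CNT"]:
    separate = any(re.fullmatch(p, value) for p in patterns)
    joined = re.fullmatch(combined, value) is not None
    assert separate == joined
```

Wrapping each pattern in its own group keeps the alternation boundaries explicit, so a pattern that itself contains a top-level `|` cannot be misread.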

berlotti commented 2 months ago

A combination of (multiple) patterns and other restrictions like enumerations, min/max restrictions, etc. is also possible.
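
A minimal Python sketch of how such a combination might be evaluated, assuming the usual XSD reading in which patterns are ORed among themselves while facets of different kinds must all hold. The function `satisfies_restriction`, its parameters, and the sample values are hypothetical, and length limits stand in here for the min/max restrictions; actual IDS implementations may differ.

```python
import re
from typing import Optional


def satisfies_restriction(
    value: str,
    patterns: Optional[list[str]] = None,
    enumeration: Optional[list[str]] = None,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
) -> bool:
    """Hypothetical checker: every kind of facet present must be satisfied."""
    # Patterns are ORed among themselves: one full match is enough.
    if patterns and not any(re.fullmatch(p, value) for p in patterns):
        return False
    # Enumeration: the value must be one of the listed options.
    if enumeration and value not in enumeration:
        return False
    # Length limits (assumed here to model the min/max restrictions).
    if min_length is not None and len(value) < min_length:
        return False
    if max_length is not None and len(value) > max_length:
        return False
    return True


patterns = [
    r".{3}(CA)(.{4}(MT))(.{4}(DTM)).*",
    r".{3}(CN)(.{4}(AC))(.{4}(CNT)).*",
]

# Hypothetical values, invented for illustration.
print(satisfies_restriction("001CA0000MT0000DTM", patterns, max_length=18))        # True
print(satisfies_restriction("001CA0000MT0000DTM-EXTRA", patterns, max_length=18))  # False
```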