[repository schema] Field type attribute does not distinguish between datatype and code set reference #170

FIXTradingCommunity / fix-orchestra

Machine readable rules of engagement

Apache License 2.0

[repository schema] Field type attribute does not distinguish between datatype and code set reference #170 #171

Closed donmendelson closed 1 year ago

donmendelson commented 1 year ago

Formerly, the type attribute of a field referenced either a datatype or a code set. (Semantically, a code set is a specialized datatype with a finite set of valid values.) This proposed schema change introduces a new field attribute codeSet while type attribute only references a datatype. The type and codeSet attributes are enforced by the schema to be mutually exclusive.

Examples of valid field definitions, one with a code set and the other with a datatype:

<fixr:field codeSet="LocateReqdCodeSet" id="114" name="LocateReqd" abbrName="LocReqd" added="FIX.4.0"/>
<fixr:field type="Price" id="132" name="BidPx" abbrName="BidPx" added="FIX.4.0"/>

Invalid field definitions because they use the wrong attribute or an attribute is missing:

<fixr:field id="137" name="MiscFeeAmt" abbrName="Amt" added="FIX.4.0"/>
<fixr:field codeSet="Currency" id="138" name="MiscFeeCurr" abbrName="Curr" added="FIX.4.0"/>
<fixr:field type="MiscFeeTypeCodeSet" id="139" name="MiscFeeType" abbrName="Typ" added="FIX.4.0"/>
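As a rough illustration of the rules in these examples, here is a minimal Python sketch that checks the five field definitions above. It is not the schema itself and not part of any Orchestra tool: the namespace URI is a placeholder, the `DATATYPES` and `CODE_SETS` tables are hypothetical stand-ins for what a real file declares, and `check_field` is a name invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI for this sketch only; a real file declares the Orchestra one.
NS = "urn:example:fixr"

# Hypothetical name tables; a real tool would collect them from the datatype
# and code set definitions of the same file.
DATATYPES = {"Price", "Currency", "Amt"}
CODE_SETS = {"LocateReqdCodeSet", "MiscFeeTypeCodeSet"}


def check_field(field):
    """Return a list of problems with one field element (empty list means valid)."""
    has_type = "type" in field.attrib
    has_code_set = "codeSet" in field.attrib
    if has_type and has_code_set:
        return ["type and codeSet are mutually exclusive"]
    if not has_type and not has_code_set:
        return ["either type or codeSet is required"]
    if has_type and field.get("type") not in DATATYPES:
        return [f"type={field.get('type')!r} is not a datatype"]
    if has_code_set and field.get("codeSet") not in CODE_SETS:
        return [f"codeSet={field.get('codeSet')!r} is not a code set"]
    return []


SAMPLE = f"""
<fixr:fields xmlns:fixr="{NS}">
  <fixr:field codeSet="LocateReqdCodeSet" id="114" name="LocateReqd"/>
  <fixr:field type="Price" id="132" name="BidPx"/>
  <fixr:field id="137" name="MiscFeeAmt"/>
  <fixr:field codeSet="Currency" id="138" name="MiscFeeCurr"/>
  <fixr:field type="MiscFeeTypeCodeSet" id="139" name="MiscFeeType"/>
</fixr:fields>
"""

root = ET.fromstring(SAMPLE)
results = {f.get("id"): check_field(f) for f in root.iter(f"{{{NS}}}field")}
for field_id, problems in results.items():
    print(field_id, "OK" if not problems else problems)
```

Fields 114 and 132 come back clean; 137, 138 and 139 each report one problem.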

The example file does not pass validation due to schema changes. In fact, all Orchestra v1.0 compliant files will need to be converted; suggest adding Orchestra v1.0 to v1.1 conversion in orchestra-transposer utility and making it public. Also, existing code will need to change.
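To give an idea of what such a conversion involves, here is a small Python sketch. It is not the orchestra-transposer implementation; the namespace URI is a placeholder, the `repository` and `codeSets` wrapper elements and the `Boolean` underlying type are assumptions made for the sample, and `convert_field_types` is a hypothetical helper. The idea is simply to move `type` to `codeSet` whenever the referenced name belongs to a code set declared in the same file.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:fixr"  # placeholder namespace for this sketch only


def convert_field_types(root):
    """Move type="X" to codeSet="X" on fields whose type names a code set.

    Returns the number of fields changed. Fields that reference a datatype
    are left untouched.
    """
    code_set_names = {cs.get("name") for cs in root.iter(f"{{{NS}}}codeSet")}
    changed = 0
    for field in root.iter(f"{{{NS}}}field"):
        type_name = field.get("type")
        if type_name in code_set_names:
            del field.attrib["type"]
            field.set("codeSet", type_name)
            changed += 1
    return changed


# A v1.0-style fragment in which both fields use the type attribute.
V1_0 = f"""
<fixr:repository xmlns:fixr="{NS}">
  <fixr:codeSets>
    <fixr:codeSet name="LocateReqdCodeSet" id="114" type="Boolean"/>
  </fixr:codeSets>
  <fixr:fields>
    <fixr:field type="LocateReqdCodeSet" id="114" name="LocateReqd"/>
    <fixr:field type="Price" id="132" name="BidPx"/>
  </fixr:fields>
</fixr:repository>
"""

root = ET.fromstring(V1_0)
count = convert_field_types(root)
fields = {f.get("id"): dict(f.attrib) for f in root.iter(f"{{{NS}}}field")}
print(count, fields)
```

Only field 114 is rewritten; field 132 keeps `type="Price"` because `Price` is not declared as a code set.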

erakadjiev commented 1 year ago

Hello @donmendelson,

Thank you! This nicely addresses the request to make it explicit whether the field type is a data type or a code set.

In addition to this, it would be good to also make the scenario of the field type explicit. Based on our Orchestra tooling development experience, the implicit scenario of the code set type has been a source of confusion as well. Data types don't have scenarios yet, but as I understand it, they might support scenarios too as part of 1.1.

Specifying the scenario explicitly would make the field type reference align better with other references (which use the refidGrp attribute group). On a related note, in refidGrp references a missing scenario name means base scenario, while for the field type reference it currently means "same scenario as that of the field" - it would be good to remove that inconsistency as well.

This could be achieved by having additional typeScenario or codeSetScenario attributes.
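The inconsistency can be shown with a short Python sketch. The three function names are invented for illustration and attributes are modelled as plain dicts; the first two functions reflect the behavior described above, and the third is one possible reading of the proposal.

```python
BASE = "base"


def ref_scenario(ref_attrs):
    """refidGrp-style reference: a missing scenario means the base scenario."""
    return ref_attrs.get("scenario", BASE)


def field_type_scenario_v1_0(field_attrs):
    """v1.0 field type reference: the code set scenario is implicitly the field's own."""
    return field_attrs.get("scenario", BASE)


def field_type_scenario_explicit(field_attrs):
    """Proposed: read codeSetScenario (or typeScenario); a missing value means base."""
    key = "codeSetScenario" if "codeSet" in field_attrs else "typeScenario"
    return field_attrs.get(key, BASE)


field = {"id": "114", "name": "LocateReqd", "scenario": "Trade",
         "codeSet": "LocateReqdCodeSet"}
implicit = field_type_scenario_v1_0(field)      # follows the field's scenario
explicit = field_type_scenario_explicit(field)  # nothing stated, so base
print(implicit, explicit)
```

With the same attributes, the implicit rule resolves to the field's own scenario while the explicit rule falls back to base, which is the difference in behavior being discussed.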

If we wanted to be more consistent with the refidGrp way of referencing, we could use type IDs instead of names, i.e. have the same id and scenario attributes as in refidGrp. In that case, we would need a selector attribute to specify if the type ID is of a data type or a code set, since IDs are not unique across different element types (yet?).

What do you think?

donmendelson commented 1 year ago

@erakadjiev, I'm glad that this proposal addresses your issue with type references.

All scenario references are deterministic in the schema; it's just that "base" scenario is the default so it need not be stated. In FIX Latest, we suppress the output of default values, and it dramatically reduces the size of an XML file. But you can output it if you prefer. The communication of scenarios should be better in v1.1 where they are centrally declared under a <scenarios> element. See PR #157.
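Suppressing defaults on output could look roughly like the following Python sketch. It is not how FIX Latest is actually produced: `strip_defaults` is a hypothetical helper, the sample omits namespaces for brevity, and the only default it knows about is the `base` scenario mentioned above.

```python
import xml.etree.ElementTree as ET

# Attribute defaults to suppress; "base" for scenario is the only one named in this thread.
DEFAULTS = {"scenario": "base"}


def strip_defaults(root):
    """Remove attributes whose value equals the default; return how many were removed."""
    removed = 0
    for element in root.iter():
        for name, default in DEFAULTS.items():
            if element.get(name) == default:
                del element.attrib[name]
                removed += 1
    return removed


root = ET.fromstring(
    '<fields>'
    '<field id="5" name="AdvTransType" scenario="base"/>'
    '<field id="5" name="AdvTransType" scenario="Trade"/>'
    '</fields>'
)
removed = strip_defaults(root)
remaining = [f.get("scenario") for f in root.iter("field")]
print(removed, remaining)
```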

You are correct that <datatype> is one of the few remaining elements without a numeric ID. Elsewhere there was renewed discussion of whether Orchestra elements should have a globally unique identifier rather than a simple integer that is only unique within one Orchestra file.

erakadjiev commented 1 year ago

Hello @donmendelson, thank you for the reply!

Yes, in the other references (fieldRef, componentRef, etc.) the scenario specification works well. Centrally declaring the scenarios in 1.1 will improve this further.

However, for the field type reference (data type or code set) it works differently. When a field references a code set as its type, no scenario can be specified; it implicitly uses the same scenario for the code set as that of the field. So this is not consistent with all the other references and has led to numerous misunderstandings during tooling development.

What I suggested in my previous comment is that also for the field type reference (data type or code set) the scenario should be explicitly specified, same as in other references. If it's the base scenario, then it can be omitted, as for other references.

Something along the lines of those 2 options:

<fixr:field codeSet="LocateReqdCodeSet" codeSetScenario="Trade" id="114" name="LocateReqd" abbrName="LocReqd" added="FIX.4.0"/>
or
<fixr:field typeId="123" typeScenario="Trade" typeClassifier="codeSet" id="114" name="LocateReqd" abbrName="LocReqd" added="FIX.4.0"/>

As for the IDs, having a numeric ID for data types too would be good. And I think that using globally (at least within a single Orchestra file) unique IDs for all elements would be better. If I'm not mistaken, the current numeric IDs are not even unique within a single Orchestra file, just within an element type. For example, a field can have the same numeric ID as a component.
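A quick way to see why an ID alone is ambiguous is to group elements by ID and look for IDs shared by more than one element type. This is a Python sketch with hypothetical IDs and a made-up helper name:

```python
from collections import defaultdict


def find_id_collisions(elements):
    """Report ids that are shared by more than one element kind.

    `elements` is an iterable of (kind, id) tuples such as ("field", "5").
    """
    kinds_by_id = defaultdict(set)
    for kind, element_id in elements:
        kinds_by_id[element_id].add(kind)
    return {i: sorted(kinds) for i, kinds in kinds_by_id.items() if len(kinds) > 1}


# Hypothetical ids chosen for illustration only.
elements = [("field", "5"), ("codeSet", "5"), ("field", "11"), ("component", "1003")]
collisions = find_id_collisions(elements)
print(collisions)
```

Any ID that appears in the result can only be resolved with an extra selector, which is the role `typeClassifier` plays in the second option above.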

donmendelson commented 1 year ago

@erakadjiev I see what you mean about the possibility of a field of scenario X referencing datatype or codeset of scenario Y. In the current design, we didn't think that was necessary. In fact the main reason to have a field scenario was to point to a scenario of a codeset. They seemed to be linked. But if there is consensus in favor of your request, we can introduce a type scenario attribute on a field.

erakadjiev commented 1 year ago

Thank you, @donmendelson ! Perhaps there could be some business use-cases when a field with scenario X references a code set/data type with scenario Y? But from an engineering point of view, specifying the scenario of the code set/data type would make things cleaner and also make it consistent with other reference types.

kleihan commented 1 year ago

@donmendelson @erakadjiev I would like to add another angle to the discussion. I am still struggling with the need to repeat a field multiple times in the <fixr:fields> element just because it is used with more than one scenario of a code set. I know that there are more reasons to define fields multiple times (e.g. range definitions). The <fixr:fieldRef> element currently uses the scenario attribute (and has to do that) to refer to the code set to use for this usage of the field.

What if the reference to a scenario is ONLY in the places where a field is being used? We are now introducing scenarios for datatypes. This would also make it possible to distinguish datatype differences where the field is used. @erakadjiev you already suggested a codeSetScenario and typeScenario attribute, for example:

<fixr:fieldRef id="626" codeSetScenario="Trade" typeScenario="ShortInteger"/>

I believe that field definitions and field references both include fieldAttribGrp. Some of these attributes, like presence, do not make sense in a field definition, only in a field reference. The question I am basically asking is what the downside is to making entries in <fixr:fields> unique? If one needs to use the field in different ways then this would be defined in the respective places.

Maybe I am not seeing the forest for the trees here but wanted to know if this is something worth exploring for v1.1.

erakadjiev commented 1 year ago

Hello @kleihan

This is an interesting idea. So basically fields would not have scenarios anymore, the scenario (for the data type or code set) would be specified only in the fieldRef, right?

Would field definitions still point to the ID or name of a data type and code set? The issue I see with this, from a modeling point of view, is that we would split data type and code set references into 2, each defined in a different place - which is not nice. I.e. data type or code set name/ID defined in the field, scenario defined in the fieldRef.

An alternative would be if the field definitions always pointed to a data type (incl. the data type's scenario), and fieldRefs optionally pointed to a code set (incl. the code set's scenario). There should be validation to make sure that the specified code set's underlying data type is the same as the field definition's data type. For example:

<fixr:field id="5" name="AdvTransType" type="String" typeScenario="SpecialString" abbrName="AdvTransTyp" added="FIX.2.7"/>
<fixr:fieldRef id="5" codeSet="AdvTransTypeCodeSet" codeSetScenario="Trade" presence="required" added="FIX.2.7"/>
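The validation mentioned above could be sketched in Python as follows. Attributes are plain dicts, `check_ref_code_set` is a name invented here, and the `code_sets` lookup table is a hypothetical stand-in for whatever a real tool builds from the code set definitions.

```python
def check_ref_code_set(field, field_ref, code_sets):
    """Check a fieldRef's code set against the datatype of its field definition.

    `code_sets` maps (name, scenario) to the code set's underlying datatype
    as a (type, typeScenario) pair. Returns an error string, or None if consistent.
    """
    if "codeSet" not in field_ref:
        return None  # the reference accepts any value of the field's datatype
    key = (field_ref["codeSet"], field_ref.get("codeSetScenario", "base"))
    if key not in code_sets:
        return f"unknown code set {key}"
    expected = (field["type"], field.get("typeScenario", "base"))
    if code_sets[key] != expected:
        return f"code set {key} has datatype {code_sets[key]}, field expects {expected}"
    return None


field = {"id": "5", "name": "AdvTransType", "type": "String",
         "typeScenario": "SpecialString"}
field_ref = {"id": "5", "codeSet": "AdvTransTypeCodeSet",
             "codeSetScenario": "Trade", "presence": "required"}

# Hypothetical code set tables: one consistent with the field, one not.
matching = {("AdvTransTypeCodeSet", "Trade"): ("String", "SpecialString")}
mismatched = {("AdvTransTypeCodeSet", "Trade"): ("char", "base")}

ok = check_ref_code_set(field, field_ref, matching)
error = check_ref_code_set(field, field_ref, mismatched)
print(ok, error)
```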

This would make creation of fieldRefs a little more cumbersome, because the user will need to reference the exact code set for every fieldRef.

On a related note, if data types will have scenarios, code set definitions would need to specify the underlying type's scenario too, right? For example:

<fixr:codeSet name="AdvTransTypeCodeSet" id="5" type="String" typeScenario="SpecialString" added="FIX.2.7"/>

donmendelson commented 1 year ago

@erakadjiev I agree with your last point:

> On a related note, if data types will have scenarios, code set definitions would need to specify the underlying type's scenario too, right?

The underlying type of a code set could be set to a short int type, for instance.

kleihan commented 1 year ago

@erakadjiev, you asked about field attributes:

> Would field definitions still point to the ID or name of a data type and code set? The issue I see with this, from a modeling point of view, is that we would split data type and code set references into 2, each defined in a different place - which is not nice. I.e. data type or code set name/ID defined in the field, scenario defined in the fieldRef.

To begin with, a field definition would refer to either a datatype or a code set, not both. That means that field references can only refer to a scenario of a datatype or a code set. Your example would not be possible where you define a datatype String (and scenario SpecialString) for the field and then a code set AdvTransTypeCodeSet (and scenario Trade) for the field reference. Using a scenario (datatype or code set) other than base already in the field definition would preclude using a scenario in any of the field references. However, it should be possible to do that.

kleihan commented 1 year ago

Here are examples of what you could do:

<fixr:field id="5" name="AdvTransType" type="String" abbrName="AdvTransTyp" added="FIX.2.7"/>
<fixr:fieldRef id="5" typeScenario="SpecialString" presence="required" added="FIX.2.7"/>

<fixr:field id="5" name="AdvTransType" codeSet="AdvTransTypeCodeSet" abbrName="AdvTransTyp" added="FIX.2.7"/>
<fixr:fieldRef id="5" codeSetScenario="Trade" presence="required" added="FIX.2.7"/>

Code set definition could then use a datatype scenario together with a code set scenario:

<fixr:codeSet name="AdvTransTypeCodeSet" id="5" type="String" typeScenario="SpecialString" scenario="Trade" added="FIX.2.7"/>

erakadjiev commented 1 year ago

Thank you for the replies, @donmendelson @kleihan

So for the code set definition it has been clarified that it would contain the data type scenario.

> Here are examples of what you could do:

> Using a scenario (datatype or code set) other than base already in the field definition would preclude using a scenario in any of the field references. However, it should be possible to do that.

This is what I meant: in my opinion it would not be nice from a modeling point of view, because the reference to a data type or code set is now split across 2 places (name/ID in field, scenario in fieldRef).

Or do you mean that the field definition implicitly points to the base scenario and the fieldRef can override this to another (non-base) scenario? And if the definition points to a non-base scenario, then the reference cannot override it anymore? By the way, this would also imply that there always needs to be a base scenario if the user wants to use overriding in the fieldRef.

> To begin with, a field definition would refer to either a datatype or a code set, not both. That means that field references can only refer to a scenario of a datatype or a code set.

Yes, of course a field definition would not point to both a data type and code set at the same time. I should have written 'or' there. Sorry for the confusion!

> Your example would not be possible where you define a datatype String (and scenario SpecialString) for the field and then a code set AdvTransTypeCodeSet (and scenario Trade) for the field reference.

A code set is basically a specialization of a data type. What I suggested is that the field definition references only a data type (i.e. any value of that data type) and the fieldRef specializes that to a code set (i.e. a subset of values of that data type, as defined in the code set).

Having said that, compared to all the ideas about this so far, the current 1.0 way still seems to be cleaner. But perhaps we can find a better way.

kleihan commented 1 year ago

@erakadjiev I will try to clarify what I meant. A FIX field such as AdvTransType has a code set to begin with. If it is not mentioned in the field definition then it means scenario="base". Hence, a scenario is always defined (already in v1.0) in field and may be defined in both field and fieldref. We could allow an override in fieldref even if the scenario in field is not "base". That is probably less confusing than to say "override only allowed for non-base scenario". This approach should be agnostic to using a datatype or a code set.

Name/ID are present in both field and fieldref (as of v1.1). Scenario is optional and implies "base" unless it is missing in fieldref and has a non-base scenario in field. Then the non-base scenario is inherited by fieldref.
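The resolution rule just described fits in a few lines of Python. The function name and the use of `None` for "attribute not present" are choices made for this sketch only; the same rule is meant to apply to typeScenario and codeSetScenario alike.

```python
def effective_scenario(field_scenario=None, ref_scenario=None):
    """Resolve the type/code set scenario for one usage of a field.

    A scenario on the fieldref wins; otherwise the one on the field definition
    is inherited; if neither is present the scenario is "base".
    """
    if ref_scenario is not None:
        return ref_scenario
    if field_scenario is not None:
        return field_scenario
    return "base"


print(effective_scenario())                                # neither stated
print(effective_scenario(field_scenario="Trade"))          # inherited from field
print(effective_scenario("Trade", ref_scenario="PostTrade"))  # overridden in fieldref
```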

erakadjiev commented 1 year ago

Hello @kleihan

Thank you for the clarification!

So just to summarize everything we discussed above with an example:

<!-- base scenario -->
<fixr:datatype name="String"/>
<!-- in 1.1 data types can have scenarios too -->
<fixr:datatype name="String" scenario="SpecialString"/> 
<fixr:datatype name="String" scenario="OrdinaryString"/> 

<!-- base scenario -->
<!-- in 1.1 code sets specify the scenario of the data type they point to using the typeScenario attribute -->
<fixr:codeSet name="AdvTransTypeCodeSet" id="5" type="String" typeScenario="SpecialString"/> 
<fixr:codeSet name="AdvTransTypeCodeSet" id="5" scenario="Trade" type="String" typeScenario="SpecialString"/>
<fixr:codeSet name="AdvTransTypeCodeSet" id="5" scenario="PostTrade" type="String" typeScenario="SpecialString"/>

<!-- base scenario of the field pointing to base scenario of the code set -->
<fixr:field id="5" name="AdvTransType" codeSet="AdvTransTypeCodeSet"/> 
<!-- in 1.1 the field can point to a non-base scenario of the code set using the codeSetScenario attribute (field's scenario and codeSetScenario don't need to be the same) -->
<fixr:field id="5" name="AdvTransType" scenario="Trade" codeSet="AdvTransTypeCodeSet" codeSetScenario="Trade"/> 

<!-- in 1.1 a field reference can override the codeSetScenario of the field definition (and in 1.1 the fieldRef contains both the ID and the name) -->
<fixr:fieldRef id="5" name="AdvTransType" codeSetScenario="PostTrade" presence="required"/>
<!-- in 1.1 if the field reference doesn't specify a codeSetScenario, then it's inherited from the field definition (in this example, it's codeSetScenario="Trade") -->
<fixr:fieldRef id="5" name="AdvTransType" scenario="Trade" presence="required"/>

<!-- it works similarly for fields that use data type (rather than code set) -->
<!-- base scenario of the field pointing to base scenario of the data type -->
<fixr:field id="11" name="ClOrdID" type="String"/>
<!-- in 1.1 the field can point to a non-base scenario of the data type using the typeScenario attribute (field's scenario and typeScenario don't need to be the same) -->
<fixr:field id="11" name="ClOrdID" scenario="Trade" type="String" typeScenario="SpecialString"/>

<!-- in 1.1 a field reference can override the typeScenario of the field definition (and in 1.1 the fieldRef contains both the ID and the name) -->
<fixr:fieldRef id="11" name="ClOrdID" typeScenario="OrdinaryString" presence="required"/>
<!-- in 1.1 if the field reference doesn't specify a typeScenario, then it's inherited from the field definition (in this example, it's typeScenario="SpecialString") -->
<fixr:fieldRef id="11" name="ClOrdID" scenario="Trade" presence="required"/>

Let me know please if you see anything that's not in line with what we discussed.

Also, even with those model improvements, it'd still be possible to use the 1.0 way of adding a new field definition with a new scenario pointing to a different code set scenario, instead of using scenario overrides in fieldRefs. If anyone would prefer that.

kleihan commented 1 year ago

@erakadjiev that looks good with one significant exception. My main goal was to avoid the need to define a field more than once. That requires two things:

- a field and a field reference cannot have a scenario attribute
- if a field has a typeScenario or a codeSetScenario attribute, then all field usages must follow that unless overridden at the field reference level.

That means that the following of your examples would be invalid:

<fixr:field id="5" name="AdvTransType" scenario="Trade" codeSet="AdvTransTypeCodeSet" codeSetScenario="Trade"/> 
<fixr:field id="11" name="ClOrdID" scenario="Trade" type="String" typeScenario="SpecialString"/>
<fixr:fieldRef id="5" name="AdvTransType" scenario="Trade" presence="required"/>
<fixr:field id="11" name="ClOrdID" scenario="Trade" type="String" typeScenario="SpecialString"/>
<fixr:fieldRef id="11" name="ClOrdID" scenario="Trade" presence="required"/>

In all cases scenario="Trade" should be removed. Why did you keep it?

An example of an override would be as follows. All usages of SecurityType(167) follow the scenario "Trade" with the exception of a few that require even fewer codes and follow the scenario "SpecialTrade". In this use case the user reduces the base scenario of FIX Latest to his own "base" before further reducing it where applicable.

<fixr:field id="167" name="SecurityType" codeSet="SecurityTypeCodeSet" codeSetScenario="Trade"/> 
<fixr:fieldRef id="167" name="SecurityType" codeSetScenario="SpecialTrade" presence="required"/>
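The two requirements listed earlier in this comment could be checked mechanically. Here is a Python sketch, with a placeholder namespace, a hypothetical `repository` wrapper element and an invented function name; it flags any scenario attribute on a field or fieldRef and any field ID that is defined twice.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:fixr"  # placeholder namespace for this sketch only


def check_single_definition(root):
    """Flag scenario attributes on field/fieldRef and fields defined more than once."""
    problems = []
    seen_field_ids = set()
    for element in root.iter():
        kind = element.tag.split("}")[-1]  # local name without the namespace
        if kind not in ("field", "fieldRef"):
            continue
        if "scenario" in element.attrib:
            problems.append(f"{kind} {element.get('id')}: scenario attribute not allowed")
        if kind == "field":
            if element.get("id") in seen_field_ids:
                problems.append(f"field {element.get('id')}: defined more than once")
            seen_field_ids.add(element.get("id"))
    return problems


SAMPLE = f"""
<fixr:repository xmlns:fixr="{NS}">
  <fixr:field id="167" name="SecurityType" codeSet="SecurityTypeCodeSet" codeSetScenario="Trade"/>
  <fixr:fieldRef id="167" name="SecurityType" codeSetScenario="SpecialTrade" presence="required"/>
  <fixr:field id="5" name="AdvTransType" scenario="Trade" codeSet="AdvTransTypeCodeSet"/>
  <fixr:field id="5" name="AdvTransType" codeSet="AdvTransTypeCodeSet"/>
</fixr:repository>
"""

problems = check_single_definition(ET.fromstring(SAMPLE))
for problem in problems:
    print(problem)
```

The SecurityType pair passes, while the two AdvTransType definitions are reported.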

Do you think that would work and be beneficial?

erakadjiev commented 1 year ago

Hello @kleihan

My thought was that we can leave scenarios for fields and fieldRefs, as we have them now. As you wrote, in some cases, it could be helpful to be able to define multiple scenarios of a single field.

In your case, if you don't need to define multiple scenarios, then you can just use the implicit base scenario for fields and fieldRefs and typeScenario/codeSetScenario overrides in fieldRefs. However, others might want to define multiple scenarios for a field. fieldRef with scenario X would use the type/codeSet defined in the corresponding field definition with scenario X. Accordingly, typeScenario/codeSetScenario overrides would apply to that field definition scenario.
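In Python, the lookup described in this paragraph might look like the sketch below. The data is taken from the AdvTransType lines of the summary example; the dict-based model and the `resolve_code_set` name are assumptions for illustration.

```python
BASE = "base"

# Field definitions keyed by (id, field scenario).
fields = {
    ("5", BASE): {"name": "AdvTransType", "codeSet": "AdvTransTypeCodeSet"},
    ("5", "Trade"): {"name": "AdvTransType", "codeSet": "AdvTransTypeCodeSet",
                     "codeSetScenario": "Trade"},
}


def resolve_code_set(field_ref, fields):
    """Return (code set name, code set scenario) for one fieldRef.

    The fieldRef's scenario selects the field definition with the same scenario;
    a codeSetScenario on the fieldRef then overrides the one on that definition.
    """
    definition = fields[(field_ref["id"], field_ref.get("scenario", BASE))]
    inherited = definition.get("codeSetScenario", BASE)
    return definition["codeSet"], field_ref.get("codeSetScenario", inherited)


override = resolve_code_set({"id": "5", "codeSetScenario": "PostTrade"}, fields)
inherit = resolve_code_set({"id": "5", "scenario": "Trade"}, fields)
print(override, inherit)
```

The first fieldRef selects the base definition and overrides its code set scenario to PostTrade; the second selects the Trade definition and inherits codeSetScenario="Trade".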

That's the behavior I captured in the example in my previous comment. By the way, the example XML I shared above is meant to be read as a single XML document (not as independent lines), to show how everything links together and what possibilities there are. It is also the behavior I meant by "backward compatible":

> Also, even with those model improvements, it'd still be possible to use the 1.0 way of adding a new field definition with a new scenario pointing to a different code set scenario, instead of using scenario overrides in fieldRefs. If anyone would prefer that.

Do you see a reason why scenarios for fields and fieldRefs should be disallowed?

kleihan commented 1 year ago

> My thought was that we can leave scenarios for fields and fieldRefs, as we have them now. As you wrote, in some cases, it could be helpful to be able to define multiple scenarios of a single field.

@erakadjiev, if we leave scenarios for fields and fieldRefs, then we could still have a field defined multiple times, something I wanted to avoid. I was talking about having a single definition and multiple usages of a field. We have different views here and we need wider feedback to proceed (in a new issue). For now, I think it is better if we focus on Don's original proposal in this issue, i.e. mutually exclusive field attributes type and codeSet. I support Don's proposal.

donmendelson commented 1 year ago

I added some background here to help the working group with this discussion.

erakadjiev commented 1 year ago

@kleihan @donmendelson Sorry for the delay! Hanno, indeed, better to move the discussion to a wider audience (and away from this PR). I also support Don's proposal in this PR. Don, thank you for creating the page with the background info! I've added a comment to that page with a summary of the 4 changes for v1.1 that we discussed here.