Closed GeSi-Software-GmbH closed 5 years ago
Thank you for summarizing the points that you raised during our Workshop in Berlin on May 15th. Some aspects:
1. Similar issue
This is related to the ESCom discussion at https://github.com/esdscom/escom-xml/issues/1, where the original idea was to use phrases for non-translatable categories to avoid schema changes. Their current status is to use phrases instead of enumerations, but to have the categories (i.e. the equivalent of the IUCLID picklist entries) in the OriginalCode field of an ESCom phrase and their translatable equivalent in the phrase text. Several aspects are interesting:
So I think we have the option of using a string (perhaps with a pattern), or an enumeration.
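The difference between the two options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the enum members and the code pattern below are hypothetical placeholders, not the real IUCLID picklist entries or an agreed SDScom pattern.

```python
import re
from enum import Enum


class ColourIntensity(Enum):
    # Hypothetical members; the real picklist entries are not reproduced here.
    LIGHT = "light"
    DARK = "dark"


# Hypothetical pattern for a volatile code list:
# one to four capital letters followed by up to three digits.
CODE_PATTERN = re.compile(r"[A-Z]{1,4}[0-9]{0,3}")


def is_valid_enum(value: str) -> bool:
    """Closed list: only values known at schema release time pass."""
    try:
        ColourIntensity(value)
        return True
    except ValueError:
        return False


def is_valid_pattern(value: str) -> bool:
    """Open list: any value of the right shape passes, including new codes."""
    return CODE_PATTERN.fullmatch(value) is not None
```

The enumeration rejects anything that was not known when the schema was released, while the pattern only checks the shape of the value and therefore survives new releases of the underlying list.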
2. Data quality, or level of acceptance of false data
You argue that it is better for SDScom to allow wrong data than to deliver cryptic error messages, even at the cost of recipients having to deal with this wrong data. I do think that, as SDScom is striving to improve data quality in the supply chain, we should avoid creating false data in the first place. It is easy for IT systems to offer e.g. dropdowns for pick lists and thus avoid wrong data. As validation mainly serves as proof of a correct interface between companies, it helps to detect on which side a data problem was caused. So I believe the interface should check as strictly as technically possible.
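A strict check does not have to produce cryptic messages. As a rough sketch, assuming a hypothetical flat record and hypothetical allowed values for a `Form` field, a sender-side check could name the field and list what is permitted:

```python
# Hypothetical allowed values; the real picklist content is not reproduced here.
ALLOWED = {
    "Form": {"solid", "liquid"},
}


def validate_record(record: dict, allowed: dict = ALLOWED) -> list:
    """Return one readable message per field whose value is not permitted."""
    errors = []
    for field, permitted in allowed.items():
        value = record.get(field)
        if value not in permitted:
            errors.append(
                f"{field}: {value!r} is not one of {sorted(permitted)}"
            )
    return errors
```

Running such a check before export would show on the sending side where the problem originated, instead of leaving it to the recipient's schema validator.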
3. Specific case of IUCLID picklists
I think you have a point with volatile enumerations / pick lists such as the German Giscodes, with very flexible new releases and sometimes unclear reassignment of codes. IUCLID picklists, however, are likely to be very stable: ECHA has not updated them often. Similarly, nobody objected to classifications, storage classes or WGK as enumerations. They rarely change, and it is crucial to ensure that correct data arrives at the recipient. So if others agree that we should assess those cases based on the revision cycle of the enumeration content, we can identify "volatile" enumerations here and convert them to a string field.
These are the IUCLID enumerations that have been implemented in v4.4.0 for Poison Center Notification (issue #124):
- ColourEnum = IUCLID PG6-60569
- ColourIntensityEnum = IUCLID PG6-60568
- ComponentFunctionEnum = IUCLID N28 and N28A
- CompositionTypeEnum = IUCLID N08, extended with "Mixture: Polymer"
- EuPCSEnum = IUCLID PG6-60567
- FormEnum = IUCLID A101
- FormulationTypeEnum = IUCLID B04
- JustificationForUpdateEnum = IUCLID PG6-60571
- PackagingMaterialEnum = IUCLID B07
- PackagingTypeEnum = IUCLID B05
- SubstanceTypeEnum = IUCLID N58
https://github.com/esdscom/sdscom-xml/wiki now mentions that enumerations should be used only for non-volatile lists. Giscodes are a simple string. Any other hints to specific enums that should be converted are welcome (as separate issues with reference to this one).
We would like to see more string values instead of enumerations in the schema. The reason for this is maintainability:
Yes, the importer of the document wants valid data. But we think it is up to the importer to deal with invalid or even just unknown enum values. As an importer, you have the general option
All of this is better than giving up because the schema does not validate, and scaring off the user with a cryptic standard schema validation error (which is another topic).
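One way an importer might tolerate unknown values can be sketched as follows. This is an assumption for illustration, not necessarily one of the options meant above: the known values are hypothetical, and the strategy shown (keep the raw text and flag it for review) is just one possibility.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of values the importing system currently knows.
KNOWN_FORMS = frozenset({"solid", "liquid"})


@dataclass
class ImportedValue:
    raw: str
    recognized: bool
    warning: Optional[str] = None


def import_value(raw: str, known: frozenset = KNOWN_FORMS) -> ImportedValue:
    """Accept the document; flag unknown values instead of rejecting it."""
    value = raw.strip()
    if value in known:
        return ImportedValue(raw=value, recognized=True)
    return ImportedValue(
        raw=value,
        recognized=False,
        warning=f"Unknown value {value!r}; kept as free text for manual review.",
    )
```

With this approach the import succeeds, and the user sees a targeted warning for the one field in question rather than a schema validation failure for the whole document.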
I will give some example enumerations later, but everybody is invited to do so.
Eric Kraußer, GeSi Software GmbH