clarin-eric / standards

work space for the Standards and Interoperability Committee
https://www.clarin.eu/content/standards

two kinds of centre "ID" -- some disambiguation is in order #280

Open bansp opened 2 months ago

bansp commented 2 months ago

While doing an overview of centres for #279, I've come across CLARIN:EL, which wants to use a colon in the shorthand version of the name.

The result is now:

        <filter>
            <centreID>CLARIN:EL</centreID>
        </filter>
        <centre id="CLARIN-EL">

... which, I bet, is going to generate issues.

I cannot just go ahead and change the @id, because it is defined as

            <xs:attribute name="id" type="xs:ID" use="required"/>

... and I fail to recall the history of this part of the schema.

Let us please sort this out when we meet next. (I could do a search for where each of those "id"s is used, but that would be yet another level of task branching while I'm preparing a release, so I'd rather not go there right now.)

I can imagine changing the filter/centreID to, e.g., centreShort. Or removing the restriction on @id. Or something else. But we need to know the consequences first. One way or another, it shouldn't stay exactly as it is right now, I feel.
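For context on why the @id cannot simply be changed to match: xs:ID is derived from xs:NCName, and an NCName may not contain a colon. The sketch below illustrates this with a simplified check. The function name is hypothetical and the pattern covers only the ASCII subset of NCName, so it is an approximation for illustration, not a substitute for schema validation.

```python
import re

# Simplified approximation of the XML NCName production (ASCII only).
# The real definition also admits many non-ASCII name characters.
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_plausible_xs_id(value: str) -> bool:
    """Return True if value matches the ASCII subset of NCName (hypothetical helper)."""
    return _NCNAME_ASCII.fullmatch(value) is not None


if __name__ == "__main__":
    # Centre identifiers mentioned in this thread.
    for candidate in ("CLARIN-EL", "CLARIN:EL", "CLARINO_Bergen"):
        print(candidate, is_plausible_xs_id(candidate))
```

Under this check, the colon form is rejected while the hyphen and underscore forms pass, which is why filter/centreID (a plain string) can hold CLARIN:EL but centre/@id cannot.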

bansp commented 2 months ago

Gosh, I think the story here goes as follows (please, @margaretha, tell me if I got the facts wrong).

At first, we had a separate list of centres, all of them listed together, and in order to make sure to keep their short names unique, I used the ID type in the schema (they did function as IDs). So it was centre/@id.

Separately, we had the lists of recommendations, and it so happened that there was one list per centre. Those lists used filter/centre, and then filter/centreID (a string).

Quite recently, we scrapped centres.xml and distributed its content among the individual recommendations. And that was when the two IDs were brought together.

Ideally, one of them should be eliminated, but I think I am going to take a middle path, at least for now:

And from there, we will be able to speculate which of them it is better to eliminate.

bansp commented 2 months ago

@margaretha This is again something I'd like to let you know about. I have unified our handling of centre IDs in the recommendations but stumbled on the Greek ID, which uses a colon in the name (CLARIN:EL) -- the corresponding recommendations file name must be CLARIN-EL-recommendations.xml. I am handling that on the way in, in commit c3652f3, but there is also the issue of the way out, i.e. export. I wasn't able to see where the export conversion magic happens :-) It felt to me like it's handled at a level closer to the OS rather than at the level of XQuery -- but maybe I have simply overlooked the relevant fragment of the code. One way or another, the exported file name is CLARIN_EL-recommendations.xml (note the underscore). That is not a disaster, but I prefer to let you know that this aspect is still imperfect.

margaretha commented 2 weeks ago

Eliza should check whether a colon etc. is replaced with an underscore somewhere, probably in the export, and change that to a hyphen so that the exported name matches the recommendation file names.

margaretha commented 1 week ago

It seems that the normalization is done automatically within the eXist function response:set-header at https://github.com/clarin-eric/standards/blob/ab16e65e9a3987c88349d0d4778fd508512b6343/SIS/clarin/modules/export.xql#L82

I added a manual replacement of : with - to synchronize the filename for CLARIN-EL-recommendations.

Besides, I found that CLARINO_Bergen uses an underscore instead of a dash. This is probably intended by the centre.
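The replacement described above can be sketched roughly as follows. This is an illustration in Python, not the actual XQuery in export.xql; the function name is hypothetical, and the only assumptions taken from this thread are the `-recommendations.xml` suffix and the `:` to `-` substitution. The automatic normalization inside response:set-header is not modelled.

```python
def recommendation_filename(centre_id: str) -> str:
    """Derive the export file name from a centre ID (hypothetical helper).

    Only the colon is replaced, so IDs that deliberately contain an
    underscore (such as CLARINO_Bergen) are left untouched.
    """
    return centre_id.replace(":", "-") + "-recommendations.xml"


if __name__ == "__main__":
    print(recommendation_filename("CLARIN:EL"))
    print(recommendation_filename("CLARINO_Bergen"))
```

Replacing the colon before the name reaches the header means the later automatic normalization has nothing left to turn into an underscore, while an underscore that is already part of the ID passes through unchanged.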

bansp commented 1 week ago

You mean, they don't use a whitespace but literally an underscore? Let's verify that in the clarin db, at some point? And thanks for your work on that! :-)

margaretha commented 1 week ago

I mean for the centre ID and the recommendation filename. The centre name uses whitespace.