clarin-eric / standards

work space for the Standards and Interoperability Committee
https://www.clarin.eu/content/standards

full processing of registryLink, including RIs #300

Open bansp opened 4 months ago

bansp commented 4 months ago

I have turned <a> into registryLink with some extra attributes. Multiple links to the same registry within a single RI are now handled (commit forthcoming), but we still need to extend this to cases where a centre is listed by more than one RI.

That case is not far off; for now, I was simply unable to locate centre/repository registries for Text+ or DARIAH, or I would have added references to them where appropriate.

bansp commented 4 months ago

@margaretha this is one of the commits I mentioned as needing your eye, to check whether everything is idiomatic, etc.

For example, I've noticed we have a conflict of naming styles and I tried to adjust to your practice (so I used $registry-links instead of $registryLinks or $registry_links; the last one being my favourite but I think no one uses that any longer, so I usually go for the Java style).

But I'm a bit afraid that I have let javaStyled naming slip in at a few places elsewhere... :-\

bansp commented 4 months ago

In order to be more precise about registryLink, two examples follow.

      <centre id="ZIM" deposition="1">
         <name>ZIM Centre for Information Modelling</name>
         <registryLink registry="CLARIN" uri="https://centres.clarin.eu/centre/65"/>
         <nodeInfo>
            <ri status="B-centre">CLARIN</ri>
         </nodeInfo>
      </centre>

First, a simple one, with a minimal modification of the <a>. The content of the attribute @registry and of the element <ri> is controlled by the same closed set of values. Now, a more complex case:

        <centre id="CLARIN-CH" deposition="1">
            <name>CLARIN Switzerland</name>
            <registryLink registry="CLARIN" uri="https://centres.clarin.eu/centre/80" label="CLARIN-CH-LiRI"/>
            <registryLink registry="CLARIN" uri="https://centres.clarin.eu/centre/81" label="CLARIN-CH-LaRS"/>
            <nodeInfo>
                <ri status="B-centre">CLARIN</ri>
            </nodeInfo>
        </centre>

The recommendations for CLARIN-CH put two centres onto a single list, because, from the perspective of the SIS, these centres share everything except their CLARIN registry link and the associated name.

Partial visualisation logic is added in the commit linked above: it produces a list if there is more than one registryLink, and if the optional attribute @label is present, it prints it out.
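The rule just described can be sketched roughly as follows. This is not the code from the commit: the function name local:render-registry-links, the choice of <ul>/<li>/<a> as output, and the fallback to the URI when @label is absent are all assumptions made for illustration.

```xquery
xquery version "3.1";

(: Hypothetical helper, not the committed code: renders the registryLink
   children of a centre. More than one link yields a list; a single link
   yields a bare anchor. :)
declare function local:render-registry-links($centre as element(centre)) as element()* {
    let $registry-links := $centre/registryLink
    return
        if (count($registry-links) > 1) then
            <ul>{
                for $link in $registry-links
                return
                    <li><a href="{$link/@uri}">{
                        (: print the optional @label; fall back to the URI (assumption) :)
                        string(($link/@label, $link/@uri)[1])
                    }</a></li>
            }</ul>
        else
            for $link in $registry-links
            return
                <a href="{$link/@uri}">{string(($link/@label, $link/@uri)[1])}</a>
};

let $centre :=
    <centre id="CLARIN-CH" deposition="1">
        <name>CLARIN Switzerland</name>
        <registryLink registry="CLARIN" uri="https://centres.clarin.eu/centre/80" label="CLARIN-CH-LiRI"/>
        <registryLink registry="CLARIN" uri="https://centres.clarin.eu/centre/81" label="CLARIN-CH-LaRS"/>
    </centre>
return
    local:render-registry-links($centre)
```

For the CLARIN-CH example this yields a two-item list whose entries are labelled CLARIN-CH-LiRI and CLARIN-CH-LaRS; for ZIM, which has a single unlabelled link, it yields one anchor showing the URI.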

What remains to be done is possibly:

  1. sensitivity to the RI switch -- except
    1. I am not sure if it is needed, in the sense that it may be useful to know that a centre serves as a node in more than one RI.
    2. I have not been able to locate registries for Text+ or DARIAH. (It's hard to believe that they don't exist...) So, for now, CLARIN is the only registry that is used.
  2. visualisation that takes into account the RI -- I imagine that as simply an outer loop over the existing functionality, printing the RI name in the first column (in the outer <div>) and repeating the existing functionality in the inner <div>(s)
  3. enhancement of the existing calculation logic for the KPI, popular formats and general statistics (we have a single document for CLARIN-CH, but it should count as two centres)

Note that points (1) and (2) are, for the time being, moot. The only relevant point is (3), and that may be a matter of a single function that returns 1 by default but, for centres like CLARIN-CH, returns the number of registryLinks that share the same @registry. ("What if the structure gets complicated and we have a single file for two CLARIN centres that are at the same time 3 DARIAH centres", you ask. I reply: that is simply not going to happen, because in such cases, we would simply split the representation into individual centres.)
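A minimal sketch of such a function, assuming that every registryLink pointing at the given registry stands for one centre in that registry. The name local:centre-weight and the explicit $registry parameter are hypothetical, not existing SIS code.

```xquery
xquery version "3.1";

(: Hypothetical weighting function: how many centres does this one
   document stand for in the given registry? Never less than 1, so
   centres without any registryLink still count once. :)
declare function local:centre-weight(
    $centre as element(centre),
    $registry as xs:string
) as xs:integer {
    max((1, count($centre/registryLink[@registry = $registry])))
};

let $zim :=
    <centre id="ZIM" deposition="1">
        <registryLink registry="CLARIN" uri="https://centres.clarin.eu/centre/65"/>
    </centre>
let $clarin-ch :=
    <centre id="CLARIN-CH" deposition="1">
        <registryLink registry="CLARIN" uri="https://centres.clarin.eu/centre/80" label="CLARIN-CH-LiRI"/>
        <registryLink registry="CLARIN" uri="https://centres.clarin.eu/centre/81" label="CLARIN-CH-LaRS"/>
    </centre>
return
    (local:centre-weight($zim, "CLARIN"), local:centre-weight($clarin-ch, "CLARIN"))
```

With these two sample centres the query returns the sequence 1, 2. Statistics code would then sum these weights instead of counting documents.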

bansp commented 4 months ago

A variant of (3):

  1. where a centre has more than one registryLink (or more than one registryLink/@label, maybe), use the labels as separate centre names for various lists (with all the rest of info duplicated)
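This variant could look roughly like the sketch below, which expands one centre document into the names it should appear under in lists. The function name local:centre-names is hypothetical, and the threshold (more than one labelled registryLink) is only one reading of the "maybe" above; the remaining centre information would be reused unchanged for each name.

```xquery
xquery version "3.1";

(: Hypothetical helper: returns one display name per labelled
   registryLink if there are several, otherwise the centre's own name. :)
declare function local:centre-names($centre as element(centre)) as xs:string* {
    let $labels := $centre/registryLink/@label
    return
        if (count($labels) > 1)
        then $labels ! string(.)
        else string($centre/name)
};

let $clarin-ch :=
    <centre id="CLARIN-CH" deposition="1">
        <name>CLARIN Switzerland</name>
        <registryLink registry="CLARIN" uri="https://centres.clarin.eu/centre/80" label="CLARIN-CH-LiRI"/>
        <registryLink registry="CLARIN" uri="https://centres.clarin.eu/centre/81" label="CLARIN-CH-LaRS"/>
    </centre>
return
    local:centre-names($clarin-ch)
```

For CLARIN-CH this returns the two labels; for a centre such as ZIM, with a single unlabelled link, it would return just the content of <name>.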

bansp commented 1 month ago

Update: a registry for Text+ has been created, link follows shortly...

bansp commented 1 month ago

Here are some of the places that would need to be enriched with an indirection mechanism, so that a centre counts as two (or more) depending on the number of its labelled registryLinks.

In model/recommendation-by-centre.xqm, we have:

declare variable $recommendation:centres := collection('/db/apps/clarin/data/recommendations')/recommendation;

In modules/centre.xql, there is this:

declare function cm:count-number-of-centres-with-recommendations($centres) {
    (: map each centre to 1 if it recommends at least one format, else 0 :)
    let $centre-with-recommendations :=
        for $c in $centres
        let $recommendations := cm:get-recommendations(data($c/@id))
        let $numOfRecommendations := count($recommendations/formats/format)
        return
            if ($numOfRecommendations > 0)
            then
                1
            else
                0
    return
        (: the number of centres with recommendations is the sum of the flags :)
        sum($centre-with-recommendations)
};
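One way the indirection might be folded into this count is sketched below. It is a standalone illustration, not a patch to modules/centre.xql: the $get-recommendations parameter stands in for cm:get-recommendations() so that the example runs without the database, the local: function name is hypothetical, and the weighting rule (number of labelled registryLinks, but never less than 1) is an assumption.

```xquery
xquery version "3.1";

(: Sketch only: counts centres with at least one recommended format,
   letting a document with several labelled registryLinks count as
   several centres. :)
declare function local:count-centres-with-recommendations(
    $centres as element(centre)*,
    $get-recommendations as function(xs:string) as element()*
) as xs:integer {
    sum(
        (
            for $c in $centres
            let $recommendations := $get-recommendations(string($c/@id))
            where exists($recommendations/formats/format)
            (: assumed rule: one centre per labelled registryLink, minimum 1 :)
            return max((1, count($c/registryLink[@label])))
        ),
        0
    )
};

let $centres := (
    <centre id="ZIM" deposition="1">
        <registryLink registry="CLARIN" uri="https://centres.clarin.eu/centre/65"/>
    </centre>,
    <centre id="CLARIN-CH" deposition="1">
        <registryLink registry="CLARIN" uri="https://centres.clarin.eu/centre/80" label="CLARIN-CH-LiRI"/>
        <registryLink registry="CLARIN" uri="https://centres.clarin.eu/centre/81" label="CLARIN-CH-LaRS"/>
    </centre>
)
(: toy lookup for the demo: every centre has one recommended format :)
let $lookup := function($id as xs:string) as element()* {
    <recommendation><formats><format/></formats></recommendation>
}
return
    local:count-centres-with-recommendations($centres, $lookup)
```

With the two sample centres the result is 3: ZIM counts once and CLARIN-CH counts twice. Passing the lookup as a function keeps the sketch independent of the collection under /db/apps/clarin/data/recommendations; in the application itself the call to cm:get-recommendations() could simply stay in place.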