cessda / cessda.cvs.two


CVs show wrong available from link and do not have Canonical URIs #484

Closed cessda-bitbucket-importer closed 1 year ago

cessda-bitbucket-importer commented 1 year ago

Original report on BitBucket by Maja Dolinar.


If you go to the CV “CDC Publisher Names” at https://vocabularies.cessda.eu/vocabulary/CdcPublisherNames?lang=en and open the tab “License and Citation”, there is a mistake in the “Available from” section. See the picture below:

I tested this for other CVs (DDI, CESSDA) and the issue is present everywhere.

In the tab “Identity and general”, Canonical URIs are missing everywhere.

The issue was reported from EOSC Helpdesk: https://eosc-helpdesk.eosc-portal.eu/#ticket/zoom/2197

cessda-bitbucket-importer commented 1 year ago

Original comment by Stefan Dlugolinsky (GitHub: Stifo).


The following MySQL code generates uri, uri_sl and canonical_uri for all published versions where any of these values is null. For each such version, it takes the uri, uri_sl and canonical_uri of its previous version and replaces the version number in them to generate the new values. The code should be executed on the MySQL server:

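-- Remember the current safe-update setting, then disable it so that
-- the UPDATE below can run without a key-based WHERE clause.
SET @SQL_SAFE_UPDATES=@@SQL_SAFE_UPDATES;
SET SQL_SAFE_UPDATES = 0;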
update version as dst
    left join version as src on
        dst.previous_version = src.id
set
    dst.canonical_uri = case 
        when dst.canonical_uri is null then regexp_replace(src.canonical_uri, regexp_replace(src.number, '([0-9]+)\.([0-9]+)(\.[0-9]+)?', '$1\\\\.$2(\\\\.[0-9]+)?'), dst.number)
        else dst.canonical_uri
    end,
    dst.uri = case
        when dst.uri is null then regexp_replace(src.uri, regexp_replace(src.number, '([0-9]+)\.([0-9]+)(\.[0-9]+)?', '$1\\\\.$2(\\\\.[0-9]+)?'), dst.number)
        else dst.uri
    end,
    dst.uri_sl = case
        when dst.uri_sl is null then regexp_replace(src.uri_sl, regexp_replace(src.number, '([0-9]+)\.([0-9]+)(\.[0-9]+)?', '$1\\\\.$2(\\\\.[0-9]+)?'), dst.number)
        else dst.uri_sl
    end
where
    dst.status = 'PUBLISHED'
    and (
        dst.canonical_uri is null
        or
        dst.uri is null
        or
        dst.uri_sl is null
    )
;
-- Restore the safe-update setting saved at the start of the script.
SET SQL_SAFE_UPDATES = @SQL_SAFE_UPDATES;
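
To make the replacement step in the script easier to follow, here is a minimal Python sketch of the same idea for a single URI. It is not part of the CVS code base: the function name `regenerate_uri` and the example URI are assumptions made for illustration only. As in the SQL, the pattern is built from the major and minor parts of the previous version number, with an optional patch component, and every match in the previous URI is replaced by the new version number.

```python
import re


def regenerate_uri(prev_uri: str, prev_number: str, new_number: str) -> str:
    """Swap the previous version number inside a URI for the new one.

    Hypothetical helper mirroring the idea of the nested regexp_replace calls.
    """
    m = re.fullmatch(r"([0-9]+)\.([0-9]+)(\.[0-9]+)?", prev_number)
    if m is None:
        raise ValueError(f"unexpected version number: {prev_number!r}")
    major, minor = m.group(1), m.group(2)
    # Match "major.minor" followed by an optional ".patch" component.
    pattern = rf"{major}\.{minor}(\.[0-9]+)?"
    # A callable replacement keeps new_number from being read as a template.
    return re.sub(pattern, lambda _: new_number, prev_uri)


# Example with a made-up URI:
new_uri = regenerate_uri(
    "https://example.org/vocabulary/Foo/2.1.0?lang=en", "2.1.0", "2.1.1"
)
```

Like the SQL, this replaces every occurrence of the pattern, so a URI that happens to contain the same digits elsewhere would also be changed.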

There are some other issues, probably in the frontend, which alter the version number before displaying it:

cessda-bitbucket-importer commented 1 year ago

Original comment by Martin Šeleng (GitHub: pakoselo).


First, I want to ask @Joshocan to execute the script on dev and staging (thanks for that) so that @dolinarm can test it. As for the issue described by @Stifo, I had already noticed the problem with displaying the wrong version(s) and have started to work on it.

cessda-bitbucket-importer commented 1 year ago

Original comment by Martin Šeleng (GitHub: pakoselo).


@dolinarm @Stifo I have repaired the numbering in the title and in the sections “CVs search” and “Editor CVs search”. However, the citation, canonical URI and URN are generated when the CVs are published by an SL admin, so there is some discrepancy. If a TL admin later creates a new TL version and the SL admin then publishes it, the URI(s) and URN(s) of the already published SL and TL(s) are not updated; only the corresponding version numbers are (the patch number, the last of the three digits, is incremented by 1). I am not sure whether they should be updated, because they weren’t published as new versions; they are only updated because of the new TL version. So, for now, I am leaving it as it is.

cessda-bitbucket-importer commented 1 year ago

Original comment by Joshua Tetteh Ocansey (GitHub: Joshocan).


Thanks @pakoselo. Script deployed. @dolinarm, please check and test the functionality in dev and staging.

cessda-bitbucket-importer commented 1 year ago

Original comment by Maja Dolinar.


@pakoselo @Joshocan I tested this on dev and staging and it is working fine, so the problem is solved there. Please move this into production as well.

cessda-bitbucket-importer commented 1 year ago

Original comment by Joshua Tetteh Ocansey (GitHub: Joshocan).


@‌dolinarm @john-shepherdson Need to plan for incremental releases for CVS

cessda-bitbucket-importer commented 1 year ago

Original comment by Martin Šeleng (GitHub: pakoselo).


@Joshocan Can you please provide us (myself and @Stifo) with the latest production database dump, so that we can test it once again locally? Also, as you suggest, we need to plan the incremental release, also without the migrate button in the maintenance section.

cessda-bitbucket-importer commented 1 year ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


See #486

cessda-bitbucket-importer commented 1 year ago

Original comment by Joshua Tetteh Ocansey (GitHub: Joshocan).


@pakoselo @Stifo DB sent to you.

cessda-bitbucket-importer commented 1 year ago

Original comment by Maja Dolinar.


The Canonical URIs are now showing on staging, however, the versioning is not right. It does not correspond to whatever is selected (language or version) and is very confusing.

I had a discussion with Taina about versioning and it should be like this: if a TL admin later creates a new TL version and the SL admin then publishes it, the URI(s) and URN(s) of the already published SL and TL(s) SHOULD BE updated as well, since their version number changes. A newly published package includes all the language variants that have up-to-date translations of the SL content. A language variant is dropped from a newly published CV version only if the SL has changed and the variant has not been updated for that change. If only one TL has changed and the SL remains the same, all other TLs are still valid and are included in the newly published package.
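
The inclusion rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Translation` class, its `based_on_sl` field and the function `languages_in_package` are hypothetical names, not taken from the CVS code, and the SL content version is assumed to be identified by a major.minor string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    language: str
    based_on_sl: str  # SL content version (major.minor) this TL was made for


def languages_in_package(current_sl: str, translations: list[Translation]) -> list[str]:
    """Return the TLs that are up to date with the current SL content."""
    # A TL is dropped only when the SL has changed and the TL has not followed.
    return [t.language for t in translations if t.based_on_sl == current_sl]


# Example: "sl" still translates SL 2.0, so it is left out of the 2.1 package.
tls = [Translation("de", "2.1"), Translation("fi", "2.1"), Translation("sl", "2.0")]
included = languages_in_package("2.1", tls)
```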

cessda-bitbucket-importer commented 1 year ago

@MajaDolinar I've taken a look at it and found a logic problem that was not discussed before. If everything is published, and then a new TL is created, reviewed, set ready to be published, and a bundle is published (no change in the SL, just in that particular TL), the version number of the SL and of the already published TLs is simply updated and the previous version is lost. Technically, it is still there and the content remains, but its version number has been overwritten. This breaks the track of successive patch versions; e.g. an SL version 2.1.0 is overwritten by 2.1.1, and 2.1.0 no longer exists. Similarly, 2.1.1 can be overwritten by 2.1.2, so that 2.1.1 no longer exists, and so on, so there are successive versions 2.0.5, 2.1.9. I suggest cloning the already published SL and TLs in this case and updating the version number as well as the URIs of the clones. I already have code for this, but I need to test it a little bit.
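
A rough Python sketch of the clone-instead-of-overwrite idea, to make the proposal concrete. It is not the code referred to above: the `PublishedVersion` class and the functions `bump_patch` and `republish` are hypothetical, and the example URI is made up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PublishedVersion:
    language: str
    number: str  # "major.minor.patch"
    uri: str


def bump_patch(number: str) -> str:
    """Increment the last of the three version components by one."""
    major, minor, patch = number.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"


def republish(history: list[PublishedVersion], latest: PublishedVersion) -> PublishedVersion:
    """Append a clone with the next patch number; the old record is kept."""
    new_number = bump_patch(latest.number)
    clone = replace(
        latest,
        number=new_number,
        uri=latest.uri.replace(latest.number, new_number),
    )
    history.append(clone)
    return clone


# Example: 2.1.0 stays in the history next to the new 2.1.1.
v210 = PublishedVersion("en", "2.1.0", "https://example.org/cv/Foo/2.1.0")
history = [v210]
v211 = republish(history, v210)
```

Because the earlier record is never modified, the sequence of patch versions stays complete.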

Stifo commented 1 year ago

Fixed by PR #526.

MajaDolinar commented 1 year ago

@Stifo sorry it took so long to answer this; I was trying to figure this one out and had another discussion with Taina. Here are the clarifications: currently, the canonical URIs show the version number of the base SL version, but this should be changed.

New system:

Previously:

The canonical URI is formed from whatever is entered in the Agency information in the element “Canonical Uri”, with the version number added to the end. The version number added should be the version number of the whole package. Right now, the front-end displays of published vocabularies in staging do not show one and the same version number everywhere (canonical URN, citations, downloads, address line at the top of the page, etc.) across all languages. There should be only one version number everywhere.
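
As an illustration of the rule above, a minimal Python sketch follows. The function names, the URN prefix and the assumption that the agency value already ends with its separator are all hypothetical; the actual format used by CVS is not given in this thread.

```python
def canonical_uri(agency_canonical_uri: str, package_version: str) -> str:
    """Append the package version to the agency's canonical URI value."""
    # Assumption: the agency value already ends with its separator.
    return f"{agency_canonical_uri}{package_version}"


def display_fields(agency_canonical_uri: str, package_version: str, languages: list[str]) -> dict:
    """Give every language the same package version and canonical URI."""
    uri = canonical_uri(agency_canonical_uri, package_version)
    return {
        lang: {"version": package_version, "canonical_uri": uri}
        for lang in languages
    }


# Example with a made-up agency value:
fields = display_fields("urn:example:cv:Foo:", "2.1.3", ["en", "de"])
```

Deriving every displayed value from the single package version is what keeps the number identical across languages.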

Stifo commented 1 year ago

Thanks @MajaDolinar, I did it as described. However, we still need to regenerate the citations and update the version numbers in them. Could you please open a separate issue for that? The canonicalURI and “Available from” now show the correct versions. I'm not sure when the fix will appear in dev/staging after the recent migration to GitHub.

FYI: uri and uri_sl are still present in the DB and in the code.