cessda / cessda.cvs.two

Apache License 2.0
0 stars 2 forks source link

Strange versioning of GESIS Central concepts of R|EX research #654

Closed MajaDolinar closed 2 months ago

MajaDolinar commented 1 year ago

GESIS vocabulary https://vocabularies.cessda.eu/vocabulary/KategorienschemaWahlstudien versioning is behaving strangely. It should probably be SL in German and TL in English. As noted in https://github.com/cessda/cessda.cvs.two/issues/607#issuecomment-1541413596: "Probably it should be originally in German, but the English was ahead so it was aligned as 1.0.2, but if we follow the new versioning paradigm, probably it should be 1.1.0."

I will ask Gesis what the SL in this case is for them and what their workflow for entering this vocabulary was.

MajaDolinar commented 1 year ago

I asked Gesis about their workflow in entering this vocabulary. It is as we suspected: "The source language was German And we followed the workflow you described below. Once we have added all the codes, we set the CV to ‘ready to publish’ and then the translation followed. After that the vocabulary was published"

Apparently, when you create a completely new vocabulary and not publish it - only put to Ready to translate status, the publication after the added translations is set to 1.0.0 and not 1.0.1 (accounting for the translations).

pakoselo commented 1 year ago

@MajaDolinar I am not sure that I follow the discussion. Just to summarize: the problem is that the SL should be German and when we create new SL, it should add the minor number (2nd digit), not the patch number (3rd digit). Am I right? I will add @Stifo to this discussion as he was the one implementing 3 digit versioning system. Also I had a test on the staging and if I am creating new SL it will add correctly only the major.minor pair (user should design that) As I said, we should get rid off the 1.0.1 version at all.

pakoselo commented 1 year ago

@MajaDolinar @Stifo probably we need to hack the database once again.

MajaDolinar commented 1 year ago

Before doing anything, let me first test this a bit on staging. The problem as I see it is that if a someone creates a completely new vocabulary in SL and only sends it to "Ready to be translated status" and not publishes it prior to that, then the translations that will be created - so the entire published package at the end will receive 1.0.0 So in this case it means that they created a new vocabulary in German (SL), put it on "Ready to be translated" status, created an English translation to it and then published it as a package. This created the version 1.0.0 which according to the rules we have means that there is only SL and no translations, however, there is an English translation included now in 1.0.0 (which should be then 1.0.1). It basically creates a confusion for the end-user and does not follow the rules we set up. Hope I am explaining it so that it is understandable.

Stifo commented 1 year ago

update: Ah, now i see, I think there should be TL en in the bundle. Please, let me check...get back later.

I'm not sure I did understand the problem either... I can't see a problem. I think it's correct. Please, let me explain my view. I've look on how the versions were created chronologically.

This is how it was before migrating to the new versioning system:

id item_type language number status
520 SL de 1.0 PUBLISHED
573 TL en 1.0.1 PUBLISHED
578 TL en 1.0.2 DRAFT

This is after migrating to the new versioning system; i.e. current status:

id item_type language number status
520 SL de 1.0 PUBLISHED
573 TL en 1.0.1 PUBLISHED
578 TL en 1.0.3 DRAFT
732 SL de 1.0.2 PUBLISHED

What the version number migration did, was aligning all the published versions in a common bundle giving it a three digit number; i.e., 1.0.2, because: a) the latest published SL was de-1.0 b) the latest published TL was en-1.0.1 c) the next available number for publishing was 1.0.2

The minor number is changed only if there is a change in SL, which was not this case. Therefore the minor number remained 1.0. The change of the en TL's version number from 1.0.2 to 1.0.3 was because this version was still a DRAFT and it couldn't be included in the 1.0.2 bundle after migration. It was set to 1.0.3 as it was being prepared for the next publication.

MajaDolinar commented 1 year ago

Ah I see the issue now, I was referencing the wrong vocabulary. Yes this one that you are referring to is correct, so there is no issue with https://vocabularies.cessda.eu/vocabulary/KategorienschemaWahlstudien.

The problem is actually this vocabulary that has the issue I was explaining above: https://vocabularies.cessda.eu/vocabulary/rexconcepts?lang=en

So sorry for the confusion in this..

pakoselo commented 1 year ago

@MajaDolinar @Stifo so probably it doesn't work if we create first version. So it doesn't increase the patch number if there is no Sl published. What do you think @Stifo ?

Stifo commented 1 year ago

https://vocabularies.cessda.eu/vocabulary/KategorienschemaWahlstudien

I've noticed, that the 1.0.2 bundle is missing the en TL in it. There should have been a copy of the published en TL 1.0.1 put in the 1.0.2 bundle. I should look, whether there are more such cases in other vocabularies. I'll prepare a fix for that.

https://vocabularies.cessda.eu/vocabulary/rexconcepts?lang=en

please correct me, but at the time of implementing the new versioning system, there was no such explicit rule that 1.0.0 means "SL alone in the bundle". therefore, 1.0.0 could have also TLs included.

MajaDolinar commented 1 year ago

Yes, please create a for the Kategorienshcema Wahlstudien.

For the Rex Concepts and all future CVs with this workflow, it should be fixed to forcing a double publish:

  1. publish only the SL with 1.0.0
  2. publish the translation(s), increasing the versioning to 1.0.1
Stifo commented 1 year ago

https://vocabularies.cessda.eu/vocabulary/KategorienschemaWahlstudien

following code is under testing tested on dev

@MajaDolinar, please check the result on vocabularies-dev.cessda.eu, thanks. Also #655 is fixed, which you can check as well. Thanks

Insert missing en-TL-1.0.2 between en-TL-1.0.1-PUBLISHED and en-TL-1.0.3-DRAFT; i.e., clone en-TL-1.0.1 with concepts into 1.0.2 bundle and re-link versions+concepts:

LOCK TABLES version concept WRITE;
START TRANSACTION;

-- 1/7
-- clone en-TL-1.0.1, which is missing in the published 1.0.2 bundle as en-TL-1.0.2-PUBLISHED
insert into version (
    canonical_uri,
    citation,
    creator,
    ddi_usage,
    definition,
    discussion_notes,
    initial_version,
    item_type,
    language,
    last_modified,
    license,
    license_id,
    notation,
    notes,
    number,
    previous_version,
    publication_date,
    publisher,
    status,
    title,
    translate_agency,
    translate_agency_link,
    uri,
    uri_sl,
    version_changes,
    version_notes,
    vocabulary_id,
    last_status_change_date,
    creation_date
) select 
    'urn:ddi:int.gesis.org:KategorienschemaWahlstudien:1.0.2',
    'GESIS. (2022). Kategorienschema Wahlstudien (Version 1.0.2) [Controlled vocabulary]. CESSDA. urn:ddi:int.gesis.org:KategorienschemaWahlstudien:1.0.2.',
    creator,
    ddi_usage,
    definition,
    discussion_notes,
    573,
    item_type,
    language,
    last_modified,
    license,
    license_id,
    notation,
    notes,
    '1.0.2',
    573,
    publication_date,
    publisher,
    status,
    title,
    translate_agency,
    translate_agency_link,
    'https://www.gesis.org/vocabulary/KategorienschemaWahlstudien/1.0.2/en',
    'https://www.gesis.org/vocabulary/KategorienschemaWahlstudien/1.0.2/de',
    version_changes,
    'To align with a new versioning system, this version has been automaticaly created by copying the previous version PUBLISHED-1.0.1.',
    vocabulary_id,
    last_status_change_date,
    creation_date
from
    version
where
    id=573;

-- 2/7
-- retrieve the id of the en-TL-1.0.2-PUBLISHED version
set @new_id = LAST_INSERT_ID();

-- 3/7
-- set previous version of the en-TL-1.0.3-DRAFT to en-TL-1.0.2-PUBLISHED
update version set previous_version=@new_id where id=578;

-- 4/7
-- set initial_version
update version set initial_version=573 where id in (573,578);

-- 5/7
-- clone concepts from en-TL-1.0.1-PUBLISHED to en-TL-1.0.2-PUBLISHED
insert into concept (
    definition,
    notation,
    parent,
    position,
    previous_concept,
    sl_concept,
    title,
    uri,
    version_id,
    deprecated,
    introduced_in_version_id,
    replaced_by_id,
    valid_until_version_id
) select
    definition,
    notation,
    parent,
    position,
    id,
    sl_concept,
    title,
    'https://www.gesis.org/vocabulary/KategorienschemaWahlstudien_[CODE]/1.0.2/en',
    @new_id,
    deprecated,
    introduced_in_version_id,
    replaced_by_id,
    valid_until_version_id
from
    concept
where
    version_id=573;

-- 6/7
-- correct previous concept ids for the en-TL-1.0.3-DRAFT
update
    concept as dst
left join
    concept as src
on
    src.notation=dst.notation
set
    dst.previous_concept=src.id
where
    src.version_id=@new_id
    and
    dst.version_id=578;

-- 7/7
-- copy en-TL-1.0.3-DRAFT with concepts in order to have the most recent ids and remove its originals
insert into
    version(canonical_uri,citation,creator,ddi_usage,definition,discussion_notes,initial_version,item_type,language,last_modified,license,license_id,notation,notes,number,previous_version,publication_date,publisher,status, title,translate_agency,translate_agency_link,uri,uri_sl,version_changes,version_notes,vocabulary_id,last_status_change_date,creation_date)
select
    canonical_uri,citation,creator,ddi_usage,definition,discussion_notes,initial_version,item_type,language,last_modified,license,license_id,notation,notes,number,previous_version,publication_date,publisher,status, title,translate_agency,translate_agency_link,uri,uri_sl,version_changes,version_notes,vocabulary_id,last_status_change_date,creation_date
from version where id=578;
set @new_draft_id = LAST_INSERT_ID();
insert into
    concept(definition,notation,parent,position,previous_concept,sl_concept,title,uri,version_id,deprecated,introduced_in_version_id,replaced_by_id,valid_until_version_id)
select
    definition,notation,parent,position,previous_concept,sl_concept,title,uri,@new_draft_id,deprecated,introduced_in_version_id,replaced_by_id,valid_until_version_id
from concept where version_id=578;
delete from concept where version_id=578;
delete from version where id=578;

COMMIT;
UNLOCK TABLES;
MajaDolinar commented 10 months ago

Closing, resolved.

Stifo commented 2 months ago

Apparently, the fixing SQL script:

https://github.com/cessda/cessda.cvs.two/issues/654#issuecomment-1561192390

was not executed over the production database and the problem still persists there. In the dev, it is fixed.

@matthew-morris-cessda: could you please run it in the prod? please, make a backup first. thanks.

matthew-morris-cessda commented 2 months ago

I've run the statement against the production CVS instance

Stifo commented 2 months ago

thanks @matthew-morris-cessda !