Closed HassanAkbar closed 7 months ago
It is mentioned here https://github.com/geolexica/jekyll-geolexica/issues/12#issuecomment-1662015937
I think this makes sense:
- start supporting authoritativeSource,
- support current Glossarist 2 model in glossarist-ruby,
- switch jekyll-geolexica to use glossarist-ruby with thorough testing on existing website repositories to ensure no undesired changes happen when they are later deployed.
In that order…
@opoudjis @ronaldtse Do we want to use Glossarist model v2 in glossarist-ruby for this, or can we start working on this with v1?
@HassanAkbar we want to migrate all files to Glossarist model v2. Can we do it for all our existing repositories?
@ronaldtse Currently glossarist-ruby does not support Glossarist model v2. If we are migrating every repo to v2, then we need to update glossarist-ruby as well, and currently I am not sure what V2 is.
@strogonoff I see that you have worked on updating the isotc211-glossary to V2. Can you help me understand the structure of the V2 glossary?
In this case there is no v2 and we should get this working with v1!
@ronaldtse @strogonoff Currently the glossary for isotc211-glossary is in V2 and the glossary for osgeo-glossary is in V1, so if we go with V1 then isotc211 will break.
We can update the Jekyll-geolexica version and use the Glossarist model V2 in the updated version, and for the sites that are using Glossarist model V1 we can keep using the old version of Jekyll-geolexica.
@ronaldtse The important thing is that we will drop the support of Glossarist model V1 in Jekyll-geolexica. We won’t be able to make any changes in the older version of Jekyll-geolexica, so we should prioritize updating the sites to use V2 in order to get any modifications, features, or bug fixes done.
The important thing is that we will drop the support of Glossarist model V1 in Jekyll-geolexica
That's fine to me because we control all those repositories right now. We should bring all of those repositories up to date as soon as possible.
@ronaldtse @strogonoff I have a few questions related to Glossarist model V2:
- In isotc211-glossary the localized-concept 0033e1ef-60b2-558f-9d0f-b882b4f7da75 has a date of type accepted in the data and a dateAccepted outside the data at the end here. These two have different dates. What is the difference between them, and do we need to store them separately inside glossarist, i.e. concept.data.dates and concept.dates?
- Do review_decision_date and review_date go inside the dates with types review_decision and reviewed respectively, or are these different?
- Does review_decision_notes go in notes with type review_decision?
data is supposed to conform to Glossarist model. dateAccepted, id, etc. are not part of data.
data contains extra fields that should not be there (like review* fields).
In more detail:
data contains many extra fields that are not in Glossarist model. That’s likely a mistake. I think the YAML you are seeing is probably output by some data conversion script that doesn’t follow the models as intended.
- review* fields are not supposed to be there
- dates list is not supposed to be there
- It probably also outputs wrong dateAccepted.
In case of that YAML, data contains many extra fields that are not in Glossarist model. That’s likely a mistake. I think the YAML you are seeing is probably output by some data conversion script that doesn’t follow the models as intended.
- review* fields are not supposed to be there
- dates list is not supposed to be there
I can update the script to delete excess data, as for above.
- It probably also outputs wrong dateAccepted.
@strogonoff are we talking about the output of a concept or of a localized-concept? In the case of the first, we are setting its value to a dummy date that we retrieve from config.py. In the case of the latter, we retrieve its value from the data itself, if present, or use the same default value as for the first, if not.
How could this be improved?
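For readers following along, the fallback described above can be sketched roughly like this. It is a minimal Ruby illustration, not the actual conversion script (which takes its default from config.py). `DEFAULT_DATE_ACCEPTED`, both method names, and the assumed shape of the `dates` entries (`type` and `date` keys) are hypothetical.

```ruby
# Placeholder default; the real script takes its dummy date from config.py.
DEFAULT_DATE_ACCEPTED = "1970-01-01"

# A concept always gets the default date.
def concept_date_accepted
  DEFAULT_DATE_ACCEPTED
end

# A localized concept uses the "accepted" entry from its data when present,
# otherwise it falls back to the same default as the concept.
def localized_concept_date_accepted(data)
  accepted = Array(data["dates"]).find { |entry| entry["type"] == "accepted" }
  accepted ? accepted["date"] : DEFAULT_DATE_ACCEPTED
end
```

Written this way, it is easy to see that the top-level dateAccepted of a localized concept is derived from the dates list inside data, so removing that list would also change where the value has to come from.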
Just for the record, we have 2 families of models:
In this case, it happens that the Glossarist dataset is managed by a Register. This means that every Glossarist Concept is also a Register Concept (in the new ISO 19135 under development, but in the old version currently it is a Register Item).
It happens that in ISO/TC 211, they use the old Register model which means that every Concept is a Register Item, and that each concept in the MLGT (the content on isotc211.geolexica.org) is accompanied by some status dates such as "approval date" (and this content is at both the general concept level and the localized concept level).
In an ideal world, the data for the Glossarist models (data content) is separate from the Register models (administrative content). The Register models can refer to the Glossarist models, of course and vice versa. This way we could use different parsers/models accessors to work with the data:
@ronaldtse So, for Glossarist we should only read the data inside the data key and discard the other keys in the yaml file.
Also, I think we should discard the Register data in Glossarist.
Should we create a separate gem for that, or is there an existing gem that we can use?
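A rough sketch of that separation, assuming a YAML document where everything under the data key is Glossarist content and every other top-level key is Register content. The sample document and the `split_concept` helper are hypothetical and are not part of the glossarist gem.

```ruby
require "yaml"

# Hypothetical document with placeholder values, for illustration only.
SAMPLE_DOC = <<~DOC
  id: example-id
  dateAccepted: "2000-01-01"
  data:
    terms:
      - designation: example term
DOC

# Returns [glossarist_data, register_data] for one parsed YAML document.
def split_concept(document)
  glossarist_data = document.fetch("data", {})
  register_data = document.reject { |key, _value| key == "data" }
  [glossarist_data, register_data]
end

data, register = split_concept(YAML.safe_load(SAMPLE_DOC))
puts data.keys.inspect
puts register.keys.inspect
```

Keeping the Register part as a second return value, rather than dropping it, would leave the door open for a separate Register parser later.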
As mentioned by @strogonoff:
In case of that YAML, data contains many extra fields that are not in Glossarist model. That’s likely a mistake. I think the YAML you are seeing is probably output by some data conversion script that doesn’t follow the models as intended.
- review* fields are not supposed to be there
- dates list is not supposed to be there
@ronaldtse One more question related to this: should I assume that these will be fixed in isotc211-glossary, or should I add these fields temporarily in Glossarist?
As mentioned by @strogonoff:
In case of that YAML, data contains many extra fields that are not in Glossarist model. That’s likely a mistake. I think the YAML you are seeing is probably output by some data conversion script that doesn’t follow the models as intended.
- review* fields are not supposed to be there
- dates list is not supposed to be there
If I get green light on this one, I can update the script to fix the data structure, removing excess data fields. It's straightforward, won't take long.
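To make the proposed cleanup concrete, here is a minimal Ruby sketch of removing the excess fields from a data hash. It is not the actual generation script. The `strip_excess_fields` helper is hypothetical, and it assumes the excess fields are exactly the keys starting with `review` plus the `dates` list, as listed above.

```ruby
# Drops the fields that are not part of the Glossarist model:
# anything named review* and the dates list.
def strip_excess_fields(data)
  data.reject do |key, _value|
    name = key.to_s
    name.start_with?("review") || name == "dates"
  end
end

cleaned = strip_excess_fields(
  "definition" => "example definition",
  "review_date" => "placeholder",
  "review_decision_notes" => "placeholder",
  "dates" => []
)
puts cleaned.keys.inspect
```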
I am lost in this thread. What is still pending?
The goal here is to synchronize the YAML structures for the Glossarist Ruby gem (used by jekyll-geolexica) and the Glossarist plugin.
This means we need to update all the data sets to the latest structure. That's it.
In case of that YAML, data contains many extra fields that are not in Glossarist model. That’s likely a mistake. I think the YAML you are seeing is probably output by some data conversion script that doesn’t follow the models as intended.
- review* fields are not supposed to be there
- dates list is not supposed to be there
If I get green light on this one, I can update the script to fix the data structure, removing excess data fields. It's straightforward, won't take long.
@ronaldtse I just want to confirm: do we need to update this in glossarist, or fix the data structure?
@HassanAkbar so the tricky thing here is about the latest MLGT data which is done using this gem: https://github.com/geolexica/tc211-termbase .
The point is actually to upgrade the tc211-termbase gem to use the glossarist gem.
The input data for the gem is the XLSX file, and the output is a Glossarist YAML ConceptCollection.
@ronaldtse Let me summarize what’s going on here.
As I had no idea of Glossarist model V2, @strogonoff suggested taking a look at paneron-extension-glossarist/models/concepts.ts to understand it.
While discussing the format, @strogonoff explained that the review* and dates fields do not belong in the model here: https://github.com/geolexica/jekyll-geolexica/issues/14#issuecomment-1784539896.
As these fields don't belong to Glossarist model, I believe we should let @stefanomunarini update the generation script so that the data in isotc211-glossary can be corrected. @stefanomunarini Can you help with that?
As these fields don't belong to Glossarist model, I believe we should let @stefanomunarini update the generation script so that the data in isotc211-glossary can be corrected. @stefanomunarini Can you help with that?
Sure, I've pushed a commit. You can now re-run the script to update the data, @HassanAkbar.
@stefanomunarini I was looking at the isotc211-glossary and it seems like you updated the concepts last time.
Can you let me know the steps needed to generate the isotc211-glossary?
Hi @HassanAkbar please review and merge this PR https://github.com/geolexica/isotc211-glossary/pull/44
The content of isotc211-glossary is created by the tc211-termbase gem, which took the XLSX file and processed it into the old Glossarist YAML. Once I get back to the computer I’ll provide you with documentation.
@HassanAkbar the tc211-termbase gem is updated at https://github.com/geolexica/tc211-termbase/pull/31 , can you now:
Thanks.
@ronaldtse can you let me know where I can get the xlsx file for generating the concepts?
@ronaldtse I think I don't have access to https://github.com/ISO-TC211/mlgt-data repo, can you help with that?
@ronaldtse just saw this issue -> authoritativeSources in localizedConcepts YAML are empty objects.
Currently there is no support for authoritativeSources in glossarist, and as we are using it to generate concepts in tc211-termbase, the output files do not have an authoritativeSources key in localized-concepts.
Should we add this in tc211-termbase, or should I run a separate script after concepts generation is completed?
@HassanAkbar yes we should add them in tc211-termbase. Previously there were sources in the generated output, I don't know where they have gone.
@HassanAkbar here's the file:
@ronaldtse I've updated the glossary using the above file in this PR -> https://github.com/geolexica/isotc211-glossary/pull/47
I have a couple of questions related to the generated concept files
- Currently in the concept files the key for localized concepts is localized_concepts because it is generated using the glossarist and we use snake case convention in glossarist, while in the previous version the key was in camel casing, i.e. localizedConcepts. So what should I use now?
- Also the register information (info outside the data key) is not being added because it is not handled in glossarist, so should I add that using a script or should I add this functionality in isotc211-termbase repo?
@ronaldtse I've updated the glossary using the above file in this PR -> geolexica/isotc211-glossary#47
I have a couple of questions related to the generated concept files
- Currently in the concept files the key for localized concepts is localized_concepts because it is generated using the glossarist and we use snake case convention in glossarist, while in the previous version the key was in camel casing, i.e. localizedConcepts. So what should I use now?
Use the Glossarist gem convention because jekyll-geolexica also uses the Glossarist gem. Correct?
- Also the register information (info outside the data key) is not being added because it is not handled in glossarist, so should I add that using a script or should I add this functionality in isotc211-termbase repo?
This should be added in the tc211-termbase gem so we can display them in isotc211.geolexica.org.
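While repositories are being migrated, a reader could tolerate both spellings of the key. The following is only a sketch of that idea in Ruby. The `localized_concepts_of` helper is hypothetical, is not part of the glossarist gem, and assumes the value under either key is a hash.

```ruby
# Prefers the snake case key written by the glossarist gem and falls back
# to the camel case key used in the previous version of the data.
def localized_concepts_of(concept)
  concept["localized_concepts"] || concept["localizedConcepts"] || {}
end

puts localized_concepts_of("localized_concepts" => { "eng" => "example-id" }).inspect
puts localized_concepts_of("localizedConcepts" => { "eng" => "example-id" }).inspect
```

Once all datasets use the gem convention, the fallback can be dropped.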
Use the Glossarist gem convention because jekyll-geolexica also uses the Glossarist gem. Correct?
@ronaldtse Currently it is not using glossarist gem. I will update jekyll-geolexica next to read the concepts using glossarist.
we should use glossarist for reading concept yaml files as mentioned here -> https://github.com/geolexica/jekyll-geolexica/issues/12#issuecomment-1662015937