geolexica / isotc211.geolexica.org

ISO/TC 211 online version of the Multi-Lingual Glossary of Terms
https://isotc211.geolexica.org

enhancement: background about abbreviated terms and symbols; and a possible "near-term" solution... #196

Open ReesePlews opened 1 year ago

ReesePlews commented 1 year ago

20230629: this is a discussion about how abbreviated terms and symbols are handled in the current TMG repository. it provides information on how we may want to handle them when we work with the terminology register; this is background for a future discussion/enhancement;

in the current geolexica (based on content from ISO/TC 211's multi-lingual glossary of terms, MLGT) there is no option to view abbreviated terms and symbols. in some cases these accompany the terminological entry, but in many others they are found separately, either at the end of clause 3 or, commonly, in clause 4. presently, abbreviated terms and symbols are not provided to the NB members for translation, which is the main reason they are not shown in the MLGT.

the TMG repository (the main terminology register of TC 211) collects this content but does not manage it as rigorously as the concepts and definitions. in the past, drafts of both abbreviated terms and symbols were placed in the excel spreadsheet. however, this has become too difficult because a) it is time consuming, and b) symbols in some recent standards are quite complex and cannot be shown in excel.

as we are planning to move towards a SMART Register for terminology, at some point a solution will be needed to deal with the abbreviated terms and symbols. these could be handled as a "simplified" terminological entry, or they could be de-coupled and handled separately (however, that may require an additional management method). a "simplified" type of entry may be the easiest solution.

with the forthcoming (July 2023) terminology spreadsheet release, i have updated the abbreviated terms in the TMG repository but the symbols cannot be updated due to complexity and lack of handling in excel.

20230629: i don't know if a "near term" solution is needed, or if we should spend time on that when a better solution could be developed;

as an immediate "near term" solution, is there a way to create a separate page on a geolexica site with two tables where the abbreviated terms and symbol content from the published documents could be displayed? if so, would that table be created statically or dynamically? and what type of input file structure would be needed? math notation like that used in geolexica is also needed for the symbols table. no search or filter functionality is needed; the user can search via a simple "browser find". however, the "style" of the tables should match the overall look of the "geolexica" site, and a link to the "Abbreviated terms and symbols" page needs to be added to the main page interface.
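to make the "static" option concrete, here is a minimal sketch of how a flat input file could be turned into an HTML table. everything in it is an assumption for discussion: the column names (`symbol`, `description`, `source`) and the function `render_table` are hypothetical and are not an existing geolexica format or API. the two sample rows are the "a" and "b" symbols from ISO 19111:2007 discussed later in this thread. site styling and math rendering are left out.

```python
import csv
import html
import io

# Hypothetical input file: one row per symbol. The column names are
# assumptions for illustration, not an agreed geolexica input format.
SAMPLE_CSV = """symbol,description,source
a,semi-major axis,ISO 19111:2007
b,semi-minor axis,ISO 19111:2007
"""


def render_table(csv_text, columns):
    """Return a static HTML <table> string with one row per CSV record."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Escape every value so characters such as "<" in a symbol stay literal.
    head = "".join(f"<th>{html.escape(col)}</th>" for col in columns)
    body_rows = []
    for row in rows:
        cells = "".join(f"<td>{html.escape(row[col])}</td>" for col in columns)
        body_rows.append(f"<tr>{cells}</tr>")
    body = "".join(body_rows)
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


if __name__ == "__main__":
    print(render_table(SAMPLE_CSV, ["symbol", "description", "source"]))
```

a file like this can be maintained in any spreadsheet tool and exported as CSV, which is the main reason for choosing a flat structure in the sketch.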

if you can tell me the way to create the tables or file format, i will do that. thank you.

ReesePlews commented 1 year ago

continuing with some background

the TMG repository has two spreadsheets, abbreviated terms and symbols

abbreviated terms (last updated 2023-06) tmg_repo_2023-06_abbreviated_terms

there is no real linking between the abbreviated terms worksheet and the other "worksheets" in the TMG repo. an abbreviated term may be present in a terminological entry, in which case it appears in the main terminology section (which has an abbreviation field), but the terms in this worksheet are typically from the "acronyms and abbreviated terms" clause in the document. as for linking, if the TMG repo were a "true" database it would have been better to use the "document" id (from the document register worksheet) instead of all the individual fields about the document, but this sheet was made for "human visualization". the red records indicate earlier draft documents. from 2023-06 i do not plan to process abbreviated terms until they are published. at this point, or during a conversion, i do not care what happens to the past records (red rows). going forward could be different.
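as a rough illustration of the "document id" idea, the sketch below stores only an id on each abbreviated term and looks up the document details in a separate register. all names (`Document`, `AbbreviatedTerm`, `published_terms`) and all values are placeholders invented for this example; they do not come from the TMG repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str      # hypothetical key from the document register worksheet
    reference: str   # placeholder reference string
    status: str      # "published" or "draft" (assumed status values)


@dataclass(frozen=True)
class AbbreviatedTerm:
    abbreviation: str
    expansion: str
    doc_id: str      # link to the register instead of repeated document fields


def published_terms(terms, register):
    """Join each term with its document and keep only published sources."""
    result = []
    for term in terms:
        doc = register[term.doc_id]
        if doc.status == "published":
            result.append((term.abbreviation, term.expansion, doc.reference))
    return result


# Placeholder data, not real TMG repo content.
REGISTER = {
    "d1": Document("d1", "ISO 00000:2020", "published"),
    "d2": Document("d2", "ISO/DIS 00000", "draft"),
}
TERMS = [
    AbbreviatedTerm("ABC", "example expansion", "d1"),
    AbbreviatedTerm("XYZ", "draft-only expansion", "d2"),
]

if __name__ == "__main__":
    print(published_terms(TERMS, REGISTER))
```

with this shape, the "red rows" from draft documents would not need special colouring; they would simply be filtered by the status of the linked document.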

symbols (last updated 2022-06) tmg_repo_2022_symbols

the symbols worksheet has a different structure than the abbreviated terms worksheet; i am not sure of the purpose of all attributes, but we can learn a bit more by looking at the page from the standard (below); looking at row 21 in the symbols image above, we see "j" (col C - symbol) and then "i, j, [k]" (col E - symbol group); then compare how this looks in the actual standard document shown below; "j" has been "pulled out" of the symbol shown in the standard and described (col D) separately in this spreadsheet; col E and col F are identical to the contents of the standard; i do not know if we want to "separate out" each symbol from an equation and describe it separately; this would be something we need to discuss;
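for the discussion, here is a small sketch of what "pulling out" individual symbols from a symbol group could look like for the simple comma-separated case above. the function name `split_symbol_group` is hypothetical, and the sketch only records whether a symbol was written in square brackets; it does not assume what the brackets mean in the standard. it would not work for complex equations.

```python
def split_symbol_group(group):
    """Split a group such as 'i, j, [k]' into (symbol, bracketed) pairs."""
    parts = []
    for raw in group.split(","):
        token = raw.strip()
        if not token:
            continue  # ignore empty pieces such as a trailing comma
        bracketed = token.startswith("[") and token.endswith("]")
        symbol = token.strip("[]").strip()
        parts.append((symbol, bracketed))
    return parts


if __name__ == "__main__":
    print(split_symbol_group("i, j, [k]"))
```

each resulting symbol could then carry its own description (as col D does now) while still pointing back to the original group (col E).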

col G appears to be from the title of the document to add subject context to each entry. it could be nice as a filter option; cols H, I and J are not linked to the "document register spreadsheet" (if the TMG repository was a real database, then they could be linked from there).

cols K and L are duplicates taken from the main terminology clause; examine rows 22 and 23 in the image; these correspond to "a" semi-major axis and "b" semi-minor axis, which are also listed as "abbreviations" in their individual terminological entries in clause 4 (this standard is old and the clause order is different). [note that the sorting order of the spreadsheet and the sorting order of the standard do not match; the sorting order of the spreadsheet is not fixed]; the content in cols K and L duplicates regular terminological entries; i don't know if we want to go in that direction, however they are listed again in the symbols clause; current ISO Directives may prohibit that duplication, i am not sure.

tmg_repo_2022_symbols_from_19111-2007

ReesePlews commented 1 year ago

i am not sure how we want to handle this content;

if we are thinking about a terminology register (like the TMG repo) then there should be a single-source management solution; if they are managed in paneron then these would be different from a regular terminology entry; perhaps paneron already supports this, i am not sure.

either way, supporting the viewing/discovery of these would require modifications to the geolexica framework (viewing, searching, etc)

happy to discuss more, when it comes time to make some implementation plans.

strogonoff commented 1 year ago

@ReesePlews off the top of my head, hopefully I’m not too off base in my understanding of this issue:

ReesePlews commented 1 year ago

@strogonoff thank you for the reply.

your understanding is correct. i saw the "designations" when you were showing something about paneron last week. as long as we can have "stand-alone" (meaning outside of a terminological entry) symbols and abbreviated terms, then i am sure we are good.

if this support is already implemented in paneron, that is good. the render in geolex will require some reworking/enhancements; i think people will still want "symbols and abbreviated terms" to be rendered separately from any "regular" terminological entry. how that interface operates, etc., is a subject for a later discussion.

i have not said much about the connection of geolexica and metanorma. you may know about ISO 19173 SMART Terminology for geographic information. in the distant future, when that project has finished, we will have a real-time dynamic SMART terminology register (future versions of geolexica and paneron) that is officially recognized by ISO, where all of TC 211's terminology (draft and published) is held. we want metanorma to be able to render the published terms from that register as an "offline render": either a complete vocabulary standard with all TC 211 terms, symbols and abbreviated terms, or only those from a specific standard selected by the user.

one other critical item, in my opinion, is the ability to bulk load entries into paneron and/or geolex without requesting assistance from a developer... meaning that the format(s) should be simple enough for a non-developer to create. there could be a simple format with just flat entries and then a more complex format supporting cross-references between common/used terms; such a format should also work for symbols / abbreviated terms... so i guess basically any "designation" that is defined should be supported. this need is not only for terminology but also for any register.
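to illustrate what a "simple format with just flat entries" might mean in practice, here is a hedged sketch of a loader that reads a flat CSV and reports problems in plain language, so that a non-developer can fix the file. the field names (`kind`, `designation`, `description`, `source`), the allowed kinds and the function `load_flat_entries` are all assumptions for this example; they are not an existing paneron or geolexica import format.

```python
import csv
import io

# Assumed field names and kinds; a real register would define its own.
REQUIRED_FIELDS = ("kind", "designation", "description", "source")
ALLOWED_KINDS = {"abbreviation", "symbol"}


def load_flat_entries(csv_text):
    """Parse flat CSV text and return (entries, errors)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    entries, errors = [], []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if missing:
            errors.append(f"line {line_no}: missing {', '.join(missing)}")
            continue
        kind = row["kind"].strip()
        if kind not in ALLOWED_KINDS:
            errors.append(f"line {line_no}: unknown kind '{kind}'")
            continue
        entries.append({f: row[f].strip() for f in REQUIRED_FIELDS})
    return entries, errors


SAMPLE = """kind,designation,description,source
symbol,a,semi-major axis,ISO 19111:2007
symbol,b,semi-minor axis,ISO 19111:2007
glyph,x,placeholder row with an unsupported kind,ISO 19111:2007
symbol,,placeholder row with no designation,ISO 19111:2007
"""

if __name__ == "__main__":
    loaded, problems = load_flat_entries(SAMPLE)
    print(len(loaded), "entries loaded")
    for problem in problems:
        print(problem)
```

a more complex format with cross-references could be layered on later, for example by adding an optional column that names related designations, without changing how the flat rows are read.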

anyway just some items to consider as things move forward. thank you for your support.

ReesePlews commented 1 year ago

see also #87 opened earlier