glossarist / iev-data


IEV data treatment: 151-15-01 #99

Open · ronaldtse opened this issue 3 years ago

ronaldtse commented 3 years ago

@ronaldtse This is really inconsistent… https://gist.github.com/skalee/e281031db37a1a08941f04ec1a7721af And I'm pretty sure I'll find even more when I improve the SQL query.

The "AC" term is quite interesting: it is marked as a qualifier in English, Polish or Spanish ("calificativo"), whereas it is an abbreviation in Portuguese:

151-15-01        ar       تيار متردد (متناوب)
151-15-01        de       Wechselstrom…                                                 in Zusammensetzungen
151-15-01        en       AC                                                            qualifier
151-15-01        es       AC, calificativo
151-15-01        fi       AC
151-15-01        fr       AC                                                            qualificatif
151-15-01        it       AC; c.a.; corrente alternata
151-15-01        ja       交流
151-15-01        ko       교류                                                            수식어
151-15-01        pl       AC (kwalifikator)
151-15-01        pt       CA (abreviatura)
151-15-01        ru       АС, обозначение
151-15-01        sr       AC                                                            квалификатор
151-15-01        sv       vs-
151-15-01        zh       交流

Originally posted by @skalee in https://github.com/glossarist/iev-data/issues/94#issuecomment-765680069
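To make the misalignment easier to see, the trailing annotations in the table can be mapped onto a small set of categories and the languages grouped by category. This is a minimal sketch, assuming a hand-written keyword table (`ANNOTATION_CATEGORIES`, hypothetical, covering only the words that appear in this one entry); filing the Russian "обозначение" and the German "in Zusammensetzungen" under usage information is itself an assumption, not an agreed rule.

```python
from collections import defaultdict

# Hypothetical mapping from the annotation words seen in 151-15-01
# to a normalized category. Not an official IEC vocabulary.
ANNOTATION_CATEGORIES = {
    "qualifier": "qualifier",              # en
    "qualificatif": "qualifier",           # fr
    "calificativo": "qualifier",           # es
    "kwalifikator": "qualifier",           # pl
    "квалификатор": "qualifier",           # sr
    "수식어": "qualifier",                  # ko
    "abreviatura": "abbreviation",         # pt
    "обозначение": "usage_info",           # ru (assumed)
    "in Zusammensetzungen": "usage_info",  # de (assumed)
}


def group_by_category(annotations: dict[str, str]) -> dict[str, list[str]]:
    """Group language codes by the normalized category of their annotation."""
    groups: dict[str, list[str]] = defaultdict(list)
    for lang, note in sorted(annotations.items()):
        groups[ANNOTATION_CATEGORIES.get(note, "unknown")].append(lang)
    return dict(groups)


# Annotations copied from the 151-15-01 table (languages without one omitted).
annotations_151_15_01 = {
    "de": "in Zusammensetzungen",
    "en": "qualifier",
    "es": "calificativo",
    "fr": "qualificatif",
    "ko": "수식어",
    "pl": "kwalifikator",
    "pt": "abreviatura",
    "ru": "обозначение",
    "sr": "квалификатор",
}

print(group_by_category(annotations_151_15_01))
```

Any entry whose languages land in more than one category is a candidate for the list of inconsistent terms.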

ronaldtse commented 3 years ago

Sent to IEC.


We just came across 151-15-01, “AC”, which does not seem well aligned across languages.

In English/French/Polish/Korean/Serbian/Spanish it is a “qualifier”. In Portuguese it is an “abbreviation”.

This seems to be a more general issue than just assigning the correct attributes to the right places, because there are so many languages and alternative treatments in play.

I wonder if it is possible to arrive at a harmonized semantic representation:

The good news is there aren’t too many of these instances. The only way is to go through them one by one…

ronaldtse commented 3 years ago

From IEC:

Another legacy issue [...] a need for TC 1 to put its foot down.

According to the IEC Supplement, “AC” is an adjective (this replaces “qualifier”) and an abbreviated form, but “(abbreviation)” is no longer indicated in the Electropedia [image003], so “abreviatura” is obsolete.

But some NCs (National Committees) want to add what they call “usage information”, e.g. “in Zusammensetzungen” (“in compounds”) [image005], and I guess that “обозначение” (“designation”) would also fall into this category.

Again, to resolve the problem we need to get TC 1 to do a CR just on this topic but [no] time at present.

So this is to be dealt with in IEC/TC 1 some time in the future.

This also means we need to:

FYI, on a personal level, I would agree with “- some attributes are applied to the concept (ones that apply across all languages)”, but I would not agree with “- some attributes are applied to the per language “designation””, since in my experience an ideal “international designation” shall not be language-specific, in the same way that a mathematical abbreviated term should not be:

[image009: ISO/IEC Directives, Part 2, Annex B]

The ISO/IEC Directives, Part 2, Annex C (normative) Designation of internationally standardized items, does not mention this, and at present I cannot remember if it ever did. I seem to remember that the optional “Description block” is quite frequently language-dependent but that the “Identity block” is not: [image010]

We will have to clarify what this does.

If you have a list of the instances [...] would appreciate receiving a copy [...].

@skalee could you help generate a list of these inconsistent terms so that IEC/TC 1 can deal with them? Thanks!

ronaldtse commented 3 years ago

My response to IEC:


but I would not agree with “- some attributes are applied to the per language “designation”” since in my experience, an ideal “international designation” shall not be language specific, in the same way that a mathematical abbreviated term should not be:

In the case of 151-15-01, the Portuguese abbreviation is “CA”, and Italian also contains “c.a.” (in addition to AC).

In my view, it would be overcorrecting to require all languages to use the same abbreviated terms. The abbreviation IEC itself demonstrates this, since CEI is also used.

The ISO/IEC Directives, Part 2, Annex C (normative) Designation of internationally standardized items, does not mention this and at present I cannot remember if it ever did. I seem to remember that the optional “Description block” is quite frequently language dependent but that the “Identity block” is not:

I believe this deals with the “identification identifier” rather than the actual “designation” (as defined in ISO 10241-1, i.e. a “term”).

For example, an IEV entry can be identified as:

description block: IEC
identity block (standard number): 60050
identity block (item): 151-15-01

In the current IEV, each entry is considered a “concept” — which is language independent.

If we wanted to add the revision identifier, that could also be done, since the “item block” can be defined per standard (and there is no standard for identity here):

description block: IEC
identity block (standard number): 60050
identity block (item): 151-15-01:2017-10

It is possible to further identify the per-language term within an IEV entry, such as:

description block: IEC
identity block (standard number): 60050
identity block (item): 151-15-01:eng:2017-10

There does not seem to be any conflict here, as long as some specification defines this scheme.
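If such a scheme were specified, the item block could be parsed mechanically. This is a minimal sketch, assuming a hypothetical grammar `<IEV ref>[:<three-letter language code>][:<YYYY-MM revision>]` taken only from the three examples above; allowing two or three digits in the last group of the reference is also an assumption. No such grammar exists in the ISO/IEC Directives.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar for the item block: NNN-NN-NN[:lang][:YYYY-MM]
ITEM_RE = re.compile(
    r"^(?P<ref>\d{3}-\d{2}-\d{2,3})"  # language-independent concept
    r"(?::(?P<lang>[a-z]{3}))?"       # optional per-language term, e.g. eng
    r"(?::(?P<rev>\d{4}-\d{2}))?$"    # optional revision, e.g. 2017-10
)


@dataclass(frozen=True)
class ItemId:
    ref: str
    lang: Optional[str]
    revision: Optional[str]


def parse_item(item: str) -> ItemId:
    """Split an item identifier into concept reference, language and revision."""
    m = ITEM_RE.match(item)
    if m is None:
        raise ValueError(f"not a valid item identifier: {item!r}")
    return ItemId(m["ref"], m["lang"], m["rev"])


print(parse_item("151-15-01:eng:2017-10"))
```

Because the language part is letters and the revision part is digits, the two optional parts cannot be confused, so `151-15-01:2017-10` parses with `lang` left as `None`.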

This brings back a discussion we had two years ago about how to cite an IEV term; back then, “IEC 60050-PPP:yyyy, XXX-XX-XXX” was the correct pattern.

Maybe this is a question to be raised again.
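The IEV is published as the IEC 60050 series, and in that citation pattern PPP is the part number, which matches the first three digits of the IEV reference, while yyyy is the edition year of that part. A minimal sketch of assembling such a citation (the function name `iev_citation` is hypothetical); the edition year cannot be derived from the reference, so the caller must supply it, and the year in the example is only a placeholder.

```python
def iev_citation(ref: str, year: int) -> str:
    """Format an IEV citation as 'IEC 60050-PPP:yyyy, XXX-XX-XXX'.

    PPP is taken from the first three digits of the IEV reference;
    the edition year of the part must be supplied by the caller.
    """
    part = ref.split("-", 1)[0]
    if len(part) != 3 or not part.isdigit():
        raise ValueError(f"unexpected IEV reference: {ref!r}")
    return f"IEC 60050-{part}:{year}, {ref}"


# The year here is a placeholder, not a verified edition year.
print(iev_citation("151-15-01", 2001))
```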

skalee commented 3 years ago

This also means we need to:

  • separate "usage information" from the designation

Do I understand correctly that this "usage information" describes the designation, not the localized concept?

Anyway, usage info is already present in YAMLs, for example:

fra:
  id: 102-01-04
  terms:
  - type: expression
    normative_status: preferred
    designation: sous-ensemble
    gender: m
    plurality: singular
  - type: expression
    usage_info: d'un ensemble
    designation: partie
    gender: f
    plurality: singular
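Separating such annotations from the raw designation strings in the 151-15-01 table could start with a few patterns. This is a minimal sketch, assuming only the shapes seen in that table: `;` between several designations (`AC; c.a.; corrente alternata`), a parenthesized suffix (`AC (kwalifikator)`), a comma-separated suffix (`AC, calificativo`), and no annotation at all. The function name and rules are hypothetical, and the annotation is returned raw, still to be classified as usage information, qualifier or abbreviation.

```python
import re

# Hypothetical rules for the annotation shapes seen in the 151-15-01 table.
PAREN_RE = re.compile(r"^(?P<term>.+?)\s*\((?P<note>[^()]+)\)$")
COMMA_RE = re.compile(r"^(?P<term>[^,]+),\s*(?P<note>.+)$")


def split_designations(raw: str) -> list[dict[str, str]]:
    """Split a raw cell into designations, each with an optional annotation."""
    result = []
    for part in raw.split(";"):  # several designations in one cell
        part = part.strip()
        if not part:
            continue
        m = PAREN_RE.match(part) or COMMA_RE.match(part)
        if m:
            result.append({"designation": m["term"], "annotation": m["note"]})
        else:
            result.append({"designation": part})
    return result


print(split_designations("AC (kwalifikator)"))
print(split_designations("AC; c.a.; corrente alternata"))
```

These rules are deliberately naive: the Arabic cell would be split as well, although its parenthesized word looks like an alternative wording rather than an annotation, and any genuine term containing a comma would be cut in two.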

  • the "abbreviation" indication needs to move into a designation attribute.

Okay. We already have it in our concept model; we just need to produce the correct data. A feature request already exists, though the information there may be a bit outdated: #8.

Also, do we support the distinction between abbreviation, acronym and initialism? We have something like that in our concept model diagrams:

[Screenshot 2021-01-30 at 03 26 11: concept model diagram]
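One way to record that distinction is an optional attribute on each designation. This is a minimal sketch; the `AbbreviationType` enum and `Term` class are hypothetical and not the glossarist concept model's actual API, and the classification of the example terms is illustrative only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AbbreviationType(Enum):
    # Hypothetical values; the concept model may name these differently.
    ABBREVIATION = "abbreviation"  # generic shortened form
    ACRONYM = "acronym"            # pronounced as a word
    INITIALISM = "initialism"      # pronounced letter by letter


@dataclass
class Term:
    designation: str
    language: str
    abbreviation_type: Optional[AbbreviationType] = None  # None for full terms


terms = [
    Term("AC", "eng", AbbreviationType.INITIALISM),
    Term("c.a.", "ita", AbbreviationType.ABBREVIATION),
    Term("corrente alternata", "ita"),
]

# Abbreviated forms are simply the terms that carry a type.
abbreviated = [t.designation for t in terms if t.abbreviation_type is not None]
print(abbreviated)  # ['AC', 'c.a.']
```

Keeping the attribute on the designation rather than on the concept allows each language to have its own abbreviated form, as with “CA” in Portuguese.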

@skalee could you help generate a list of these inconsistent terms so that IEC/TC 1 can deal with them? Thanks!

I guess so. If you need something quick, here is the result set I produced a few days ago: https://gist.github.com/skalee/e281031db37a1a08941f04ec1a7721af, but this list is probably not complete yet. Extracted to #107.

skalee commented 3 years ago

This also means we need to:

  • separate "usage information" from the designation

Do I understand correctly that this "usage information" describes the designation, not the localized concept?

Anyway, usage info is already present in YAMLs

With one reservation, though: only the < and > characters are recognized, whereas there are other possibilities too (e.g. U+3008 LEFT ANGLE BRACKET, https://util.unicode.org/UnicodeJsps/character.jsp?a=3008, used in CJK). We need to support or normalize that.
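Normalizing the alternative brackets first would let the existing `<`/`>` handling cover them too. This is a minimal sketch, assuming usage information is the text enclosed in angle brackets within a raw designation (as in a hypothetical source string `partie <d'un ensemble>`); `BRACKET_MAP` covers only U+3008/U+3009 and the fullwidth U+FF1C/U+FF1E forms, and other variants would have to be added.

```python
import re

# Hypothetical mapping of alternative angle brackets to ASCII < and >.
BRACKET_MAP = str.maketrans({
    "\u3008": "<",  # LEFT ANGLE BRACKET (CJK)
    "\u3009": ">",  # RIGHT ANGLE BRACKET (CJK)
    "\uff1c": "<",  # FULLWIDTH LESS-THAN SIGN
    "\uff1e": ">",  # FULLWIDTH GREATER-THAN SIGN
})

USAGE_RE = re.compile(r"<([^<>]*)>")


def extract_usage_info(raw: str) -> tuple[str, list[str]]:
    """Return the designation with bracketed usage info removed, plus that info."""
    normalized = raw.translate(BRACKET_MAP)
    usage = [u.strip() for u in USAGE_RE.findall(normalized)]
    # Drop the bracketed parts and collapse the leftover whitespace.
    designation = " ".join(USAGE_RE.sub(" ", normalized).split())
    return designation, usage


print(extract_usage_info("partie <d'un ensemble>"))
print(extract_usage_info("AC\u3008qualifier\u3009"))  # made-up CJK-bracket input
```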

ronaldtse commented 3 years ago

Do I understand correctly that this "usage information" describes the designation, not the localized concept?

It took me a while to understand what you said, but I think you are correct. "usage information" seems to apply to the designation (the "term"), and "domain" applies to the localized concept (the "definition").

This is a great question, and I have asked IEC for clarification.
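Under that reading, attributes would sit at three levels: the language-independent concept, the concept localized in one language, and the individual designations. This is a minimal sketch of such a model; the class and field names (`Concept`, `LocalizedConcept`, `Designation`, `domain`, `usage_info`) are hypothetical and not necessarily those of the glossarist concept model, and what "domain" should contain is still an open question in this thread.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Designation:
    # Attributes of the term itself, e.g. usage information.
    designation: str
    usage_info: Optional[str] = None


@dataclass
class LocalizedConcept:
    # Attributes of the concept in one language, e.g. definition and domain.
    language: str
    definition: Optional[str] = None
    domain: Optional[str] = None
    terms: list[Designation] = field(default_factory=list)


@dataclass
class Concept:
    # Language-independent entry, identified by its IEV reference.
    id: str
    localizations: dict[str, LocalizedConcept] = field(default_factory=dict)


# The French terms of 102-01-04 from the YAML example earlier in the thread.
concept = Concept("102-01-04")
concept.localizations["fra"] = LocalizedConcept(
    language="fra",
    terms=[
        Designation("sous-ensemble"),
        Designation("partie", usage_info="d'un ensemble"),
    ],
)
print([t.usage_info for t in concept.localizations["fra"].terms])
# [None, "d'un ensemble"]
```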

Also, do we support the distinction between abbreviation, acronym and initialism?

Yes, we should.

With one reservation, though: only the < and > characters are recognized, whereas there are other possibilities too

Good catch, thanks!!

skalee commented 3 years ago

@ronaldtse what is "domain"? I am not aware of it, unless you mean the first three digits of the IEV reference.

skalee commented 3 years ago

@ronaldtse ping