basnum / Basnage


Why use "fre" instead of the standard "fr" or "fra" in the xml:lang attributes? #4

Open pjox opened 5 years ago

pjox commented 5 years ago

Hello,

Looking at the files, I've seen you use xml:lang="fre" instead of xml:lang="fr" or xml:lang="fra". For example in file LettreR_workingFile.xml:

<entry xml:lang="fre" xml:id="rolle">
    <form><orth rendition="#uc">rolle</orth> <gramGrp><pos expand="Substantif">ſubſt.</pos> <gen expand="Masculin">maſc.</gen></gramGrp></form> <note>L'Academie écrit Rôle; et <lb/>c'eſt aini qu'on doit écrire, pour marquer que la <lb/> premiere syllabe eſt longue; ce que l'on marquoit <lb/>autrefois en écrivant <hi rendition="#i">Roolle</hi>.</note> <def>Etat, ou liſte de noms de <lb/> pluſieurs perſonnes qui ſont de même condition, ou <lb/>dans le même engagement.</def>
    <cit type="exemple"><quote>Dès que le nom d'un ſol-<lb/>dat eſt écrit ſur le <hi rendition="#i">rôle</hi>, c'eſt pour lui un crime capital <lb/>de deſerter.</quote></cit>
    <cit type="exemple"><quote>Le Comiſſaire à faire les montres tient <lb/>les <hi rendition="#i">rôles</hi>, arrête les <hi rendition="#i">rôles</hi>.</quote></cit>
    <cit type="exemple"><quote>On appelle les Ouvriers <lb/>dans les ateliers trois fois le jour sur le <hi rendition="#i">rôle</hi>; on les paye ſui-<lb/>vant qu'ils ſont marquez ſur le <hi rendition="#i">rôle</hi>.</quote></cit>
    <etym>Ce mot vient de <foreign xml:lang="lat" rendition="#i">rutulus</foreign> ou <foreign xml:lang="lat" rendition="#i">rotulus</foreign>, qui ſignifie un <hi rendition="#i">rouleau</hi>, <lb/>parce qu'autrefois on rouloit ces <hi rendition="#i">rôles</hi>, &amp; toutes les ex-</etym>
</entry>

I looked it up (https://iso639-3.sil.org/code/fre), and this code is the ISO 639-2/B way of tagging the French language. The problem is that the same entry also contains <foreign xml:lang="lat" rendition="#i">rutulus</foreign>, where the code lat is used, which is the ISO 639-2 and ISO 639-3 (https://iso639-3.sil.org/code/lat) way of tagging Latin.
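To check which codes a file actually mixes, a quick audit along these lines might help. This is only a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `count_lang_codes` and the shortened `sample` entry are made up for illustration and are not part of the repository.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def count_lang_codes(xml_text: str) -> Counter:
    """Count each xml:lang value that occurs in the document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return Counter(
        el.get(XML_LANG) for el in root.iter() if el.get(XML_LANG) is not None
    )


# Hypothetical, shortened version of the entry quoted above.
sample = (
    '<entry xml:lang="fre" xml:id="rolle">'
    '<etym>Ce mot vient de <foreign xml:lang="lat">rutulus</foreign> ou '
    '<foreign xml:lang="lat">rotulus</foreign></etym>'
    "</entry>"
)

print(count_lang_codes(sample))
```

Running this over each working file would list every code in use, which makes it easy to spot values such as fre that belong to a different part of the standard than the rest.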

This is just a detail and can be corrected without much effort. However, it would be nice to use the same standard for all languages, preferably ISO 639-3 (https://iso639-3.sil.org/), which uses codes like:

  1. fro: Old French (842-ca. 1400),
  2. frm: Middle French (ca. 1400-1600),
  3. fra: French,
  4. ang: Old English (ca. 450-1100),
  5. enm: Middle English (1100-1500),
  6. eng: English,
  7. grc: Ancient Greek (to 1453),
  8. ell: Modern Greek (1453-),
  9. lat: Latin,
  10. ron: Moldavian, Moldovan, Romanian.

The complete list is here: https://iso639-3.sil.org/code_tables/639/data.

Making this change standardizes the language codes and makes it easier for me to automate this later.
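For illustration, the change itself could be scripted roughly as follows. This is a sketch under assumptions, not a tested migration: `REPLACEMENTS` and `normalize_lang_attributes` are hypothetical names, the mapping covers only the two B codes mentioned in this thread, and ElementTree drops comments when parsing by default, so a tool that preserves the file layout may be preferable for the real files.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative mapping only: ISO 639-2/B code -> ISO 639-3 code.
REPLACEMENTS = {"fre": "fra", "gre": "ell"}


def normalize_lang_attributes(root: ET.Element) -> int:
    """Rewrite xml:lang values in place and return how many were changed."""
    changed = 0
    for el in root.iter():
        code = el.get(XML_LANG)
        if code in REPLACEMENTS:
            el.set(XML_LANG, REPLACEMENTS[code])
            changed += 1
    return changed


# Hypothetical, shortened entry used only to exercise the function.
sample = (
    '<entry xml:lang="fre" xml:id="rolle">'
    '<etym><foreign xml:lang="lat">rotulus</foreign></etym>'
    "</entry>"
)

root = ET.fromstring(sample)
changed = normalize_lang_attributes(root)
print(changed, ET.tostring(root, encoding="unicode"))
```

Codes that are already valid in ISO 639-3, such as lat, are left untouched, so the script can be run repeatedly without side effects.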

Thanks! 😄

WGBS2 commented 5 years ago

Not sure where the problem lies. ‘fre’ is a standard code for modern French. We also have words in Latin, hence ‘latin’. In the etymologies, I have to correct as we now use instead of . is still used outside of etymologies. Codes are 639-2, in which ‘fre’ and ‘lat’ exist, as do ‘gre’ and ‘grc’ for Greek, which are shortcuts as I do not know which Greek we are using. When I tidy up, I’ll speak to my tame Greek etymologist, who has agreed to work on Greek examples.

G

On 5 Jul 2019 at 12:50, Pedro J. Ortiz notifications@github.com wrote:


pjox commented 5 years ago

The problem is that fre is an ISO 639-2/B code, and most of the Python libraries I use support either ISO 639-1, ISO 639-2/T or ISO 639-3. So I would always have to change fre to fra or to fr. This is not difficult, but it would be convenient if we all "speak" the same standard (preferably ISO 639-3, which is the current standard).

Also, as stated on Wikipedia (https://en.wikipedia.org/wiki/ISO_639-2#B_and_T_codes):

B and T codes

While most languages are given one code by the standard, twenty of the languages described have two three-letter codes, a "bibliographic" code (ISO 639-2/B), which is derived from the English name for the language and was a necessary legacy feature, and a "terminological" code (ISO 639-2/T), which is derived from the native name for the language and resembles the language's two-letter code in ISO 639-1. There were originally 22 B codes; scc and scr are now deprecated.

In general the T codes are favored; ISO 639-3 uses ISO 639-2/T.

So the ISO 639-2/T code, fra, is compatible with ISO 639-3, which also uses fra. The ISO 639-2/B code fre is not compatible with any other standard.
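As a rough illustration of the conversion I currently have to do by hand, the twenty B/T pairs can be kept in a small lookup table. The names `B_TO_T` and `to_terminological` are hypothetical, and the sketch assumes that any code without a separate B form can be passed through unchanged.

```python
# The twenty ISO 639-2 languages that have both a bibliographic (B)
# and a terminological (T) code, keyed by the B code.
B_TO_T = {
    "alb": "sqi",  # Albanian
    "arm": "hye",  # Armenian
    "baq": "eus",  # Basque
    "bur": "mya",  # Burmese
    "chi": "zho",  # Chinese
    "cze": "ces",  # Czech
    "dut": "nld",  # Dutch
    "fre": "fra",  # French
    "geo": "kat",  # Georgian
    "ger": "deu",  # German
    "gre": "ell",  # Modern Greek
    "ice": "isl",  # Icelandic
    "mac": "mkd",  # Macedonian
    "mao": "mri",  # Maori
    "may": "msa",  # Malay
    "per": "fas",  # Persian
    "rum": "ron",  # Romanian
    "slo": "slk",  # Slovak
    "tib": "bod",  # Tibetan
    "wel": "cym",  # Welsh
}


def to_terminological(code: str) -> str:
    """Return the T code for a B code; other codes pass through unchanged."""
    code = code.strip().lower()
    return B_TO_T.get(code, code)


for code in ("fre", "lat", "gre", "grc"):
    print(code, "->", to_terminological(code))
```

Because the T codes coincide with the ISO 639-3 codes for these languages, the same table covers the move from 639-2/B to 639-3.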

WGBS2 commented 5 years ago

Hi Pedro,

I am worried about standards that change whenever certain people get an itch. Very many projects have been using 639-2. I use a three-character code for all languages so as to be consistent. If the new standard is ‘fra’, thereby removing the choice, then so be it. Let’s go for 639-3, but we’ll have to change things everywhere.

Our GitHub works fine, but I’ll look at organisation when I get time. Now that GitHub has been bought up by the enemy, I am much more careful.

Best wishes

Geoffrey

Not on holiday, but not much online either. I’ll shut down totally when the granddaughters arrive at the end of the month.

On 5 Jul 2019 at 14:23, Pedro J. Ortiz notifications@github.com wrote:
