Closed Taivo55 closed 4 years ago
@Taivo55 -- thanks! Fixed via PR:
https://github.com/phoible/dev/pull/261
I'm starting to wonder if these are due to changes in Glottolog/ISO codes through time, e.g. here the bib entry lists [nep]:
Yes, that's quite likely. I use Glottolog (not ISO 639-3) for my language references and have to update them about every six months. I'm not a sophisticated programmer, so I just download the .csv language file from the previous version and the one from the most recent version, write a little formula in Excel to compare them, then make the changes in my database by hand. SIL works on a different timetable with ISO 639-3, so it's probably not particularly easy to coordinate the two. A few years ago I found that SIL was giving local political decisions greater weight than scientific evidence in their choices of what to accept and what to reject, so I stopped using ISO 639-3 completely. That saves me the trouble as an individual researcher.
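For anyone who would rather script the comparison described above than do it in Excel, here is a minimal sketch in Python. It assumes each release ships a CSV with an `id` column (the Glottocode) and an `iso639P3code` column; adjust the column names if your download differs. The sample rows are made up for illustration and are not a record of actual Glottolog history.

```python
import csv
import io


def load_iso_by_glottocode(csv_text, id_col="id", iso_col="iso639P3code"):
    """Map each Glottocode to its ISO 639-3 code ('' if it has none).

    The column names are assumptions about the CSV layout.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_col]: row[iso_col] for row in reader}


def diff_releases(old, new):
    """Return (added, removed, changed) between two Glottocode->ISO mappings."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = {
        code: (old[code], new[code])
        for code in sorted(set(old) & set(new))
        if old[code] != new[code]
    }
    return added, removed, changed


# Hypothetical excerpts standing in for two downloaded release files.
OLD_RELEASE = """id,name,iso639P3code
nepa1254,Nepali,nep
abcd1234,Hypothetical,xyz
"""
NEW_RELEASE = """id,name,iso639P3code
nepa1254,Nepali,npi
east1436,Eastern Pahari,nep
"""

added, removed, changed = diff_releases(
    load_iso_by_glottocode(OLD_RELEASE),
    load_iso_by_glottocode(NEW_RELEASE),
)
print("added:", added)
print("removed:", removed)
print("changed ISO:", changed)
```

With real files, replace the two strings with the contents of the downloaded CSVs (for example via `open(path, encoding="utf-8").read()`).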
I think, to avoid a moving-target scenario, whenever links to Glottolog are computed from ISO 639-3 codes it is important to "compile" a dataset - such as PHOIBLE - against an explicitly stated released version of Glottolog. Obviously, updating this target version to the latest release helps with feedback - such as here.
To stick with PHOIBLE's conservative idea of staying "true to the source", the simplest automated process would be to keep the ISO codes assigned in the sources and translate to Glottocodes with each Glottolog release. But arguably, ISO codes in the sources could be interpreted as "what was understood as ISO xyz at the time" - and if this interpretation is adopted, one could also just switch to the Glottocode matching the ISO code at this point in time - and then just stick with Glottolog.
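A rough sketch of the first option (keep the ISO codes from the sources, recompute Glottocodes per release) might look like the following. The function names and the `GLOTTOLOG_VERSION` placeholder are hypothetical, and this is not how PHOIBLE's build actually works; the two languoid pairs are the ones mentioned in this issue, and the second inventory is invented to show an unresolved code.

```python
# Placeholder for the explicitly stated release the dataset is compiled against.
GLOTTOLOG_VERSION = "X.Y"  # hypothetical; fill in the real release number


def build_iso_index(languoids):
    """Build an ISO 639-3 -> Glottocode lookup from one Glottolog release.

    `languoids` is an iterable of (glottocode, iso_code) pairs; languoids
    without an ISO code are skipped.
    """
    index = {}
    for glottocode, iso_code in languoids:
        if iso_code:
            index[iso_code] = glottocode
    return index


def translate(source_isos, index):
    """Translate source ISO codes; report inventories whose code is unknown."""
    resolved, unresolved = {}, []
    for inventory_id, iso_code in source_isos.items():
        if iso_code in index:
            resolved[inventory_id] = index[iso_code]
        else:
            unresolved.append(inventory_id)
    return resolved, unresolved


# ISO/Glottocode pairs as given in this issue.
release = [("east1436", "nep"), ("nepa1254", "npi")]
# ISO codes as recorded in the sources; "zzz" is an invented retired code.
sources = {"UPSID 488": "nep", "hypothetical-1": "zzz"}

resolved, unresolved = translate(sources, build_iso_index(release))
print(f"Glottolog {GLOTTOLOG_VERSION}:", resolved, "unresolved:", unresolved)
```

Note that this reproduces the problem reported here: a source that recorded [nep] for Nepali resolves to east1436, so a purely automatic translation still needs a review step for codes whose meaning has shifted.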
I'm not comfortable with freezing ISO codes, for two reasons.
1. People will be constantly pointing these out as "mistakes", and/or will need to do a bunch of extra work to hand-correct the ISO/glottocodes for their particular use case, and/or might choose not to use PHOIBLE at all because of how annoying this all is to them.
2. We've already updated many dozens of ISO/glottocodes over the years; we could probably reconstruct / revert those changes, but it would be a huge headache. Moreover, because we use secondary and tertiary sources, some shifting of ISO codes may have already happened between what the language was thought to be in the original language description and what it was thought to be in the compiled source (UPSID, SAPHON, etc.). We weren't careful about looking for / documenting such cases; we just went with the ISO codes that the compilers used. I think the closest we could come to applying "faithfulness" to the ISO codes is to fix them at the time the resource is ingested into PHOIBLE.
To me the most consistent approach, then, is to try to be faithful to the phonological content of the original language descriptions, but to try to keep the classificatory information in line with the most current understanding (i.e., keep up with ISO/glottocode changes).
> I think, to avoid a moving-target scenario, whenever links to Glottolog are computed from ISO 639-3 codes it is important to "compile" a dataset - such as PHOIBLE - against an explicitly stated released version of Glottolog.
I agree. Since we're addressing these issues by changing things in phoible/dev rather than here, and pushing new phoible.org releases periodically (not continuously), isn't that what we're already doing? Or is your point that we should be more verbose / explicit about which Glottolog version is being used?
@drammock I think what you describe is perfect from the "end" user's point of view, since "faithful to the phonological content of the original language descriptions" is what they typically expect/hope for.
And yes, the released PHOIBLE data is explicit (but the web app could maybe state this more prominently). Making the baseline Glottolog version for ongoing work (i.e. the next PHOIBLE release) explicit would only be necessary if it wasn't "current" Glottolog.
Nepali (UPSID 488) currently links to Eastern Pahari (east1436, ISO 639-3 nep) in Glottolog. It should link to Nepali (nepa1254, ISO 639-3 npi). UPSID calls their inventory "Nepali" as well.