glottolog / glottolog-cldf

Glottolog data as CLDF StructureDataset
https://glottolog.org
Creative Commons Attribution 4.0 International

MED granularity? #3

Closed HedvigS closed 4 years ago

HedvigS commented 4 years ago

I finally updated my R scripts that cobble together a table from the glottolog data in a format that I prefer (here). As I was shifting from a languoids.csv and treedb approach to using almost only the glottolog cldf, I noticed a few smaller changes. It all went fine, and I think I understood most of the decisions.

I was just curious, will the cldf release ever contain more fine-grained categories for "med" below wordlist?

xrotwang commented 4 years ago

"wordlist" is the "lowest" type in the document types recognized by Glottolog. So providing more granularity "below" would only be possible if the whole database were annotated with new document types. I don't think this is going to happen.

Why would you need this and which types would you suggest?

HedvigS commented 4 years ago

Right, okay. Will med ever include any more granularity, or is there another way of getting the old "doctype" from the cldf release?

I just found it a bit confusing to only work with these 5 med classes instead of the previous 16 doctypes. Is "specific_feature" lumped into med class "phonology/text" or "wordlist or less"? I take it "phonology/text" then if wordlist is the lowest?

HedvigS commented 4 years ago

Ah, right, just found "From highest to lowest, the ranking is grammar, grammar sketch, dictionary/phonology/specific feature/text, wordlist, followed by the remaining document types.".

That means "phonology/text" includes "specific_feature". Right, okay. That makes the med scale a bit more clear. It makes me tempted to rename "phonology/text" into "dictionary/phonology/specific feature/text" and try and split "wordlist or less" into "wordlist" and the others. It also means that "Ethnographic Work" is under wordlist, correct?

xrotwang commented 4 years ago

Here's the code that computes med_type: https://github.com/glottolog/pyglottolog/blob/master/src/pyglottolog/references/bibfiles.py#L267-L278 and here's the doctypes: https://github.com/glottolog/glottolog/blob/master/config/document_types.ini with ordering defined by rank.

xrotwang commented 4 years ago

So yes, ethnographic work is ranked 3, below wordlist, ranked 10.
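
To make the ordering concrete, here is a minimal Python sketch of the tiering quoted above (grammar, grammar sketch, dictionary/phonology/specific feature/text, wordlist, then everything else). It is not the pyglottolog implementation linked above, which works from the numeric ranks in document_types.ini. The names MED_TIERS, OTHER, tier_of and best_tier are hypothetical, and the doctype identifier spellings are assumptions modeled on hhtype values such as grammar_sketch and specific_feature.

```python
# Tiers from highest to lowest, following the ranking sentence quoted in this thread.
# Assumption: doctype identifiers are lowercase with underscores; only
# "grammar_sketch" and "specific_feature" are spelled this way in the thread itself.
MED_TIERS = [
    ("grammar", {"grammar"}),
    ("grammar sketch", {"grammar_sketch"}),
    (
        "dictionary/phonology/specific feature/text",
        {"dictionary", "phonology", "specific_feature", "text"},
    ),
    ("wordlist", {"wordlist"}),
]

# Hypothetical label for "the remaining document types".
OTHER = "other document types"


def tier_of(doctype):
    """Return the tier index of a doctype (0 is highest); unknown types rank last."""
    for index, (_label, members) in enumerate(MED_TIERS):
        if doctype in members:
            return index
    return len(MED_TIERS)


def best_tier(doctypes):
    """Return the label of the highest tier found among the given doctypes."""
    best = min((tier_of(d) for d in doctypes), default=len(MED_TIERS))
    return MED_TIERS[best][0] if best < len(MED_TIERS) else OTHER
```

Under these assumptions, a reference tagged with both wordlist and grammar_sketch lands in the "grammar sketch" tier, while a doctype outside the four named tiers (ethnographic work, for example) falls into the remaining group.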

xrotwang commented 4 years ago

Getting the document type(s) for a MED from the CLDF release would work as follows:

  1. You determine the BibTeX key of the MED, given in the Source column of the relevant row in values.csv, e.g. hh:s:Crevels:Canichana for cani1243, from
    cani1243-med,cani1243,med,grammar sketch,med-grammar_sketch,,hh:s:Crevels:Canichana,
  2. You look up the hhtype field of this reference in sources.bib:
    @incollection{hh:s:Crevels:Canichana,
       author = {Crevels, Mily},
       ...
       hhtype = {grammar_sketch},
       ...
       year = {2012}
    }

Note that a reference might be tagged with more than one document type.
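
The two steps above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not an official API: the helper names med_source_keys and hhtypes are hypothetical, the values.csv header is assumed to follow the usual CLDF column names (only the Source column is confirmed above, and the name of the last column is a guess), and multiple document types are assumed to be separated by ";" inside the hhtype field. The sample data is the row and a trimmed copy of the BibTeX entry shown above.

```python
import csv
import io
import re

# Assumed header; the data row is the one quoted in step 1.
VALUES_CSV = (
    "ID,Language_ID,Parameter_ID,Value,Code_ID,Comment,Source,codeReference\n"
    "cani1243-med,cani1243,med,grammar sketch,med-grammar_sketch,,hh:s:Crevels:Canichana,\n"
)

# Trimmed copy of the entry from step 2 (only the fields shown there).
SOURCES_BIB = """@incollection{hh:s:Crevels:Canichana,
   author = {Crevels, Mily},
   hhtype = {grammar_sketch},
   year = {2012}
}
"""


def med_source_keys(values_csv_text, glottocode):
    """Step 1: return the BibTeX keys in the Source column of a languoid's MED row."""
    for row in csv.DictReader(io.StringIO(values_csv_text)):
        if row["Language_ID"] == glottocode and row["Parameter_ID"] == "med":
            # Assumption: several sources are ";"-separated and may end in "[pages]".
            refs = [ref.strip() for ref in row["Source"].split(";") if ref.strip()]
            return [re.sub(r"\[.*\]$", "", ref) for ref in refs]
    return []


def hhtypes(bibtex_text, key):
    """Step 2: return the document types in the hhtype field of one BibTeX entry."""
    start = re.search(r"@\w+\{" + re.escape(key) + r"\s*,", bibtex_text)
    if start is None:
        return []
    rest = bibtex_text[start.end():]
    # The entry body ends where the next entry begins (or at the end of the file).
    following = re.search(r"^@\w+\{", rest, flags=re.MULTILINE)
    entry = rest[: following.start()] if following else rest
    field = re.search(r"\bhhtype\s*=\s*\{([^}]*)\}", entry)
    if field is None:
        return []
    # Assumption: more than one document type is separated by ";".
    return [t.strip() for t in field.group(1).split(";") if t.strip()]


for source_key in med_source_keys(VALUES_CSV, "cani1243"):
    print(source_key, hhtypes(SOURCES_BIB, source_key))
```

The regular-expression lookup is good enough for a one-off check. For the full sources.bib, a proper BibTeX parser is likely the safer choice.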

HedvigS commented 4 years ago

Okay, so specifically "wordlist" is not the lowest type and the more granular hhtype is retrievable. Thanks!

xrotwang commented 4 years ago

Ah, sorry, I was wrong saying "wordlist is the lowest type". Will make sure the order at https://glottolog.org/meta/glossary#Doctype is by rank and not alphabetical. See https://github.com/clld/glottolog3/issues/125

HedvigS commented 4 years ago

No worries, thanks :)!