final check before we submit to zenodo

cldf-datasets / tangclassifiers

CLDF dataset derived from Tang and Her's "Quantitative typological data on classifiers and plural markers" from 2019

Creative Commons Attribution 4.0 International


final check before we submit to zenodo #6

Closed LinguList closed 4 years ago

LinguList commented 4 years ago

@marctang, this is almost ready now. I added a file, .zenodo.json, which controls how the dataset will be rendered on Zenodo; it can still be modified later. I suggest that, once you tell me you're ready, we make the release, which will then be uploaded to Zenodo automatically. That gives you a DOI for the code, which you can also link, e.g., on your website.

LinguList commented 4 years ago

BTW: you had duplicate sources in your BibTeX file; I had to remove them.

LinguList commented 4 years ago

Checking this with Python is easy (now that you have all the software installed):

>>> from pycldf.sources import Sources
>>> Sources.from_file(yourbibtex)

marctang commented 4 years ago

Blimey, sorry, I forgot to check for duplicates in the BibTeX when I exported it. Thanks for pointing it out!

I have a quick question about the references: not all of them were annotated with wals_code in the BibTeX. If I read the code correctly, the matching is based primarily on wals_code, so some items are missed in values.csv (e.g., mous_alagwa_2016). Is it possible to use the match from GSSG_ListOfLanguages.csv instead, or does the WALS code have to be added to the BibTeX? Thanks!

LinguList commented 4 years ago

Ah, I did not know: do you have another key linking your data to the BibTeX? If so, we can just replace it (you can test this in the code). If there's no key to link sources and languages, I wouldn't know how to proceed...

LinguList commented 4 years ago

Okay, sorry, I am stupid: you HAVE the source in your data. Of course, then we use the Source column to link to sources.bib, and that's all. Do you want to do that yourself, @marctang? If there are problems, I'll help.
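Linking through the Source column only works if every value there resolves to a citation key in sources.bib. The sketch below shows one way to list keys that do not resolve. It assumes the usual CLDF conventions that a cell may hold several references separated by `;` and that a reference may carry a page suffix in square brackets; the function name `unresolved_sources` and all sample values are hypothetical, and page suffixes that themselves contain `;` are not handled.

```python
import re


def unresolved_sources(source_cells, bib_keys, sep=";"):
    """Return citation keys used in the Source column but absent from bib_keys."""
    missing = set()
    for cell in source_cells:
        for ref in cell.split(sep):
            # Drop an optional trailing page suffix such as "[12-15]".
            key = re.sub(r"\[.*\]$", "", ref.strip())
            if key and key not in bib_keys:
                missing.add(key)
    return sorted(missing)


# Made-up sample data: three Source cells and the keys found in the .bib file.
cells = ["Kinkade-2001", "Omar-1983[12-15];Kinkade-2001", "Missing-1999"]
keys = {"Kinkade-2001", "Omar-1983"}

print(unresolved_sources(cells, keys))
```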

marctang commented 4 years ago

No problem, I have added the Wals_code to the BibTeX now. However, a quick question about something I might be doing wrong: some references are associated with more than one WALS code (e.g., Kinkade-2001), and these references don't show up in values.csv @@. Maybe I am using the wrong format for the links? For now I use something like this:

@Unpublished{Kinkade-2001,
  Wals_code = {eya; klp; klm; mll; tli},
  Title = {{T}he {A}real {Q}uestion: {N}orthwest {C}oast and {C}alifornia},
  Author = {Kinkade, M. Dale},
  Year = {2001},
  Type = {paper}
}
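A field like this has to be split on `;` before matching; compared as a single string, "eya; klp; klm; mll; tli" equals none of the individual codes. The following is only a sketch of that idea under stated assumptions, not the fix that ended up in the repository: the function name `sources_by_wals_code` and the dictionary layout of `entries` are invented here, and "Example-2000" with code "xyz" is a made-up entry.

```python
from collections import defaultdict


def sources_by_wals_code(entries, sep=";"):
    """Map each WALS code to the citation keys whose Wals_code field lists it."""
    index = defaultdict(list)
    for key, fields in entries.items():
        for code in fields.get("Wals_code", "").split(sep):
            code = code.strip()  # tolerate "eya; klp" as well as "eya;klp"
            if code:
                index[code].append(key)
    return dict(index)


# Entries as they might look after parsing the .bib file into dictionaries.
entries = {
    "Kinkade-2001": {"Wals_code": "eya; klp; klm; mll; tli"},
    "Example-2000": {"Wals_code": "xyz"},  # hypothetical single-code entry
}

index = sources_by_wals_code(entries)
print(index["klp"])
```

With the index in place, each language can look up its sources by its own code, no matter how many codes one reference lists.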

xrotwang commented 4 years ago

addressed here https://github.com/cldf-datasets/tangclassifiers/commit/dd8433a4fd5ea371ee7987fd6e1d27236ecbbddc

LinguList commented 4 years ago

@marctang, please check if this is fixed now.

LinguList commented 4 years ago

perfect, thanks @xrotwang.

xrotwang commented 4 years ago

@marctang the "missing source" issue seems fixed (verified with the csvstat command from csvkit):

$ csvstat -c Source cldf/values.csv 
  7. "Source"

    Type of data:          Text
    Contains null values:  False
    Unique values:         372
    Longest value:         31 characters
    Most common values:    Omar-1983 (16x)
                           Kinkade-2001 (10x)
                           Lynch-1998 (8x)
                           Suarez-1983b (8x)
                           Welmers-1973 (8x)

Row count: 800
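If csvkit is not at hand, roughly the same check can be done with Python's standard library. This is a sketch rather than a reimplementation of csvstat: the function name `source_summary` and the sample rows are assumptions for illustration, and for the real dataset the file object would come from opening cldf/values.csv.

```python
import csv
import io
from collections import Counter


def source_summary(csv_file, column="Source", top=5):
    """Count rows, empty cells, and the most frequent values in one column."""
    rows = list(csv.DictReader(csv_file))
    values = [(row.get(column) or "").strip() for row in rows]
    counts = Counter(v for v in values if v)
    return {
        "rows": len(rows),
        "empty": sum(1 for v in values if not v),
        "unique": len(counts),
        "most_common": counts.most_common(top),
    }


# Made-up sample; for the dataset, pass open("cldf/values.csv", newline="").
sample = io.StringIO("ID,Source\n1,Omar-1983\n2,Kinkade-2001\n3,Omar-1983\n4,\n")
summary = source_summary(sample)
print(summary)
```

An "empty" count of 0 corresponds to csvstat reporting "Contains null values: False".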

marctang commented 4 years ago

Thanks to you both for your help! I also re-ran everything locally and the code works properly, as does csvstat. Awesome :-)! The output is correct and ready to be released on Zenodo!