digling / burmish

LingPy plugin for handling a specific dataset
GNU General Public License v2.0

Updating directly into Github rather than through spreadsheets #98

Closed nh36 closed 7 years ago

nh36 commented 7 years ago

An important update on what you suggest, and sorry for rushing: the READMEs are mainly for human-readable but nicely ordered information, and they also serve as a test case for us to regularize our accounts and descriptions of the data. The more we regularize, the more ways we can find to remove redundancy.

Here is how we generally proceed: we have the master file for languages.

You can edit it with a spreadsheet editor (e.g., copy-paste the content into one, edit it there, then copy it again and paste it back into the file on GitHub while in edit mode).

If you encounter a new variety, such as the dialect varieties of Achang, which I insist on listing separately, you should ideally update the master file and store the information there.

In addition, each dataset folder has a languages.csv file next to the README, where additional information such as sources should be stored in separate columns.

For Mann1998, this is the file:

* https://github.com/digling/burmish/blob/master/datasets/Mann1998/languages.csv

Note that "ID" here is the label used for the language in the source, while CDDB is our BED-ID (sorry for the confusion, but this is built from code I used elsewhere, which is why we need to keep CDDB instead of the more proper label BED for the moment).

By adding another column, you could likewise insert the source information there. This is a CSV file, so use a comma as the separator, and don't use commas inside any field.
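In case it helps to see the column-adding step outside a spreadsheet, here is a minimal sketch in Python. It is only an illustration under assumptions: the column name `SOURCE`, the helper `add_column`, and the sample rows (`LangA`, `LangB`) are hypothetical and not taken from the actual languages.csv; only the `ID` and `CDDB` column labels come from the file described above.

```python
import csv
import io


def add_column(csv_text, column, values, key="ID"):
    """Return csv_text with one extra column, filled from `values` keyed by `key`.

    Hypothetical helper for illustration; rows without an entry get an empty cell.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if column in reader.fieldnames:
        raise ValueError(f"column {column!r} already exists")
    fieldnames = list(reader.fieldnames) + [column]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = values.get(row[key], "")
        # Project convention: commas are separators only, never field content.
        if "," in value:
            raise ValueError(f"comma not allowed inside a field: {value!r}")
        row[column] = value
        writer.writerow(row)
    return out.getvalue()


# Invented sample data; the real file has different rows.
sample = "ID,CDDB\nLangA,1\nLangB,2\n"
updated = add_column(sample, "SOURCE", {"LangA": "Mann1998"})
print(updated)
```

The check for commas is stricter than the CSV format itself requires (the `csv` module would quote such a field), but it matches the rule stated above and keeps the file easy to edit by hand on GitHub.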

For the time being, we can also use the README for this, and I'll then adjust the languages.csv files accordingly. But what I deem important is that the README is not supposed to carry all this weight: whatever is added there, I'll later move to the languages.csv and to the varieties.tsv. So it will be quicker if this is done right away, and I think you are now versed enough with spreadsheets beyond OpenOffice to add the information there yourself (it's just copy-pasting).

This probably sounds a bit roundabout, but I am currently trying to balance how many of the more digital tasks I can show you how to do yourself. I think we are advanced enough regarding formats now that you can use GitHub to edit a TSV or CSV file directly.

The key is to remove redundancy, and to keep the README files as a testing ground ("Experimentierfeld") for things we cannot regularize right away, but which we may later find to be regularizable. Think of a classification of sources into "wordlist", "morpheme list", "morpheme list with proto-forms", "cognate set list", etc.: we don't have it yet, but the more we learn, the more useful we may find it to classify the datasets in this way.

LinguList commented 7 years ago

guess we can ignore this, right?