cldf-datasets / doreco

CLDF dataset derived from DoReCo's core corpus
https://doreco.info/

Add raw data without upload to Github #1

Closed FredericBlum closed 2 years ago

FredericBlum commented 2 years ago

I will add a folder within raw that contains the data, plus a script that parses the relevant CSV files from the subfolders. The raw data will not be uploaded to GitHub, so that we don't bloat the repository (as discussed with xrotwang).

This will only be done for the languages that do not have an ND tag.

xrotwang commented 2 years ago

ND clause for audio only is fine, though.

FredericBlum commented 2 years ago

I added a script that parses all zip files in the directory. It creates raw files for the phone corpus and the word corpus, as well as a file with the glossing conventions and a file with all the speaker/file metadata that I parse as discussed in #5.
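For illustration only, here is a minimal sketch of the zip-parsing step described above. It is not the script that was added to the repository. The function name `read_csv_members`, the idea of selecting files by a filename suffix, and the UTF-8 encoding are all assumptions made for the example.

```python
import csv
import io
import zipfile
from pathlib import Path


def read_csv_members(raw_dir, suffix):
    """Yield (zip file name, row dict) for each CSV member ending in `suffix`.

    `raw_dir` is the local, non-uploaded data folder; `suffix` is a
    hypothetical filename ending used to pick one kind of CSV file.
    """
    for zip_path in sorted(Path(raw_dir).glob("*.zip")):
        with zipfile.ZipFile(zip_path) as archive:
            for member in sorted(archive.namelist()):
                if not member.endswith(suffix):
                    continue
                with archive.open(member) as handle:
                    # Zip members are opened in binary mode; wrap for csv.
                    # UTF-8 is an assumption about the source files.
                    text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
                    for row in csv.DictReader(text):
                        yield zip_path.name, row
```

Calling the function once per file type (for example once for the word-level files and once for the phone-level files) and writing the collected rows out would give one raw file per corpus, without the zip archives themselves ever being committed.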

FredericBlum commented 2 years ago

So I think we can close this issue, or is there anything else that we will not do through #4?