FredericBlum closed this issue 2 years ago.
I think all we need is the Glottocode and the download link(s) plus license. The other information should be read from the metadata files in the downloads (and the license information therein should be checked against the one we keep in languages.tsv).
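The license check could look roughly like the sketch below. It assumes, hypothetically, that languages.tsv has columns named "Glottocode" and "License", and that the license from each download's metadata has already been read into a dict keyed by Glottocode. The Glottocodes and licenses in the usage part are placeholders.

```python
import csv
import io


def license_mismatches(languages_tsv: str, download_licenses: dict) -> list:
    """Compare licenses read from the downloads with those kept in languages.tsv.

    Returns a sorted list of (glottocode, expected, found) tuples. `expected`
    is None when the Glottocode does not appear in languages.tsv at all.
    """
    # Assumption: languages.tsv has the (hypothetical) columns "Glottocode" and "License".
    reader = csv.DictReader(io.StringIO(languages_tsv), delimiter="\t")
    expected = {row["Glottocode"]: row["License"].strip() for row in reader}

    mismatches = []
    for glottocode, found in sorted(download_licenses.items()):
        found = found.strip()
        if expected.get(glottocode) != found:
            mismatches.append((glottocode, expected.get(glottocode), found))
    return mismatches


if __name__ == "__main__":
    # Placeholder Glottocodes and licenses, for illustration only.
    tsv = "Glottocode\tLicense\nabcd1234\tCC-BY 4.0\nefgh5678\tCC-BY-NC 4.0\n"
    found = {"abcd1234": "CC-BY 4.0", "efgh5678": "CC-BY 4.0"}
    print(license_mismatches(tsv, found))
    # -> [('efgh5678', 'CC-BY-NC 4.0', 'CC-BY 4.0')]
```

Reporting Glottocodes that are missing from languages.tsv (with `None` as the expected value) catches downloads we don't track yet, not just license disagreements.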
The following information is now provided within a single metadata file in Version 1.1:
Language | Glottocode | iso-639-3 | Family | fam_glottocode | Area | Creator | Latitude | Longitude | Archive | Archive_link | Translation | Annotation license | Audio license | DOI | Gloss | Extended speakers | Extended word tokens | Extended texts | Core speakers | core word tokens | Core texts | Years of recordings in core set
We could go either for a reduced set (Name, Glottocode, License) or for the full information. The latter would mean a lot of custom columns, but I think including this data would be reasonable. Also, we would then no longer need to read the information from the individual metadata files.
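If we go for the full information, the custom columns could be derived from the metadata header rather than typed out by hand. The sketch below turns the header listed above into CSVW-style column descriptions (dicts with "name" and "datatype"), which, as far as we can tell, is a form pycldf accepts as column specifications. The datatype choices (integers for the speaker/token/text counts, decimals for coordinates, strings otherwise) and the underscore naming scheme are assumptions, and `column_specs` is a hypothetical helper.

```python
import re

# Header of the single v1.1 metadata file, as listed above.
HEADER = (
    "Language | Glottocode | iso-639-3 | Family | fam_glottocode | Area | "
    "Creator | Latitude | Longitude | Archive | Archive_link | Translation | "
    "Annotation license | Audio license | DOI | Gloss | Extended speakers | "
    "Extended word tokens | Extended texts | Core speakers | core word tokens | "
    "Core texts | Years of recordings in core set"
)

# Assumption: the count columns hold integers, coordinates hold decimals.
COUNT_LABEL = re.compile(r"(speakers|word tokens|texts)$", re.IGNORECASE)
DECIMAL_LABELS = {"Latitude", "Longitude"}


def column_specs(header: str) -> list:
    """Turn the pipe-separated header into CSVW-style column descriptions."""
    specs = []
    for label in (part.strip() for part in header.split("|")):
        # Replace runs of non-alphanumeric characters with "_", e.g. iso_639_3.
        name = re.sub(r"[^0-9A-Za-z]+", "_", label).strip("_")
        if COUNT_LABEL.search(label):
            datatype = "integer"
        elif label in DECIMAL_LABELS:
            datatype = "decimal"
        else:
            datatype = "string"
        specs.append({"name": name, "datatype": datatype})
    return specs
```

In practice, fields that already have a standard LanguageTable property in CLDF (Glottocode, Latitude, Longitude) would probably map to those properties instead of becoming custom columns.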
Yes, adding all this info seems reasonable. Potentially, some of it would go into a ContributionTable, though.
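One way to divide the fields is sketched below: language-level fields go to the LanguageTable row, and everything describing the corpus (creator, archive, licenses, DOI, corpus sizes) goes to the ContributionTable row. Which fields count as language-level is an assumption here, and `split_metadata_row` and `LANGUAGE_FIELDS` are hypothetical names.

```python
# Assumption: these fields describe the language itself; everything else
# (creator, archive, licenses, DOI, corpus sizes, ...) describes the corpus.
LANGUAGE_FIELDS = frozenset({
    "Language", "Glottocode", "iso-639-3", "Family",
    "fam_glottocode", "Area", "Latitude", "Longitude",
})


def split_metadata_row(row: dict) -> tuple:
    """Split one metadata row into a language row and a contribution row."""
    language = {k: v for k, v in row.items() if k in LANGUAGE_FIELDS}
    contribution = {k: v for k, v in row.items() if k not in LANGUAGE_FIELDS}
    # Keep the Glottocode on both sides so the two rows can be linked again.
    contribution["Glottocode"] = row["Glottocode"]
    return language, contribution
```

Keeping the Glottocode on the contribution row is just the simplest link; a proper foreign key to the LanguageTable would serve the same purpose.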
(In reply to Frederic Blum's comment of Thu., 25 Aug. 2022, 08:49.)
I successfully added the languages, and the same goes for the ContributionTable. What I am currently failing at, however, is adding a new MetadataTable. I added the component, but I can't manage to create the necessary metadata JSON. Could you point me to an example in another repository, or to documentation where I can find this? I've looked at several and couldn't identify what is missing.
Code: https://github.com/cldf-datasets/doreco/blob/main/cldfbench_doreco.py#L62-L288
`MetadataTable` is not a CLDF component, i.e. this type of data isn't standardized in CLDF. So you'd just add another custom table via
cldf.add_table('metadata.csv', *columns)
and populate it via
args.writer.objects['metadata.csv'].append(row)
where `row` is a dict mapping column names to values.
Thank you! So the main problem was that I had used `objects['MetadataTable']` instead of the CSV file name, `objects['metadata.csv']`. Now everything works fine.
Then we can probably close this issue as well.
What information can or should we put in the languages.tsv file? From the DoReCo main page, we have the following options:
I would like to add at least the citation key for the individual corpora and the information about glossing. This could make it easier to filter for specific studies, and the citation key (hopefully) ensures that people who use the corpus cite the individual corpus creators.
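To illustrate the kind of filtering this would enable, here is a small sketch that collects the citation keys to cite, optionally only for corpora with a given glossing value. The column names "Citation_key" and "Gloss" and the example values are hypothetical; the actual values of the Gloss field aren't specified here, so the function simply matches whatever value it is given.

```python
from typing import Optional


def citation_keys(languages: list, gloss: Optional[str] = None) -> list:
    """Collect the citation keys of all corpora, optionally only those with a given Gloss value."""
    keys = set()
    for lang in languages:
        # Assumption: "Citation_key" and "Gloss" are hypothetical column names in languages.tsv.
        if gloss is not None and lang.get("Gloss", "").strip() != gloss:
            continue
        keys.add(lang["Citation_key"])
    return sorted(keys)
```

Deduplicating and sorting the keys gives a stable list that could be pasted straight into a bibliography, even if several rows share one citation.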