Open sdruskat opened 3 years ago
Do we have / can we get an overview of which proportion of CITATION.cff files use type: dataset?
Yes, I can run an analysis over the corpus I'm harvesting. Will take a bit of time.
Didn't take that long:
In the data for 15977 different repositories, I found cases where:
a CITATION.cff file included a line that starts with the string type: and has the string dataset in the same line, for any version of the file I have on record;
a CITATION.cff file included a line that starts with the string type: dataset, for any version of the file I have on record.
Other files have no type information, or type information different from dataset, or are not UTF-8 decodable.
So given this simple metric, the ratio of repositories with a dataset-type CFF file at any given time is around 0.02.
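For anyone who wants to reproduce the check, the line-based metric described above could be sketched roughly as follows. This is not the script that was actually run; the function names classify_cff and dataset_ratio, and the assumption that the corpus is a mapping from repository name to the raw bytes of every recorded file version, are hypothetical and only serve to illustrate the two string tests.

```python
from typing import Dict, List, Optional


def classify_cff(raw: bytes) -> Optional[str]:
    """Classify one recorded version of a CITATION.cff file (hypothetical helper).

    Returns "strict" if some line starts with "type: dataset",
    "loose" if some line starts with "type:" and contains "dataset",
    "other" if neither holds, and None if the bytes are not valid UTF-8.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
    result = "other"
    for line in text.splitlines():
        if line.startswith("type: dataset"):
            return "strict"
        if line.startswith("type:") and "dataset" in line:
            result = "loose"
    return result


def dataset_ratio(repos: Dict[str, List[bytes]]) -> float:
    """Share of repositories where any recorded version matches either test.

    `repos` maps a repository name to all recorded file versions
    (an assumed corpus layout, not the real harvesting format).
    """
    if not repos:
        return 0.0
    hits = sum(
        1
        for versions in repos.values()
        if any(classify_cff(v) in ("strict", "loose") for v in versions)
    )
    return hits / len(repos)
```

Note that every strict match is also a loose match, and that a quoted value such as type: "dataset" only passes the loose test. A plain string test like this ignores YAML structure, so it is a rough estimate rather than a parse of the file.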
Alejandra Gonzalez-Beltran has got in touch with me to discuss the experimental dataset support and its relation to other (standard) formats and initiatives (W3C DCAT vocabulary, DataCite, RO-Crate).
We should initiate this discussion and perhaps bring Arfon into it as well.