peterdesmet closed this issue 11 years ago.
I have updated the column names. I think we need 8 yes/no/? attributes. I also renamed `datasets` to `number of datasets`.
We should not forget to update the README before closing this issue.
I updated the column order and one header name. I have already started annotating at https://github.com/Datafable/gbif-data-licenses/blob/test-annotation/data/licenses.csv. Only 395 of 427 to go. :muscle:
I removed the `number of datasets` column, because this file should only contain source data, not calculated data. Another reason is that a new dataset using an existing license should not require a change to this file.
@bartaelterman, let me know if it would be easier to have the `license` column first instead of last, e.g. for automatically checking whether a license is already in the file. The only reason to keep it at the end is that the file reads somewhat more easily: https://github.com/Datafable/gbif-data-licenses/blob/test-annotation/data/licenses.csv
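For the automatic check, the column position may not matter much. Here is a minimal Python sketch, assuming the file is read with `csv.DictReader` (which looks columns up by header name) and that the header contains a `license` column. The `attr1`/`attr2` headers and the sample licenses are made up:

```python
import csv
import io

def known_licenses(csv_file, column="license"):
    """Return the set of (whitespace-trimmed) license strings in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return {row[column].strip() for row in reader}

def is_new_license(license_text, known):
    """True if license_text is not yet in the set of known licenses."""
    return license_text.strip() not in known

# In-memory sample; in practice: open("data/licenses.csv", newline="")
sample = io.StringIO("attr1,attr2,license\nyes,no,CC-BY 4.0\n?,yes,CC0\n")
known = known_licenses(sample)
print(is_new_license("CC0", known))                  # False
print(is_new_license("All rights reserved", known))  # True
```

Because lookups go through the header names, `license` could stay as the last column for readability.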
302 of 427 licenses (71%) are now annotated. :jack_o_lantern:
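A progress figure like this could also be computed from the file itself. The sketch below assumes that a row counts as annotated once every attribute column is non-empty, with `?` included. The attribute column names are hypothetical, so they are passed in explicitly:

```python
import csv
import io

def annotation_progress(csv_file, attr_columns):
    """Return (annotated, total); a row is annotated when every attr column is non-empty."""
    rows = list(csv.DictReader(csv_file))
    # DictReader fills missing trailing fields with None, hence the `or ""`.
    done = sum(1 for row in rows if all((row[c] or "").strip() for c in attr_columns))
    return done, len(rows)

# Hypothetical sample: license B still has an empty attr1 value.
sample = io.StringIO("attr1,attr2,license\nyes,no,A\n,yes,B\n?,?,C\n")
done, total = annotation_progress(sample, ["attr1", "attr2"])
print(f"{done} of {total} licenses ({done / total:.0%}) are now annotated.")
# prints: 2 of 3 licenses (67%) are now annotated.
```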
Updated `licenses.csv`.
This required quite a bit of hacking to merge the dataset count with the other fields, which were already annotated manually. To do so, I had to sort the licenses, and this sorted order will not necessarily be preserved when the script is rerun.
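One way to avoid depending on sort order would be to match on the license text instead of on row position. Here is a minimal sketch, assuming one license string per dataset is available as input. The sample data is made up:

```python
import csv
import io
from collections import Counter

def merge_counts(annotated_rows, counts, count_column="number of datasets"):
    """Attach a count to each annotated row by matching on its license text.

    Rows keep their original order; licenses absent from `counts` get 0.
    """
    merged = []
    for row in annotated_rows:
        new_row = dict(row)
        new_row[count_column] = counts.get(row["license"], 0)
        merged.append(new_row)
    return merged

annotated = list(csv.DictReader(io.StringIO("attr1,license\nyes,CC0\n?,CC-BY 4.0\nno,Custom\n")))
counts = Counter(["CC0", "CC-BY 4.0", "CC0"])  # hypothetical: one license string per dataset
for row in merge_counts(annotated, counts):
    print(row["license"], row["number of datasets"])
```

Licenses that appear in the counts but not yet in the annotated file would still need to be appended separately.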
Regarding `datasetCount`: just make one commit that includes that number (I'm just interested), and then revert it.
Right, I see it's already committed (abbb1b14a3fc564c2513d066cf9b50fc2b24df5d), so it can be reverted. :arrow_backward:
Yes, but personally, I wouldn't revert it. It's very interesting information. If you don't want it in the `licenses.csv` file, why not store it in a separate file?
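Writing that separate file could be as simple as the sketch below. The `datasetCount` header is borrowed from the commit mentioned above, and the input (one license string per dataset) is an assumption:

```python
import csv
import io
from collections import Counter

def write_license_counts(dataset_licenses, out_file):
    """Write one row per distinct license with its dataset count, most frequent first."""
    counts = Counter(dataset_licenses)
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["license", "datasetCount"])
    # most_common() orders by count; ties keep first-encountered order.
    for license_text, n in counts.most_common():
        writer.writerow([license_text, n])

out = io.StringIO()
write_license_counts(["CC0", "CC-BY 4.0", "CC0"], out)
print(out.getvalue())
```

This keeps `licenses.csv` free of calculated data while the counts stay available.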
The output from `unique_licenses.py` should not be written to a Markdown table, because `[ ]` is only rendered as a checkbox in comments, issues and pull requests, not in files. This makes sense, since files should only be edited with a commit. This raises the threshold for collaborative annotation somewhat, but not by much.
I would generate the output from `unique_licenses.py` as a csv, with the following columns, in this order and with these headers:
I will think about what attributes we need and update this issue.
The `number of datasets` column should contain a calculated integer indicating how many times this license occurs. Ideally, the csv is ordered on the `number of datasets` column, with the most frequent licenses first (descending order). The `attr` columns can be left empty. They will be annotated with `true`, `false` or `?`.
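Here is a sketch of that output. It assumes eight attribute columns, followed by `number of datasets` and `license`. That column order is hypothetical, and the names `attr1` to `attr8` are placeholders because the actual headers are not listed here. Ties in the count are broken alphabetically, so rerunning the script gives the same row order:

```python
import csv
import io
from collections import Counter

# Hypothetical header: eight placeholder attr columns, then the count, then the license.
ATTR_COLUMNS = [f"attr{i}" for i in range(1, 9)]
HEADER = ATTR_COLUMNS + ["number of datasets", "license"]

def write_annotation_template(dataset_licenses, out_file):
    """Write one row per distinct license, most frequent first, with empty attr columns."""
    counts = Counter(dataset_licenses)
    # Descending on count; ties broken alphabetically for a stable order across runs.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    writer = csv.DictWriter(out_file, fieldnames=HEADER, restval="", lineterminator="\n")
    writer.writeheader()
    for license_text, n in ordered:
        # attr keys are omitted, so DictWriter writes restval ("") for them.
        writer.writerow({"number of datasets": n, "license": license_text})

out = io.StringIO()
write_annotation_template(["CC0", "Custom", "CC-BY 4.0", "CC0"], out)
print(out.getvalue())
```

The empty `attr` cells are then ready to be annotated with `true`, `false` or `?`.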