Requirements for licenses.csv

peterdesmet commented 11 years ago

The output from unique_licenses.py should not be written to a Markdown table, since `[ ]` is only rendered as a checkbox in comments, issues and pull requests, not in files. This makes sense, as files should only be edited with a commit. Using a file raises the threshold for collaborative annotation, but not by much.

I would generate the output from unique_licenses.py as a csv, with the following columns, in this order and with these headers:

standard license
use
distribution
derivatives
commercial
attribution
share alike
notification
license

~~I will think about what attributes we need and update this issue.~~
~~The column number of datasets should contain a calculated integer indicating the number of times this license occurred.~~
~~Ideally, the csv is ordered on the number of datasets column, with more occurring licenses first (= descending order).~~
The values for the attribute columns can be left empty. They will be annotated with true, false or ?.
The license column is placed last to make it easier to fill out the attribute columns.
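
A minimal sketch of how unique_licenses.py could write this file, assuming the unique license strings are already available as a list. The function name `write_licenses_csv` and the sample licenses are hypothetical; only the headers and their order come from the list above.

```python
import csv
import io

# Headers in the order requested above; the license text goes last.
HEADERS = [
    "standard license", "use", "distribution", "derivatives",
    "commercial", "attribution", "share alike", "notification", "license",
]

def write_licenses_csv(licenses, out):
    """Write one row per unique license, leaving the attribute columns empty."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADERS)
    for text in sorted(set(licenses)):
        # Empty strings for every attribute column, license text last.
        writer.writerow([""] * (len(HEADERS) - 1) + [text])

# Hypothetical input, for illustration only.
buffer = io.StringIO()
write_licenses_csv(["CC0", "Free for non-commercial use, please cite", "CC0"], buffer)
print(buffer.getvalue())
```

The csv module quotes license texts that contain commas, so free-text licenses stay in a single column.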

peterdesmet commented 11 years ago

I have updated the column names. I think we need 8 yes/no/? attributes. I also renamed datasets to number of datasets.

peterdesmet commented 11 years ago

We should not forget to update the README before closing this issue.

peterdesmet commented 11 years ago

I updated the column order and one header name. I have already started annotating at https://github.com/Datafable/gbif-data-licenses/blob/test-annotation/data/licenses.csv. Only 395 of 427 to go. :muscle:

peterdesmet commented 11 years ago

I removed the number of datasets column, as this file should only contain data and not calculated data. Another reason is that if a new dataset uses an existing license, this should not affect this file.

peterdesmet commented 11 years ago

@bartaelterman let me know if it is easier to have the license column first instead of last, e.g. for automatically checking whether a license is already in the file. The only reason to keep it at the end is that the file reads somewhat more easily: https://github.com/Datafable/gbif-data-licenses/blob/test-annotation/data/licenses.csv
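
For what it is worth, the check can be made independent of the column position by looking the column up by its header. This is only a sketch: `known_licenses` is a hypothetical helper and the sample rows are made up.

```python
import csv
import io

def known_licenses(csv_file):
    """Return the set of license texts already present in the file."""
    reader = csv.DictReader(csv_file)
    # The column is found by its header, so its position does not matter.
    return {row["license"] for row in reader}

# Hypothetical file contents with the license column last.
sample = io.StringIO(
    "standard license,use,license\n"
    "CC0,true,CC0\n"
    ",?,Please cite the data provider\n"
)
existing = known_licenses(sample)
print("CC0" in existing)
print("Some new license" in existing)
```

With this approach the license column can stay last for readability.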

peterdesmet commented 11 years ago

302 of 427 licenses (71%) are now annotated. :jack_o_lantern:
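
The percentage is simply 302 / 427, rounded. A sketch of how the progress could be computed from the file itself, assuming a license counts as annotated once the seven columns from use to notification all have a value (that definition and the name `annotation_progress` are assumptions, and the sample rows are made up):

```python
import csv
import io

# Assumption: these seven columns are the ones annotated with true, false or ?.
ATTRIBUTES = ["use", "distribution", "derivatives", "commercial",
              "attribution", "share alike", "notification"]

def annotation_progress(csv_file):
    """Return (annotated, total) for a licenses.csv style file."""
    rows = list(csv.DictReader(csv_file))
    done = sum(
        1 for row in rows
        if all((row[name] or "").strip() for name in ATTRIBUTES)
    )
    return done, len(rows)

# Hypothetical file contents: one annotated row, one empty row.
sample = io.StringIO(
    "standard license,use,distribution,derivatives,commercial,"
    "attribution,share alike,notification,license\n"
    "CC0,true,true,true,true,false,false,false,CC0\n"
    ",,,,,,,,Please cite the data provider\n"
)
done, total = annotation_progress(sample)
print(f"{done} of {total} licenses ({round(100 * done / total)}%) are annotated")
```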

bartaelterman commented 11 years ago

Updated licenses.csv.

This required quite some hacking to merge the dataset count with the other, already manually annotated fields. To do so, I needed to sort the licenses. This sort order will not necessarily be preserved when the script is rerun.
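
For reference, a sketch of an alternative that would not depend on sorting: key the counts on the license text and add them to the annotated rows in their existing order. This is not what the commit does; `merge_counts` and the sample data are hypothetical.

```python
from collections import Counter

def merge_counts(annotated_rows, dataset_licenses):
    """Add a datasetCount field to each annotated row, keeping row order."""
    counts = Counter(dataset_licenses)
    merged = []
    for row in annotated_rows:
        new_row = dict(row)  # copy, so the manual annotations are untouched
        new_row["datasetCount"] = counts.get(row["license"], 0)
        merged.append(new_row)
    return merged

# Hypothetical data: two annotated rows and the license of each dataset.
annotated = [
    {"use": "?", "license": "Please cite the data provider"},
    {"use": "true", "license": "CC0"},
]
per_dataset = ["CC0", "CC0", "Please cite the data provider", "CC0"]
for row in merge_counts(annotated, per_dataset):
    print(row["license"], row["datasetCount"])
```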

peterdesmet commented 11 years ago

Regarding datasetCount: just include one commit that includes that number (I'm just interested), and then revert it.

peterdesmet commented 11 years ago

Right, I see it's already committed (abbb1b14a3fc564c2513d066cf9b50fc2b24df5d), so it can be reverted. :arrow_backward:

bartaelterman commented 11 years ago

Yes, but personally, I wouldn't revert it. It's very interesting information. If you don't want it to occur in the licenses.csv file, why not store it in a separate file?
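
A rough sketch of what such a separate file could look like, with the most frequent licenses first. The function name `write_dataset_counts` and the sample input are hypothetical, and the output goes to an in-memory buffer here rather than to a real file.

```python
import csv
import io
from collections import Counter

def write_dataset_counts(dataset_licenses, out):
    """Write license and datasetCount, most frequent license first."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["license", "datasetCount"])
    # most_common() returns the pairs in descending order of count.
    for text, count in Counter(dataset_licenses).most_common():
        writer.writerow([text, count])

# Hypothetical input: the license text of each dataset.
buffer = io.StringIO()
write_dataset_counts(
    ["CC0", "Please cite the data provider", "CC0", "CC0"], buffer
)
print(buffer.getvalue())
```

That way licenses.csv keeps only the manual annotations, and the counts can be regenerated at any time without touching it.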
