aboutcode-org / scancode-toolkit

:mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://github.com/aboutcode-org/scancode-toolkit/releases/

Check for possible new data/geo licenses #3691

Open pombredanne opened 7 months ago

pombredanne commented 7 months ago

from @pabs3 See https://whosonfirst.org/docs/licenses/

DennisClark commented 7 months ago

a very odd "license" -- the license text is basically just this:

"Crediting Who's On First is recommended and linking back to the License is required."

and the "linking back" is critical, since the list of contributors could change (although it does not appear to have been updated since 2020-02-21)

further analysis in progress

DennisClark commented 7 months ago

@AyanSinhaMahapatra please create the following license and associated rules:

key: whosonfirst-license
name: whosonfirst license
short-name: whosonfirst license
owner: whosonfirst
homepage: https://whosonfirst.org/docs/licenses/
category: Proprietary Free
spdx: LicenseRef-scancode-whosonfirst-license

notes: Attribution must reference the link to the license page which contains a list of the contributors and their individual licenses, which may vary greatly in obligations and restrictions: https://whosonfirst.org/docs/licenses/ The Who's On First dataset is both an original work and a modification of existing open data (sources listed below). Some of those open data projects do require attribution. We detail all sources and their specific license, usage, and vintage metadata in the full sources list.

text: Crediting Who's On First is recommended and linking back to the License is required.

pabs3 commented 7 months ago

This issue isn't about the license at the top of the page, but about the many links to other licenses throughout the page.

-- bye, pabs

https://bonedaddy.net/pabs3/

pombredanne commented 7 months ago

@pabs3 re:

This issue isn't about the license at the top of the page, but about the many links to other licenses throughout the page

Yes, and there are a lot of these, so I think Dennis is just starting from the top!

pombredanne commented 7 months ago

This yields all these URLs:

for a in $(curl -s https://whosonfirst.org/docs/licenses/ | tidy -q -i -w 300 | grep -Eoh -e "and <a.*href=\"([^\"]+).*>license</a>" | grep -Eo "http[^\"]+" | sort -u); do echo "- [ ] $a"; done
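For anyone who would rather avoid the regex-on-HTML step, here is a rough Python sketch of the same idea using only the standard library. It is not what was run above, and it is slightly looser: it collects every absolute link whose anchor text is exactly "license", without requiring the preceding "and". The names `LicenseLinkParser` and `license_checklist` are made up for this illustration.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect href values of <a> elements whose visible text is exactly 'license'."""

    def __init__(self):
        super().__init__()
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>
        self.urls = set()   # de-duplicated results, like `sort -u`

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            # Keep only absolute http(s) links, mirroring the "http[^\"]+" filter.
            if text == "license" and self._href.startswith("http"):
                self.urls.add(self._href)
            self._href = None


def license_checklist(html):
    """Return sorted '- [ ] URL' checklist lines for the given HTML string."""
    parser = LicenseLinkParser()
    parser.feed(html)
    parser.close()
    return [f"- [ ] {url}" for url in sorted(parser.urls)]
```

Feeding it the downloaded page source (for example the output of the `curl` call above, saved to a file and read as text) and printing the returned lines should give a checklist in the same shape as the shell loop, though the exact set of URLs may differ because of the looser match.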

DennisClark commented 7 months ago

@pombredanne and @pabs3 Thanks for the comments and the list of URLs. This will take a little while. Work-in-progress.