bbqsrc / spdx-lookup-python

SPDX license list query tool
https://pypi.python.org/pypi/spdx-lookup
BSD 2-Clause "Simplified" License

spdx-lookup fails to parse most /usr/share/doc/*/copyright files #2

Open dankegel opened 5 years ago

dankegel commented 5 years ago

I would like a tool that reads license info in DEP-5 format and outputs license info in SPDX format.

This may mostly be a superficial conversion, but may also require some full-text scanning.

spdx-lookup could or should be close enough, but currently isn't.

For instance, on any debian or ubuntu system, I would expect a command like

spdx-lookup -f /usr/share/doc/zlib/copyright info

to output

Id: Zlib

This works for a few packages, but fails for about 99% of the ones I've tried.
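For illustration, the "superficial conversion" part of the request could look roughly like the sketch below. It is not part of spdx-lookup: the function names and the `DEP5_TO_SPDX` table are hypothetical, the table is deliberately incomplete, and the parser only reads the first line of each `License:` field. Compound values such as `GPL-2+ or Expat`, and stanzas that carry only free-form license text, would still need real DEP-5 parsing or full-text scanning.

```python
import re

# Hypothetical, deliberately incomplete mapping from DEP-5 short names
# to SPDX identifiers. A real tool would need a much larger table.
DEP5_TO_SPDX = {
    "Zlib": "Zlib",
    "Expat": "MIT",
    "GPL-2": "GPL-2.0-only",
    "GPL-2+": "GPL-2.0-or-later",
    "GPL-3": "GPL-3.0-only",
    "GPL-3+": "GPL-3.0-or-later",
    "BSD-2-clause": "BSD-2-Clause",
    "BSD-3-clause": "BSD-3-Clause",
    "Apache-2.0": "Apache-2.0",
}

# Matches the first line of a "License:" field; continuation lines
# (the license text itself) start with whitespace and are skipped.
LICENSE_FIELD = re.compile(r"License:\s*(\S.*)$", re.IGNORECASE)


def dep5_license_names(text):
    """Return each distinct License short name, in order of appearance."""
    names = []
    for line in text.splitlines():
        match = LICENSE_FIELD.match(line)
        if match:
            name = match.group(1).strip()
            if name not in names:
                names.append(name)
    return names


def to_spdx(names):
    """Map DEP-5 short names to SPDX ids; unknown names map to None."""
    return {name: DEP5_TO_SPDX.get(name) for name in names}
```

Calling `to_spdx(dep5_license_names(open(path).read()))` on a copyright file would then give a dict of short names to SPDX ids, with `None` marking the entries that need a closer look.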

bbqsrc commented 5 years ago

Patches are welcomed for all feature requests. :smile:

dankegel commented 5 years ago

Hmm. The word cloud matching doesn't seem like it'll do what I was hoping.

https://github.com/nexB/scancode-toolkit might have the kind of matching I was hoping for, though:

$ scancode-toolkit/scancode --quiet --json-pp - --license /usr/share/doc/zlib1g/copyright | jq "{licenses: .files[].licenses[].spdx_license_key}"
{
  "licenses": "Zlib"
}

Some assembly is required, though, and there's no getting around parsing DEP-5 files. scancode is pretty good for non-DEP-5 files. https://github.com/Oblong/obs/blob/master/ob-list-licenses and https://github.com/Oblong/obs/blob/master/ob-filter-licenses are my current attempt at cobbling existing tools together to do roughly the right thing.
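As a rough Python counterpart to the jq filter above, the sketch below collects the SPDX keys from a scancode JSON report. The field layout (`files[].licenses[].spdx_license_key`) is taken only from that jq expression and may differ between scancode versions; the function names are hypothetical.

```python
import json


def spdx_keys(report):
    """Collect distinct spdx_license_key values from a parsed scancode report.

    Assumes the layout implied by the jq filter:
    report["files"][i]["licenses"][j]["spdx_license_key"].
    """
    keys = []
    for file_entry in report.get("files", []):
        for detected in file_entry.get("licenses", []):
            key = detected.get("spdx_license_key")
            # Skip missing or empty keys and avoid duplicates.
            if key and key not in keys:
                keys.append(key)
    return keys


def spdx_keys_from_json(text):
    """Same as spdx_keys, but starting from the raw JSON text."""
    return spdx_keys(json.loads(text))
```

Unlike the jq one-liner, this returns a single de-duplicated list per report rather than one object per detection, which is easier to feed into a later filtering step.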