captainsafia / legit

Add licenses to projects at the command line
MIT License
566 stars 19 forks

SPDX backend #14

Open maoo opened 7 years ago

maoo commented 7 years ago

I like this project and I think it would be really helpful for our Software Foundation. Since we are increasingly adopting SPDX, I thought it would be cool to add a backend for it that replaces the local licenses folder and also validates the license identifier passed by the user.

I've dropped some code on https://github.com/maoo/legit/tree/spdx-backend ; although it's not final, it runs locally without blowing up, hopefully (my Node skills are very humble); the README file explains how to use and configure it.

This is the way it works:

  1. A user runs the script passing a license with the -l option, as before; the only difference is that now it must be a valid SPDX Identifier, otherwise it will fail
  2. legit validates the SPDX Identifier against SPDX using spdx-licenses npm
  3. legit downloads and parses the license text from https://spdx.org/licenses/<Identifier>.html
  4. If a placeholder configuration is available for that license, legit will try to resolve those values from command-line options and replace them in the license text
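Steps 1 to 3 above can be sketched roughly as follows. This is an illustration only, not the code from the linked branch: `KNOWN_IDS` is a hypothetical stand-in for the lookup that the spdx-licenses module would provide, and `validateIdentifier` and `licenseUrl` are names made up for this example. Only the URL pattern comes from step 3.

```javascript
// Hypothetical stand-in for the identifier list supplied by spdx-licenses.
const KNOWN_IDS = new Set(['MIT', 'Apache-2.0', 'GPL-3.0']);

// Step 2: reject anything that is not a known SPDX identifier.
function validateIdentifier(id) {
  if (!KNOWN_IDS.has(id)) {
    throw new Error(`Unknown SPDX identifier: ${id}`);
  }
  return id;
}

// Step 3: build the page URL the license text would be downloaded from.
function licenseUrl(id) {
  const valid = validateIdentifier(id);
  return `https://spdx.org/licenses/${encodeURIComponent(valid)}.html`;
}
```

With this sketch, `licenseUrl('MIT')` yields `https://spdx.org/licenses/MIT.html`, while an unknown identifier throws before any download is attempted.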

Placeholder definitions are hosted on github and can be extended by the community.

The implementation is not complete; there are some known issues, which I've also listed in the README.

I'm eager to know what others think about SPDX and this implementation; if you like the idea, I'd be happy to work on it further and send a Pull Request.

Thanks for sharing this work in the open!

pmonks commented 7 years ago

As a heads-up, @maoo and I are trying to get organised and merge our respective forks so we can submit them as a PR. Apologies for not doing that from the get-go - we're in different timezones and don't have a lot of overlap in which to coordinate our efforts. :wink:

jacobmischka commented 7 years ago

This is pretty much what I did in my sort-of-fork https://github.com/jacobmischka/papers, though it allows any of the name, SPDX ID, or nickname of a license (as listed in github/choosealicense.com), using a quick JSON file I made of the licenses.

Edit: That sounds like an advertisement, which I didn't really mean it to be. I just mean you can take any bits you want or use that JSON file I threw together. I only created it because I wanted something to use myself that reads from package.json which is out of scope for this package.

maoo commented 7 years ago

Thanks @jacobmischka ! I'll definitely check out your implementation; it would be great to include some of your code's features in a Pull Request against this repo.

I'm eager to know what you (and others) think of PR #15, which combines my initial implementation with some additions from @pmonks (thank you!)

captainsafia commented 7 years ago

I was planning on implementing some of this over the weekend by leveraging GitHub's License API. The API is in dev preview mode right now but I think it does a good job of providing

I also like the fact that it provides the body of the license within the JSON payload so we don't have to worry about fetching from a URL.

Does SPDX have a similar JSON API that can be used to fetch licenses?

maoo commented 7 years ago

Hi @captainsafia , I see that GitHub's License API uses spdx_id in the payload, so we'd still support SPDX; I like the idea of adopting it as the main backend (definitely better than spdx.org, which does not provide an API and is not suited for that). Let me know if/how I can help.

We'd probably still want to have a mechanism to replace tokens in the license body, such as [year] or [owner]; what do you think of the approach we've taken with license-placeholders.yml ?
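As a rough illustration of that kind of token replacement (not the code from the PR, and not tied to the actual format of license-placeholders.yml, which is not shown here), a minimal sketch could look like this. The function name `fillPlaceholders` and the shape of the `values` object are assumptions made for the example.

```javascript
// Replace bracketed tokens such as [year] or [owner] with supplied values.
// Tokens without a matching value are left untouched so they stay visible.
function fillPlaceholders(text, values) {
  return text.replace(/\[(\w+)\]/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : match
  );
}
```

For example, `fillPlaceholders('Copyright (c) [year] [owner]', { year: 2017, owner: 'Jane Doe' })` gives `Copyright (c) 2017 Jane Doe`. Leaving unknown tokens in place is a design choice for this sketch; failing loudly would be an equally reasonable alternative.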

pmonks commented 7 years ago

My gut tells me that going direct to the source (i.e. SPDX) is the right way to go, for a couple of reasons:

It's also a possible turn-off for folks who want to use legit but don't use GitHub - going directly to SPDX (a Linux Foundation Collaboration Project) may carry more weight vs appealing to GitHub's lesser authority on the topic of open source licensing (after all, open source licensing is at the centre of what SPDX do, but it's peripheral to GitHub).

wdyt?

jacobmischka commented 7 years ago

Without a dedicated API endpoint, I don't really like the idea of fetching from a random URL every time. If one were to go forward with the official SPDX source, then I think the licenses should all be downloaded as a dependency at install time.

It's also worth noting that github includes licenses in addition to the official spdx list, such as WTFPL.

pmonks commented 7 years ago

@jacobmischka it's really not a "random URL" though - the SPDX project has a well-organised, comprehensive set of licenses in their license-list repository, and that's where our PR pulls the license texts from.

EDIT: the idea of pulling down the licenses at build time and "baking" them into the downloaded legit package is interesting, but it does create an avoidable coupling between SPDX releases and releases of legit. Pulling them at runtime (as our PR does) feels more scalable to me.

jacobmischka commented 7 years ago

Oh last I checked I thought it was just scraping their site. In any event that makes it even easier to fork that repo and add a package.json and publish it on npm instead of doing an HTTP get every time someone wants to copy a text file. Being usable offline would be a big advantage imo.

jacobmischka commented 7 years ago

Legit is not a binary. Using dependencies from npm is extremely common.

pmonks commented 7 years ago

s/binary/package - my point remains.

Regarding offline usage - that was one of my thoughts too, but the reality is that I'm offline rarely enough that that wouldn't be a showstopper for me. That would be a worthwhile enhancement though, imho.

I should also point out that both the spdx-licenses and spdx-license-list npm modules our PR introduces have exactly the problem I mention above - they're both out of date with the latest SPDX release, in large part because they replicate the SPDX data instead of looking it up.

Interestingly, I just discovered that the SPDX project has published recommendations on how to programmatically access the SPDX license list, and by chance our PR mostly adheres to those recommendations.

jacobmischka commented 7 years ago

Ah, that makes fetching from the URLs a lot more appealing but I still think forking their repository and publishing it on npm to use as a dependency would be better.

I don't think doing that is making it any more coupled than fetching it from their website is. Semantic versioning means that updates to the dependency don't rely on anyone updating legit, it's more aligned with the javascript ecosystem, and it makes legit no longer depend on an active internet connection, albeit at the cost of someone needing to maintain that npm package.

captainsafia commented 7 years ago

My big goal with this was for it to be network-independent. Instead of loading the license every time a command runs, it would be loaded on post-install inside package.json. Most people have Internet connectivity, and they won't notice whether the command is using a locally stored version of the licenses or fetching from the network. Those that don't will notice, so I think it's best to build for them. I'm OK with making a new release. It seems like new releases of the list don't happen that frequently. We can also always have a legit update command if need be.
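A minimal sketch of the "locally stored" half of that idea, assuming license texts have already been written to a cache directory as `<id>.txt` files by some earlier step (for instance a post-install script). The function name `readCachedLicense`, the file naming, and the directory layout are all hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Read a license body from a local cache directory without touching the
// network. Returns null when the license has not been cached.
function readCachedLicense(cacheDir, id) {
  const file = path.join(cacheDir, `${id}.txt`);
  try {
    return fs.readFileSync(file, 'utf8');
  } catch (err) {
    if (err.code === 'ENOENT') return null; // not cached yet
    throw err; // anything else is a real error
  }
}
```

A caller could fall back to a network fetch, or suggest running an update command, when this returns `null`. The identifier should be validated before it is used in a file path.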

Is there a JSON-based API for SPDX (as opposed to RDF)?

pmonks commented 7 years ago

The SPDX license list is available in a variety of data formats, including JSON. The "API" (such as it is) is simply an HTTP GET of this resource.

From that resource you can then HTTP GET (by substituting the SPDX License Identifier into this URL) each of the individual license text files.
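The two-step lookup described above might be sketched like this. The actual URLs are not reproduced here, so the URL pattern is passed in as a parameter with a made-up `{id}` token, and the list shape (`licenses` array with `licenseId` and `name` fields) is an assumption about the JSON format rather than something confirmed in this thread. `findLicense` and `textUrl` are hypothetical names.

```javascript
// `list` is the parsed JSON license list, assumed to look like:
// { licenses: [{ licenseId: 'MIT', name: 'MIT License' }, ...] }
function findLicense(list, id) {
  return list.licenses.find((entry) => entry.licenseId === id) || null;
}

// `template` is a hypothetical URL pattern containing the literal token "{id}".
function textUrl(template, licenseId) {
  return template.replace('{id}', encodeURIComponent(licenseId));
}
```

Usage would be to HTTP GET the list once, call `findLicense` to confirm the identifier exists, and then HTTP GET the address returned by `textUrl` for that identifier.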

pombredanne commented 7 years ago

@pmonks I was about to reply to you on the SPDX ML with exactly that.

pombredanne commented 7 years ago

You could also use the https://github.com/spdx/license-list-data repo as part of your npm build, with a clone and/or a git submodule (.gitmodules), to avoid any fetching/network dependency at run time.