Charcoal-SE / SmokeDetector

Headless chatbot that detects spam and posts links to it in chatrooms for quick deletion.
https://metasmoke.erwaysoftware.com
Apache License 2.0

Single unified License for works derived from Smoke Detector Data. #720

Closed magisch closed 7 years ago

magisch commented 7 years ago

This came up starting here: http://chat.stackexchange.com/transcript/message/37245230#37245230 And then here again: http://chat.stackexchange.com/transcript/message/37248760#37248760

Basically, my proposal is to make a commitment similar to the one we made for repository contributions: that all our works using SmokeDetector's collected data (like @AWegnerGitHub's fancy graphs, or analytics models we build on that data) be licensed under a single license similar to Smokey's own, to make it easy for others to reuse those works.

One question is which license to use; another is what scope of work it would cover.

Undo1 commented 7 years ago

So you're proposing a 'cancer license', like GNU? Realistically, I tend to oppose things that force folks to go public and allow modification of what they've done. For example, I have work with Smokey data that's been merged with data from moderators and staff. I can't make that public.

Same with the MS database - I can't just copyleft the whole thing.

In the end, it's the normal question: Is this a real issue?

If it is, I'd probably recommend going with a CC0-type license. If there's one out there that allows use but doesn't force a license upon others, I'd probably be happy. Maybe a CC0 with a "derivatives default to a permissive license, but you can specify otherwise if you want" clause.

ArtOfCode- commented 7 years ago

Limitations:

@angussidney pointed out the ODC PDDL or BY licenses. I'd be happy with the BY license - it's a license designed for datasets/databases, and it requires some form of attribution without being copyleft like the GPL or BY-SA.

AWegnerGitHub commented 7 years ago

I need a bit more information before forming an opinion.

ArtOfCode- commented 7 years ago

@AWegnerGitHub Responses generally depend on which license we choose; for the purposes of this I'm going to assume ODC BY.

magisch commented 7 years ago

@AWegnerGitHub The end goal would be to have every derivative work we release (so you choose what you post) on GH or somewhere else available under a single license so people can reuse it.

AWegnerGitHub commented 7 years ago

@magisch That seems unnecessarily burdensome.

As an example, if I perform some kind of analysis and post it, this now requires that I abide by the terms of SE (because I posted it to the chatroom) and whatever terms we come up with here. Why the need for two separate licenses? I'm already bound by the CC BY-SA license once it's posted on SE.

What are you expecting people to "reuse"? An image? Great...CC BY-SA, because it's already under that license. An explanation of the image? Great...already under CC BY-SA. Now we add one of the licenses mentioned by @ArtOfCode- (which I don't have a link to...) and we have added more that someone using the data needs to do:


If we licensed under ODC BY, technically you'd have to add an attribution note ("graph generated using data from Charcoal"), but I doubt any of us would hold you to that.

You better. That's the point of the license. If you don't, it's worthless. I don't like that we are already saying we aren't going to hold people accountable for a license.

Our software is already available, but our data is only available in limited form through metasmoke or data dumps (where the licensing status is ambiguous, so using them for any major project without confirming with us first would be a bad idea).

The API documentation is hosted in the MetaSmoke repository, a repository with a CC0 license in it, which strongly implies the data retreated via that API is licensed the same way. Smokey has two licenses (Apache and MIT). Adding a 4th license into this mix isn't going to make anything less ambiguous, especially with activities going on in other repositories and possible backports from NG to Smokey. By taking certain things that are currently MIT/Apache licensed out of the code (blacklists, regular expressions) and putting those in the database, we've only made things more confusing. Are those no longer code? Are they covered by the Apache/MIT scheme or by the new data scheme? Or are they under one scheme until a certain date, with changes made after that date falling under the other?


Either I'm not understanding the need for this and the potential confusion it's going to cause, or it's adding restrictions that really don't seem to be needed (or both? neither?), but the way it's described above by @magisch and @ArtOfCode- does not make this seem worthwhile.


angussidney commented 7 years ago

Let's just set something straight here: what we're talking about licensing is anything in the MS DB that isn't already licensed under CC BY-SA because it came from SE. For example: feedback, reasons for which a post was caught, any of the graphs, status/scan rate information from Smokey, autoflagging logs, etc.

What we're not doing is giving a license to derivative works - e.g. Andy's graphs that he made up to illustrate autoflagging - that's up to the owner to decide. We're just giving a license to the raw data itself.

Or at least that's what I'm thinking/talking about. Maybe I just had it wrong.


Anyway, my thoughts:

magisch commented 7 years ago

@angussidney I was thinking anything published on our official channels that doesn't have a license yet.

ArtOfCode- commented 7 years ago

We need to provide a license for this 'other' content, or otherwise it would be legally ambiguous what license would apply if anyone wants to use our data in their project.

This is the core of this issue. Our data is not licensed - that includes data retrieved via the metasmoke API. There's an implicit right-to-view there, which includes personal use, so nothing we're doing with it right now is an issue, but if someone wanted to come along and use our API to build on our data, there's no license that they can use it under.

The API documentation is hosted in the MetaSmoke repository, a repository with a CC0 license in it, which strongly implies the data retreated [sic] via that API is licensed the same way.

The documentation is under CC0. Probably. The data isn't, because it's not connected to the repository that is under CC0, and the license doesn't extend to it.

Why we should use yet another license - You know how people complain about SE, particularly SO, using CC BY-SA for code because it's not a code license? In a similar vein, using a code or CC license for a dataset/database is equally bad. If we're licensing this stuff, we should be licensing it using a data license.

AWegnerGitHub commented 7 years ago

I was thinking anything published on our official channels that doesn't have a license yet.

This contradicts your previous explanation:

The end goal would be to have every derivative work we release (so you choose what you post) on GH or somewhere else available under a single license so people can reuse it.

(emphasis is mine)


The documentation is under CC0. Probably.

This type of statement does not build confidence that adding a 4th license into our solution will resolve anything at all.

In a similar vein, using a code or CC license for a dataset/database is equally bad. If we're licensing this stuff, we should be licensing it using a data license.

Explain why. Not all of us know our licenses inside and out. What does this data license that has been mentioned repeatedly get us over a "bad" CC license?

magisch commented 7 years ago

This contradicts your previous explanation:

@AWegnerGitHub Maybe I didn't make myself clear enough. I was saying the same thing I did afterwards, but more convolutedly. I was talking about data, and stuff like graphs using that data, that we post on our website or GitHub, or elsewhere in an official capacity as Charcoal. Basically, what I thought would be nice is that someone can see something Charcoal made and go "Cool, I can reuse that" without having to investigate licensing issues piecemeal.

One of your graphs would fall under that, for instance (pending your approval, of course; nobody can license what isn't theirs), if it were posted somewhere like charcoal-se.org.

ArtOfCode- commented 7 years ago

Explain why. Not all of us know our licenses inside and out. What does this data license that has been mentioned repeatedly get us over a "bad" CC license?

I haven't actually looked at either license in huge amounts of detail, but... would you use CC BY-NC-ND for code? Would you use the GPL for a scientific paper? Would you use a fork to eat soup?

The documentation is under CC0. Probably.

I said that because GitHub don't make it entirely clear how repository licenses extend to wikis, but I assume they do.

Undo1 commented 7 years ago

Closing for now; comments still welcome.