ColinFay / proustr

Tools for Natural Language Processing in French and texts from Marcel Proust's collection "A La Recherche Du Temps Perdu"
http://proustr.colinfay.me/

license of feel lexicon #10

Closed (jwijffels closed 5 years ago)

jwijffels commented 6 years ago

Hello,

I had a look at the sentiment lexicons in the package, which seem to be built on the FEEL dataset. Can you explain the license of that lexicon? FEEL seems to be derived from the NRC Word-Emotion Association Lexicon. What does this imply for the license? Your package mentions the MIT license. Is this correct?

jwijffels commented 5 years ago

Any feedback on this? Or shall I disregard this R package as the license seems incorrect?

ColinFay commented 5 years ago

Hi @jwijffels,

Sorry I didn't answer sooner. I meant to write an answer and it slipped my mind.

As far as I remember, the FEEL lexicon does not have a license linked on its webpage: http://www.lirmm.fr/~abdaoui/FEEL

The only thing written down is a "how to cite", which is done here: https://github.com/ColinFay/proustr/blob/master/R/proust_sentiments.R#L7

I'm open to discussion/changes if you think the licensing of {proustr} is not correct. To be 100% honest, I'm not totally clear about how the original dataset impacts the licensing of the package. As far as I understand, the code is distinct from the data. If you plan on using the data, you need to follow the license of the data, but it can differ from the license of the source code (just like "you can use peer-to-peer software, but you can't download a movie with it").

Feel free to drop any comment you might have on this; it would definitely help. I'll change the license accordingly if it turns out that MIT was the wrong choice.

jwijffels commented 5 years ago

My question is only about the file sentiments_polarity.rda, not about any of the code inside this R package.

I'm trying to find out if this lexicon can be used for a commercial sentiment application. I see that the FEEL lexicon is a translation of https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm

License details of that lexicon are at https://saifmohammad.com/WebPages/AccessResource.htm and at https://saifmohammad.com/WebDocs/README-NRC-Lex.txt

If I read that correctly, it seems to imply that the lexicon cannot be used for commercial purposes. Any derivations (like the French translation) also seem to me to be limited by that license, I think; I'm not a lawyer either. Hence my question on the license. Have you asked the author at http://www.lirmm.fr/~abdaoui/FEEL about this, and whether you can redistribute the data? I think the R package currently does not mention this license restriction at all.

ColinFay commented 5 years ago

My point about the code was that I guess that code & data can have two different licenses (as far as I can tell).

But, you're totally right, I'll send an email and ask the author of the dataset. This will indeed impact the license of the dataset and/or the package.

I've put a message in the function to warn:

https://github.com/ColinFay/proustr/blob/master/R/proust_sentiments.R#L18
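
As an illustration only, and not the code behind the link above, a license notice emitted when the data is accessed could look roughly like the sketch below. The function name `get_feel_lexicon()`, its `quiet` argument, the message wording, and the two placeholder rows are all assumptions made for this example; they are not taken from {proustr} or from the FEEL lexicon.

```r
# Hypothetical sketch: none of these names come from the proustr source.
# `feel_lexicon` is a two-row placeholder, not real FEEL data.
feel_lexicon <- data.frame(
  word = c("joie", "peur"),
  polarity = c("positive", "negative"),
  stringsAsFactors = FALSE
)

# Return the lexicon and, unless quiet = TRUE, remind the caller that
# the data is governed by its own terms rather than the package license.
get_feel_lexicon <- function(quiet = FALSE) {
  if (!quiet) {
    # message() writes to stderr and can be silenced with suppressMessages()
    message(
      "The FEEL lexicon has its own terms of use, separate from this ",
      "package's MIT license. Check them before any commercial use."
    )
  }
  feel_lexicon
}

lexicon <- get_feel_lexicon()
```

Using `message()` rather than `warning()` keeps the notice informational, and callers who have already read the terms can wrap the call in `suppressMessages()` or pass `quiet = TRUE`.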

Thanks for reaching out about that.

jwijffels commented 5 years ago

Agreed that code & data can have different licenses. It would be great to know the license of the data with 100% certainty.

ColinFay commented 5 years ago

Yes indeed. I guess we are all a little bit lost with the licensing.

For what it's worth, we are making a proposal to the R Consortium with Miles (https://github.com/ThinkR-open/isc-proposal-licence) on this subject, so that we can clarify our own and the community's licensing questions.

jwijffels commented 5 years ago

That would be interesting indeed.

ColinFay commented 5 years ago

After giving it some thought, I think I'll move the dataset out of {proustr} into another package with a license that exactly matches the one from the FEEL lexicon, and advise using this other package when doing text mining (with a warning that the license for FEEL is not MIT).

jwijffels commented 5 years ago

That solution seems correct to me.

ColinFay commented 5 years ago

Here are the changes:

https://github.com/ColinFay/rfeel

https://github.com/ColinFay/proustr/blob/master/R/proust_sentiments.R#L7

jwijffels commented 5 years ago

Thank you for the update. This makes the licenses clearer now.

jwijffels commented 5 years ago

@ColinFay I see the package has been updated on CRAN. It would probably be a good idea to ask CRAN to remove older (archived) versions of the package. That's also what CRAN did when they found out the license of tidytext was incompatible with the data shipped inside the package (CRAN removed all archived versions of tidytext).