Tatoeba / tatoeba2

Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
https://tatoeba.org
GNU Affero General Public License v3.0

Contributing translations in CC0 #1858

Open: trang opened this issue 5 years ago

trang commented 5 years ago

The CC0 license is gaining momentum, and our users are starting to feel limited by not being able to create translations under CC0, as shown in this Wall thread: https://tatoeba.org/eng/wall/show_message/31587#message_31587

We will need to:

  1. Allow users to choose the license when adding a translation to a CC0 sentence.
  2. Allow users to change the license from CC BY to CC0 not just on their original sentences, but also on their translations of CC0 sentences.
  3. Provide a way to switch all eligible translations to CC0 (either by extending the switch_my_sentences page, or by creating a similar page dedicated to translations).
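The eligibility rules implied by the three points above can be sketched as follows. This is a minimal illustration, not Tatoeba's implementation: the `Sentence` fields (`author`, `based_on_id`) and the helper names are hypothetical. It assumes that a translation inherits its license constraint only from the sentence it directly translates, and that a translation of a CC BY sentence must itself stay CC BY.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

CC_BY = "CC BY 2.0 FR"
CC0 = "CC0 1.0"


@dataclass
class Sentence:
    id: int
    author: str  # hypothetical: the user who wrote the sentence
    license: str
    # Hypothetical: id of the sentence this one translates; None for an original sentence.
    based_on_id: Optional[int] = None


def allowed_licenses_for_translation(source: Sentence) -> List[str]:
    """Point 1: a license choice is offered only when translating a CC0 sentence."""
    return [CC_BY, CC0] if source.license == CC0 else [CC_BY]


def can_switch_to_cc0(s: Sentence, user: str, by_id: Dict[int, Sentence]) -> bool:
    """Point 2: the user wrote it, it is still CC BY, and it is either an
    original sentence or a translation of a CC0 sentence."""
    if s.author != user or s.license != CC_BY:
        return False
    if s.based_on_id is None:
        return True  # original sentence: the author may relicense it
    source = by_id.get(s.based_on_id)
    return source is not None and source.license == CC0


def eligible_for_switch(user: str, sentences: List[Sentence]) -> List[int]:
    """Point 3: ids of all sentences a bulk "switch to CC0" page could offer."""
    by_id = {s.id: s for s in sentences}
    return [s.id for s in sentences if can_switch_to_cc0(s, user, by_id)]
```

For example, if bob has written an original CC BY sentence, a CC BY translation of alice's CC0 sentence, and a CC BY translation of a CC BY sentence, `eligible_for_switch("bob", ...)` would return only the first two.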
jeanm commented 5 years ago

May I also suggest two small quality-of-life improvements to consider?

  1. Ensuring that translations added via the “add translation” icon on the “contribute > add sentences” interface match the license specified at the top.
  2. More generally, a way for users to set CC0 as their default license wherever possible. Or, more radically, a way for users to specify “upgrade my license to CC0 wherever possible, even in the future”, which is more or less what I was suggesting on the Wall. I realise the latter is probably harder to implement.

I imagine these would significantly reduce the need for users to run the “switch all my sentences to CC0” command.

aucampia commented 4 years ago

From Circular 33 of the United States Copyright Office:

Words and short phrases, such as names, titles, and slogans, are uncopyrightable because they contain an insufficient amount of authorship.

This means that most sentences, such as "How are you doing?", cannot be covered by CC BY 2.0 FR, and labeling them as CC BY 2.0 FR is misleading and wrong. They are effectively already CC0; labeling them as CC0 would simply make it clear to contributors that they cannot make any rights claims, and to consumers of your database that they need not worry about rights claims.

You really should prioritize fixing this, because otherwise it is just one massive red herring that makes all your valuable data useless.

trang commented 4 years ago

Could you elaborate why you feel that the CC BY license makes our data useless? Are you facing any particular issues in reusing our data due to the CC BY license? Or have you seen other projects not being able to reuse our data because of CC BY?

aucampia commented 4 years ago

@trang Wikidata has a CC0 policy for data (https://www.wikidata.org/wiki/Wikidata:Licensing#Official_policy), so importing data, or data derived from it, from Tatoeba into Wikidata would be a problem.

AndiPersti commented 4 years ago

As I understand it, Wikidata is about structured data. So how would the sentences in Tatoeba be useful for Wikidata?

Yorwba commented 4 years ago

There's a usage example property where people might want to include sentences from Tatoeba. It's part of an integration between Wikidata and Wiktionary.

aucampia commented 4 years ago

Some of the structured data in Wikidata includes text labels and descriptions in various languages. There are also cases where useful structured data, such as possible translations or synonyms for words, could be derived from a translation database.