Open RyckRichards opened 5 years ago
I'd say this is a low-priority request, since at this time there aren't enough CC0-licensed sentences. Anyone using the data would pretty much have to grab sentences from the whole corpus to make it worthwhile.
As replied on the Wall:
The sentences_CC0.csv file has the text of the sentences already. There is no need to do further processing, you can just download the file.
@RyckRichards Do you need anything more than what's in the file?
I'm not that good at working with .csv files. It'd be good (if it doesn't take too much effort, of course) if there were a way to select which sentences we want to download.
What criteria exactly would you use to select the sentences?
I suppose you would want all CC0 sentences in a specific language (not in all languages). Is there anything else?
Most translated, audio attached, new sentences, old sentences
At this time there are so few sentences in the file that you can easily open the file in a word processor, or in Excel or Google Sheets. You can use OpenOffice if you don't own Excel.
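For anyone who would rather script it than open a spreadsheet, here is a minimal Python sketch of filtering by language. It assumes the file is tab-separated with the sentence id, language code, and text as the first three columns; check that against the actual download before relying on it. The function name `filter_by_language` and the sample rows are made up for illustration.

```python
import csv
import io


def filter_by_language(rows, lang):
    """Yield (sentence_id, text) for rows whose language code equals `lang`.

    Assumption: each row is tab-separated as id, language code, text, ...
    """
    # QUOTE_NONE so quotation marks inside sentences are kept as plain text.
    reader = csv.reader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip rows that do not have at least the three assumed columns.
        if len(row) >= 3 and row[1] == lang:
            yield row[0], row[2]


# Made-up rows standing in for the real file.
sample = io.StringIO(
    "1\teng\tHello.\t2019-01-01 00:00:00\n"
    "2\tpor\tOlá.\t2019-01-02 00:00:00\n"
    "3\teng\tThank you.\t2019-01-03 00:00:00\n"
)

english = list(filter_by_language(sample, "eng"))
print(english)
```

To try it on the real download, pass `open("sentences_CC0.csv", encoding="utf-8", newline="")` in place of `sample`.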
Agreed, but I believe that will change soon.
@RyckRichards When you say "possibility to filter", what do you actually expect? Do you need a file containing all CC0 sentences that match specified criteria, or do you need them as search results so you can pick sentences to make lists?
Yes, that's right.
--
Ricardo Vernaut Junior
That's not answering my question... Does that mean you need both?
Oh sorry. Yes, both of them.
Now that we handle sentences under both CC BY 2.0 and CC0, it'd be good for developers and course creators to be able to filter sentences according to their license. As mentioned by CK in https://tatoeba.org/eng/wall/show_message/31351#message_31351, we'd have to "climb a mountain" to have such a thing (at least for me, since I'm not experienced in programming):
It would work like the filters we already have for sentences written by native speakers or with audio attached.