dataculturegroup / DataBasic

A suite of focused and simple tools and activities for journalists, data journalism classrooms and community advocacy groups
http://www.databasic.io/
MIT License

Spanish language sample data #170

Closed: kanarinka closed this issue 8 years ago

kanarinka commented 8 years ago

To be really internationalized, we should have Spanish-language sample data for the tools. @rahulbot, do you think we should work towards this? We'd need lyrics by Spanish-speaking musicians (with the Spanish activity guide updated to reference them), plus one or two Spanish-language datasets for WTFcsv.

rahulbot commented 8 years ago

Agreed. I tried to get lyrics for Spanish artists from the same source, but their data isn't encoded well (lots of garbage characters). Maybe we can scrape lyrics from Genius or something?

(Email reply to kanarinka's comment of Thu, Jan 7, 2016 at 7:28 AM.)

— Reply to this email directly or view it on GitHub https://github.com/c4fcm/DataBasic/issues/170.
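If the "garbage characters" are the common kind of mojibake, where UTF-8 text was decoded as Latin-1 (so "canción" shows up as "canciÃ³n"), one round of it can often be undone by re-encoding. The sketch below assumes that specific cause; `fix_mojibake` is a hypothetical helper, not part of DataBasic.

```python
def fix_mojibake(text: str) -> str:
    """Undo one round of UTF-8 bytes mis-decoded as Latin-1.

    Assumption (hypothetical cause): the garbage characters look like
    'canciÃ³n' for 'canción'. If the repair is not possible, the text is
    returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them correctly as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters outside Latin-1, or bytes that are not valid UTF-8:
        # the text was probably not garbled this way, so leave it alone.
        return text


if __name__ == "__main__":
    print(fix_mojibake("canciÃ³n"))  # canción
    print(fix_mojibake("EspaÃ±a"))   # España
    print(fix_mojibake("canción"))   # already clean: canción
```

This only covers the Latin-1 case; the `ftfy` library (`ftfy.fix_text`) handles more variants, such as Windows-1252 mojibake, if the lyrics turn out to be garbled differently.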

kanarinka commented 8 years ago

OK, I'll drop Pablo a line; he might have some good ideas for data sources.

(Email reply to rahulbot's message of Thu, Jan 7, 2016 at 9:00 AM.)

kanarinka commented 8 years ago

As for quirky datasets, the only one I can think of is the one the INE (Spain's National Statistics Institute) compiles from the municipal population register (padrón), listing the most common names in Spain. You can look it up here:

http://www.ine.es/daco/daco42/nombyapel/nombyapel.htm

kanarinka commented 8 years ago

Note to self: check out the Spanish-language datasets published by TuvaLabs.