bigscience-workshop / data_tooling

Tools for managing datasets for governance and training.
Apache License 2.0
79 stars 48 forks

Create dataset ahotsak #112

Open albertvillanova opened 3 years ago

albertvillanova commented 3 years ago
cakiki commented 3 years ago

self-assign

cakiki commented 3 years ago

I have reached out; will report back once I hear from the ahotsak project.

There seems to be an API, but the link for requesting a key is broken. (I've asked about this as well.)

https://ahotsak.eus/api/v2/ https://ahotsak.eus/api/v2/dokumentazioa/

albertvillanova commented 2 years ago

Thanks, @cakiki.

Any feedback from the custodians?

cakiki commented 2 years ago

@albertvillanova None. Should we try to find a Basque speaker within Big Science? I could also try reaching them again.

I also found this dataset of transcribed Basque audio: https://research.google/tools/datasets/basque-tts/ (~7100 sentences)

It is part of a bigger resource available here: https://github.com/google/language-resources and here: http://openslr.org/resources.php, both of which might be interesting to look into. cc @yjernite

albertvillanova commented 2 years ago

I think basque-tts (according to the info on their site) is part of SLR: SLR76

Should we add this? CC: @yjernite