AI4Bharat / NPTEL2020-Indian-English-Speech-Dataset

NPTEL2020: Speech2Text dataset for Indian-English Accent

Need clarification on licensing. #14

Open jeb-orcl opened 2 years ago

jeb-orcl commented 2 years ago

It appears that this corpus is compiled from the "nptelhrd" playlist at https://www.youtube.com/playlist?list=UU640y4UvDAlya_WOj5U4pfA.

When you downloaded the videos, did you download all videos on the playlist or only the ones whose video description included the "Creative Commons Attribution license (reuse allowed)" link to the YouTube Creative Commons license page?

I am interested in this corpus, but the organization I work for will require that everything in the corpus allows reuse.

GokulNC commented 2 years ago

Yes, all the videos are under a CC license, which allows reuse (with attribution). We sampled a substantial number of lecture videos (with subtitles) from the playlist to verify that all of them are CC-licensed.

If you want to be doubly safe, we have also released metadata for all the audio, which includes a field called "license" that you can use to make sure you only ever use CC-licensed videos.
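A minimal sketch of that filtering step, assuming the metadata can be read as CSV rows with a "license" column and that CC entries mention "Creative Commons" in that field. The actual file format, column set, and license strings in the released metadata may differ; the sample rows and the helper names (`load_rows`, `is_creative_commons`, `filter_cc_rows`) are hypothetical and only illustrate the idea.

```python
import csv
import io

# Hypothetical sample metadata: one row per audio clip with a "license" column.
# The real NPTEL2020 metadata layout and license values are assumptions here.
SAMPLE_METADATA = """id,video_id,license
clip_0001,abc123,Creative Commons Attribution license (reuse allowed)
clip_0002,def456,Standard YouTube License
clip_0003,abc123,Creative Commons Attribution license (reuse allowed)
"""


def load_rows(text: str) -> list[dict]:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def is_creative_commons(license_value: str) -> bool:
    """Assumed heuristic: treat any value mentioning 'Creative Commons' as CC."""
    return "creative commons" in license_value.strip().lower()


def filter_cc_rows(rows: list[dict]) -> list[dict]:
    """Keep only rows whose "license" field looks like a Creative Commons license.

    Missing or empty license values are treated as non-CC and dropped.
    """
    return [row for row in rows if is_creative_commons(row.get("license") or "")]


if __name__ == "__main__":
    cc_rows = filter_cc_rows(load_rows(SAMPLE_METADATA))
    print([row["id"] for row in cc_rows])
```

Dropping rows with a missing or unrecognized license (rather than keeping them) is the conservative choice when an organization requires that everything in the corpus allows reuse.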