AI4Bharat / indicnlp_catalog

A collaborative catalog of NLP resources for Indic languages
https://ai4bharat.github.io/indicnlp_catalog

Mukhyansh: A Headline Generation Dataset for Indic Languages #240

Open anoopkunchukuttan opened 6 months ago

anoopkunchukuttan commented 6 months ago

A large-scale headline generation dataset for 8 Indian languages, built from custom website crawls and filtered for quality. It contains 3.39 million articles.

Paper: https://arxiv.org/abs/2311.17743
Dataset and models: https://github.com/ltrc/Mukhyansh

Note: the GitHub page is empty.