d0r1h / SAR

Hindi News Text Summarization
https://huggingface.co/spaces/d0r1h/Hindi_News_Summarizer
MIT License
0 stars 0 forks source link
abstractive-summarization hindi hindi-corpus huggingface news nlp python summarization transformer



Website tweet

State-of-the-art Summarization methods for Hindi in ЁЯдЧ

SAR (рд╕рд╛рд░) in Hindi means summary. This repository contains my work on Hindi Text Summarization on news article. ### Notebook: | Notebook | Colab | Kaggle | | ------ | ------ | ------ | | BaseLine | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/d0r1h/SAR/blob/main/notebooks/baseline.ipynb) | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/code/undersc0re/hindi-text-summarization-baseline) | ### DataSet: * As of now I've released a sample Dataset of 2k pairs of text and summary which can be accessed at [Link](https://github.com/d0r1h/SAR/tree/main/dataset) ### Models: * Inference results are on 2k sample data. |Model | Checkpoint | Rouge-2[f_score] | Inference time | |--- | --- | --- | --- | |BART | [ai4bharat/IndicBART](https://huggingface.co/ai4bharat/IndicBART) | 21.48 | 20min 27s | |T5 | [csebuetnlp/mT5_multilingual_XLSum](https://huggingface.co/csebuetnlp/mT5_multilingual_XLSum) | 20.21 | 45min 54s| ### Project Pipeline

### API You can summarize any Hindi news article in just 5 lines of code ```python >>> import requests >>> api_endpoint = "https://hf.space/embed/d0r1h/Hindi_News_Summarizer/+/api/predict/" >>> news_url = "https://www.amarujala.com/uttar-pradesh/shamli/up-news-heroin-caught-in-shaheen-bagh-of-delhi-is-connection-to-kairana-and-muzaffarnagar?src=tlh\u0026position=3" >>> r = requests.post(url= api_endpoint, json = {"data": [ news_url, "BART"]}) >>> r.json()['data'][0] >>> рдпреВрдкреА рд╢рд╛рд╣реАрди рдмрд╛рдЧ рдореЗрдВ 100 рдХрд░реЛрдбрд╝ рд░реБрдкрдпреЗ рдХреАрдордд рдХреА рд╣реЗрд░реЛрдЗрди рдФрд░ рдЕрдиреНрдп рдорд╛рджрдХ рдкрджрд╛рд░реНрде рдХреА рдмрд░рд╛рдорджрдЧреА рд╡ рдЙрд╕реЗ рд▓рд╛рдиреЗ рдЕрдВрддрд░реНрд░рд╛рд╖реНрдЯреНрд░реАрдп рдбреНрд░рдЧреНрд╕ рддрд╕реНрдХрд░реЛрдВ рдХреЗ рдЧрд┐рд░реЛрд╣ рдХреЗ рддрд╛рд░ рд╢рд╛рдорд▓реА рдЬрд┐рд▓реЗ рдХреЗ рдХрд╕реНрдмрд╛ рдХреИрд░рд╛рдирд╛ рдФрд░ рдореБрдЬрдлреНрдлрд░рдирдЧрд░ рд╕реЗ рдЬреБрдбрд╝ рд░рд╣реЗ рд╣реИрдВред рдирд╛рд░рдХреЛрдЯрд┐рдХреНрд╕ рдХрдВрдЯреНрд░реЛрд▓ рдмреНрдпреВрд░реЛ рдПрдирд╕реАрдмреА рджрд┐рд▓реНрд▓реА рдХреА рдЯреАрдо рдиреЗ рдЧреБрд░реБрд╡рд╛рд░ рдХреЛ рдХреИрд▓рд╛рдирд╛ рд╕реЗ рджреЛ рд▓реЛрдЧреЛрдВ рдХреЛ рд╣рд┐рд░рд╛рд╕рдд рдореЗрдВ ``` ### Inference Demo: Application is hosted on ЁЯдЧ space and can be accessed at [SAR](https://huggingface.co/spaces/d0r1h/Hindi_News_Summarizer) ### Website Supported - [x] [Amarujala](https://www.amarujala.com) #### ToDO - [ ] Add support for following website - [ ] [aajtak](https://www.aajtak.in/) - [ ] [ndtv](https://ndtv.in/) - [ ] Foramtting Hindi text for wordcloud