State-of-the-art Summarization methods for Hindi in 🤗
SAR (सार) means "summary" in Hindi. This repository contains my work on Hindi text summarization of news articles.
### Notebook:
| Notebook | Colab | Kaggle |
| ------ | ------ | ------ |
| Baseline | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/d0r1h/SAR/blob/main/notebooks/baseline.ipynb) | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/code/undersc0re/hindi-text-summarization-baseline) |
### Dataset:
* So far I have released a sample dataset of 2k text-summary pairs, available [here](https://github.com/d0r1h/SAR/tree/main/dataset).
### Models:
* Inference results were measured on the 2k-pair sample dataset.
|Model | Checkpoint | ROUGE-2 (F-score) | Inference time |
|--- | --- | --- | --- |
|BART | [ai4bharat/IndicBART](https://huggingface.co/ai4bharat/IndicBART) | 21.48 | 20min 27s |
|T5 | [csebuetnlp/mT5_multilingual_XLSum](https://huggingface.co/csebuetnlp/mT5_multilingual_XLSum) | 20.21 | 45min 54s|
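
The ROUGE-2 F-score column measures bigram overlap between a generated summary and its reference. The sketch below is a minimal illustration of that idea, not the evaluation code behind the table: it assumes plain whitespace tokenization, returns a value between 0 and 1, and the names `bigrams` and `rouge2_f` are hypothetical.

```python
from collections import Counter


def bigrams(tokens):
    """Return a multiset (Counter) of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f(reference: str, candidate: str) -> float:
    """ROUGE-2 F-score of `candidate` against `reference`.

    Assumption: tokens are split on whitespace, which is a simplification
    for Hindi text (punctuation stays attached to words).
    """
    ref = bigrams(reference.split())
    cand = bigrams(candidate.split())
    # Clipped overlap: a bigram counts only as often as it occurs in both.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "दिल्ली में आज भारी बारिश हुई"
    candidate = "दिल्ली में भारी बारिश हुई"
    print(round(rouge2_f(reference, candidate), 4))
```

In this example, 3 of the candidate's 4 bigrams appear among the reference's 5 bigrams, so precision is 0.75 and recall is 0.6. A real evaluation would normally use an established ROUGE package and a tokenizer suited to Devanagari.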
### Project Pipeline
### API
You can summarize a Hindi news article from a supported website in just a few lines of code:
```python
>>> import requests
>>> api_endpoint = "https://hf.space/embed/d0r1h/Hindi_News_Summarizer/+/api/predict/"
>>> news_url = "https://www.amarujala.com/uttar-pradesh/shamli/up-news-heroin-caught-in-shaheen-bagh-of-delhi-is-connection-to-kairana-and-muzaffarnagar?src=tlh&position=3"
>>> # The payload is a list: [article URL, model name]
>>> r = requests.post(url=api_endpoint,
...                   json={"data": [news_url, "BART"]})
>>> r.json()['data'][0]
'यूपी शाहीन बाग में 100 करोड़ रुपये कीमत की हेरोइन और अन्य मादक पदार्थ की बरामदगी व उसे लाने अंतर्राष्ट्रीय ड्रग्स तस्करों के गिरोह के तार शामली जिले के कस्बा कैराना और मुजफ्फरनगर से जुड़ रहे हैं। नारकोटिक्स कंट्रोल ब्यूरो एनसीबी दिल्ली की टीम ने गुरुवार को कैलाना से दो लोगों को हिरासत में'
```
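
For repeated calls, the request can be wrapped in a small helper. The following is a sketch under stated assumptions rather than an official client: it uses the standard library's `urllib` instead of `requests` so that it has no third-party dependency, the helper names (`build_payload`, `extract_summary`, `summarize`) and the 120-second timeout are hypothetical, and `"BART"` is the only model name confirmed by the snippet above.

```python
import json
import urllib.request

# Endpoint taken from the snippet above.
API_ENDPOINT = "https://hf.space/embed/d0r1h/Hindi_News_Summarizer/+/api/predict/"


def build_payload(news_url: str, model: str = "BART") -> dict:
    """Build the JSON body in the shape shown above: [article URL, model name]."""
    return {"data": [news_url, model]}


def extract_summary(response_json: dict) -> str:
    """Return the first element of the "data" list, or fail with a clear error."""
    data = response_json.get("data")
    if not data:
        raise ValueError(f"Unexpected response: {response_json!r}")
    return data[0]


def summarize(news_url: str, model: str = "BART", timeout: float = 120.0) -> str:
    """POST the article URL to the endpoint and return the summary text.

    The timeout is an assumed value; urlopen raises HTTPError on a
    non-success status and URLError on connection problems.
    """
    body = json.dumps(build_payload(news_url, model)).encode("utf-8")
    request = urllib.request.Request(
        API_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_summary(json.load(response))
```

Calling `summarize(news_url)` with an article URL from a supported website should then return the summary string, provided the hosted endpoint is reachable.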
### Inference Demo:
The application is hosted on 🤗 Spaces and can be accessed at [SAR](https://huggingface.co/spaces/d0r1h/Hindi_News_Summarizer).
### Supported Websites
- [x] [Amarujala](https://www.amarujala.com)
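
Because only one site is supported so far, it can help to check an article URL on the client side before sending it. The sketch below is a hypothetical helper, not part of this repository: `SUPPORTED_DOMAINS` and `is_supported` are assumed names, and the check matches the domain and its subdomains only.

```python
from urllib.parse import urlparse

# Assumption: mirrors the checklist above; extend as more sites are added.
SUPPORTED_DOMAINS = {"amarujala.com"}


def is_supported(news_url: str) -> bool:
    """Return True if the URL's host is a supported domain or one of its subdomains."""
    host = (urlparse(news_url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in SUPPORTED_DOMAINS
    )
```

For example, `is_supported("https://www.amarujala.com/uttar-pradesh/shamli/...")` is true, while a URL from aajtak or ndtv is rejected until those sites are added.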
#### To Do
- [ ] Add support for the following websites:
  - [ ] [aajtak](https://www.aajtak.in/)
  - [ ] [ndtv](https://ndtv.in/)
- [ ] Format Hindi text for the word cloud