-
Dataloader name: `xl_sum/xl_sum.py`
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?xl_sum
| Dataset | xl_sum |
|-------------|---|
| Description | XL-Sum, a comprehensive an…
-
Used the following PDF: https://arxiv.org/pdf/1706.03762
The result looks OK; however, the order of pages is incorrect.
Setup
```
git clone ...
pip install -e .
```
```python
import asyn…
```
-
**Hardware**:
CPU: Xeon® E5-2630 v2, but memory is limited to 16GB, as this is what the vast.ai instance has.
GPU: 4x A40, for a total of 180GB
**OS**
Linux
**python**
3.10
**cuda**
12.2
**packa…
-
Dataloader name: `lr_sum/lr_sum.py`
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?lr_sum
| Dataset | lr_sum |
|-------------|---|
| Description | LR-Sum is a news abstracti…
-
[TeSum: Human-Generated Abstractive Summarization Corpus for Telugu](https://aclanthology.org/2022.lrec-1.614.pdf)
The link to the dataset is broken: https://github.com/manshri/TeSum/
-
We will need to find a couple of good models before we start development:
1) A word/document summarization model
2) A word/document generation model
-
- XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages
- Contains many Indian languages
- [Paper](https://arxiv.org/abs/2106.13822)
- [Dataset](https://github.com/csebuetnlp/…
-
### System Info
```Shell
- `Accelerate` version: 0.31.0
- Platform: Linux-5.4.0-171-generic-x86_64-with-glibc2.35
- `accelerate` bash location: /workspace/Thesis/venv/bin/accelerate
- Python vers…
```
-
The best open-source LLM of a reasonable size (7B). This is useful if we want to apply fact-sheet-style chatbot extraction within a single document.
- [openchat_3.5](https://huggingface.co/openchat/openchat_3.5)…