-
This issue tracks the evaluation of RAG implementations.
Frameworks:
- RAGEval
  - https://github.com/OpenBMB/RAGEval
  - https://arxiv.org/pdf/2408.01262
- AutoRAG
  - https://github.com/Marker-Inc-K…
-
I am experiencing difficulties reproducing the results on the Arxiv dataset using the GNNSAFE setting. While I have successfully reproduced results on other datasets, the Arxiv dataset results I obtai…
-
Hi, I tried to run other datasets such as pubmed and ogbn-products.
I successfully ran the prepare data script and got the
orkut, ogbn-products, ogbn-100m and pubmed preprocessed datasets
ogbn…
-
Hi,
I really like your project as it provides an easy-to-use approach. I have been thinking that since the new Llama 3.1 is multilingual, could this approach also be used in that way? As we are on…
-
This is just a collection of scripts / ideas that I think might be useful. I've been writing some scripts already while preparing the talk for DIS, so I might as well make it into a CLI for the data…
-
- Here's the summary of consulting an LLM specialist:
---
- We have an initial thought in #74 as follows:
![image](https://github.com/user-attachments/assets/265a3d7d-0454-4e7b-9c99-a0dd9f9ecf7c…
-
Based on what we discussed on the Discord call, I will be looking more into dataset improvements. The scraper already works well; as mentioned in the TODOs, it just needs to be expanded. Also some related t…
-
I truly admire the outstanding work you've done. While comparing models, I noticed that both you and MDRN use the DIV2K dataset along with the first 10K images from LSDIR for training. Most lightweigh…
-
In terms of recreating the dataset, I believe it's actually best if @wq2012 recreates the dataset with daan and pet of Google, and @afk0901 finishes our writeup of this dataset creation. When we are both…
-
Title.
Benchmarks:
Summarization
- [x] G-Eval
- [ ] SummHay - https://arxiv.org/abs/2407.01370v1 & https://github.com/salesforce/summary-of-a-haystack
- https://arxiv.org/html/2403.19889v1
R…