CLARIN-PL / embeddings

Embeddings: State-of-the-art Text Representations for Natural Language Processing tasks, an initial version of the library focused on the Polish language
https://clarin-pl.github.io/embeddings/
MIT License

Prepare converter from squad to beir #294

Open laugustyniak opened 1 year ago

laugustyniak commented 1 year ago

Some similar ideas:

from beir.datasets.data_loader_hf import HFDataLoader

# `dataset` (repository name under clarin-knext) and `split` must be defined by the caller
corpus, queries, qrels = HFDataLoader(hf_repo=f"clarin-knext/{dataset}", streaming=False, keep_in_memory=False).load(split=split)

# Convert the Hugging Face Dataset objects into dicts keyed by id
queries = {query['id']: {'text': query['text']} for query in queries}
corpus = {doc['id']: {'title': doc['title'], 'text': doc['text']} for doc in corpus}
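For the converter itself, a minimal sketch is given here. It is not the project's implementation. It assumes the input follows the usual SQuAD JSON layout (`data` → `paragraphs` → `context` / `qas`) and produces the three BEIR-style structures: `corpus`, `queries`, and `qrels`. The function name `squad_to_beir` and the `"{article}_{paragraph}"` document id scheme are hypothetical choices made only for illustration. Each paragraph becomes one document, and each question is marked relevant (score 1) to the paragraph it was written for.

```python
from typing import Dict, Tuple

Corpus = Dict[str, Dict[str, str]]   # doc_id -> {"title": ..., "text": ...}
Queries = Dict[str, str]             # query_id -> question text
Qrels = Dict[str, Dict[str, int]]    # query_id -> {doc_id: relevance}


def squad_to_beir(squad: dict) -> Tuple[Corpus, Queries, Qrels]:
    """Convert a SQuAD-style dict into BEIR-style corpus, queries and qrels.

    Hypothetical helper: the name and the document id scheme are assumptions.
    """
    corpus: Corpus = {}
    queries: Queries = {}
    qrels: Qrels = {}
    for article_idx, article in enumerate(squad["data"]):
        title = article.get("title", "")
        for para_idx, paragraph in enumerate(article["paragraphs"]):
            # Assumed id scheme: article index and paragraph index joined by "_"
            doc_id = f"{article_idx}_{para_idx}"
            corpus[doc_id] = {"title": title, "text": paragraph["context"]}
            for qa in paragraph["qas"]:
                queries[qa["id"]] = qa["question"]
                # The source paragraph is the only known relevant document
                qrels[qa["id"]] = {doc_id: 1}
    return corpus, queries, qrels


# Tiny hand-written example in SQuAD layout (not real dataset content)
example = {
    "data": [
        {
            "title": "Warsaw",
            "paragraphs": [
                {
                    "context": "Warsaw is the capital of Poland.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What is the capital of Poland?",
                            "answers": [{"text": "Warsaw", "answer_start": 0}],
                        }
                    ],
                }
            ],
        }
    ]
}

corpus, queries, qrels = squad_to_beir(example)
```

Note that this sketch maps each query id directly to its text, whereas the snippet above wraps the text in a `{'text': ...}` dict. Which shape is needed depends on the downstream evaluation code.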

laugustyniak commented 1 year ago

@mkossakowski19 can you link the branch for it?