I've been working on a dense IR pipeline with BEIR including a custom dataloader, which works fine for dense IR runs but throws an exception whenever I add a cross encoder for reranking.
Dataloader:
corpus = {}
for index, item in corpusdf.iteritems():
    corpus.update({
        "doc"+(str(index)): {
            "title": "",
            "text": item,
        },
    })
queries = {}
for index, row in queriesdf.iterrows():
    queries.update({
        "q"+str(index): {
            "doc"+(str(index)): row[0],
        },
    })
qrels = {}
for i in range(len(df)):
    qrels.update({
        "q"+str(i): {
            "doc"+(str(i)): 1,
        },
    })
Exception:
Traceback (most recent call last):
  File "C:\Users\costco\venv\lib\site-packages\sentence_transformers\cross_encoder\CrossEncoder.py", line 273, in predict
    for features in iterator:
  File "C:\Users\costco\venv\lib\site-packages\tqdm\std.py", line 1180, in __iter__
    for obj in iterable:
  File "C:\Users\costco\venv\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
    data = self._next_data()
  File "C:\Users\costco\venv\lib\site-packages\torch\utils\data\dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\costco\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "C:\Users\costco\venv\lib\site-packages\sentence_transformers\cross_encoder\CrossEncoder.py", line 93, in smart_batching_collate_text_only
    texts[idx].append(text.strip())
AttributeError: 'dict' object has no attribute 'strip'
It seems like a simple fix, but I am trying to avoid modifying the BEIR sources. Any ideas would be greatly appreciated!
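One possible cause, going by the traceback: the cross encoder's collate function calls .strip() on each text in a (query, document) pair, so every value in queries would need to be a plain string. In the dataloader, each query id maps to a nested dict ("doc<index>" to the query text) instead, which would explain the AttributeError. Below is a minimal sketch of a queries dict with string values, plus a small check that can be run before reranking. The helper names build_queries and find_non_string_queries and the sample rows are hypothetical, made up for illustration, and are not part of BEIR. It also assumes the query text lives in the first column of queriesdf.

```python
from typing import Dict, Iterable, List, Tuple


def build_queries(rows: Iterable[Tuple[int, str]]) -> Dict[str, str]:
    """Map "q<index>" to the query text itself (a plain str, not a nested dict)."""
    return {"q" + str(index): text for index, text in rows}


def find_non_string_queries(queries: Dict[str, object]) -> List[str]:
    """Return the ids of queries whose value is not a plain string."""
    return [qid for qid, value in queries.items() if not isinstance(value, str)]


# Made-up sample rows standing in for queriesdf (an assumption). With pandas
# the rows could come from something like zip(queriesdf.index, queriesdf.iloc[:, 0]).
sample_rows = [(0, "what is dense retrieval?"), (1, "how does reranking work?")]

queries = build_queries(sample_rows)

# Shape produced by the original loop, kept here only for comparison.
nested = {"q0": {"doc0": "what is dense retrieval?"}}

print(find_non_string_queries(queries))  # []
print(find_non_string_queries(nested))   # ['q0']
```

If the check returns an empty list, every query is a string and the pairs handed to the cross encoder should at least have the type that .strip() expects. This changes only the custom dataloader, so the BEIR sources stay untouched.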