SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
https://arxiv.org/abs/2309.12871
MIT License

PreTrainedTokenizerBase.pad() got an unexpected keyword argument 'truncation' #36

Closed: songxxzp closed this issue 5 months ago

songxxzp commented 5 months ago

In angle.py:

        if end_with_eos:
            features = self.tokenizer.pad(
                {'input_ids': [feature['input_ids'] for feature in new_features]},
                padding=False,
                max_length=self.max_length - 1,
                return_tensors=return_tensors,
                truncation=True,
            )
            features['input_ids'] = [input_ids + [self.tokenizer.eos_token_id] for input_ids in features['input_ids']]
            features = self.tokenizer.pad(features, padding=self.padding, return_tensors=return_tensors)

TypeError: PreTrainedTokenizerBase.pad() got an unexpected keyword argument 'truncation'

I'm using AnglE 0.3.1 and tokenizers 0.15.1.

SeanLee97 commented 5 months ago

Hi @songxxzp, thanks for pointing out this issue.

I have fixed this in the branch https://github.com/SeanLee97/AnglE/tree/feature/doc, but I haven't released an updated version yet. In the meantime, you can install AnglE manually from that branch as follows:

$ git clone -b feature/doc https://github.com/SeanLee97/AnglE.git
$ cd AnglE
$ pip install -e .
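
For anyone who cannot switch branches, here is a rough workaround sketch. The error arises because `PreTrainedTokenizerBase.pad()` only pads and takes no `truncation` argument, so the truncation has to happen before `pad()` is called. The helper name `truncate_and_append_eos` is hypothetical and is not part of AnglE, and this is not necessarily how the feature/doc branch fixes the problem. It assumes each feature is a dict with a plain Python list under `input_ids`, as in the quoted snippet.

```python
from typing import Dict, List


def truncate_and_append_eos(
    features: List[Dict[str, List[int]]],
    max_length: int,
    eos_token_id: int,
) -> Dict[str, List[List[int]]]:
    """Hypothetical helper: truncate by hand, then append EOS.

    One slot is reserved for the EOS token, so no sequence ends up
    longer than max_length.
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    input_ids = [
        feature["input_ids"][: max_length - 1] + [eos_token_id]
        for feature in features
    ]
    return {"input_ids": input_ids}


# Toy token IDs, chosen only for illustration.
batch = truncate_and_append_eos(
    [{"input_ids": [5, 6, 7, 8, 9]}, {"input_ids": [5, 6]}],
    max_length=4,
    eos_token_id=2,
)
print(batch)  # {'input_ids': [[5, 6, 7, 2], [5, 6, 2]]}
```

The returned dict can then be handed to `tokenizer.pad(batch, padding=..., return_tensors=...)`, as in the last line of the quoted snippet, without passing `truncation`.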