Open saadnaeem-dev opened 11 months ago
same here
same here
same here
Same here. Has anyone found a workaround yet?
I found a minor workaround, but I don't know exactly why it works. It can't find the 50334 token, but I don't see where that token is being implemented. If anyone knows, please let me know.
Main.py file
And in decoding.py
Seems to work with batches of 3 per GPU, but not more. The original below gives the error in the OP
for
line 487 of decoding.py
It saves the [50258] token in the get_tokenizer sot_sequence, but then this [50334] comes out of nowhere, and it can't be found in the original list.
any fix for this?
Error in decoding.py
This line in decoding.py is causing the issue:
self.sot_index: int = self.initial_tokens.index(tokenizer.sot)
which raises
ValueError: tuple.index(x): x not in tuple
where the values are:
self.initial_tokens
Out[10]: (50257,)
type(self.initial_tokens)
Out[11]: tuple
type(tokenizer.sot)
Out[12]: int
tokenizer.sot
Out[13]: 50333
Since 50333 is not in (50257,), we get ValueError: tuple.index(x): x not in tuple.
For the correct case (when using the latest openai-whisper), these are the values:
self.initial_tokens
Out[5]: (50257,)
type(self.initial_tokens)
Out[6]: tuple
tokenizer.sot
Out[7]: 50257
type(tokenizer.sot)
Out[8]: int
which is correct, since 50257 exists in the tuple (50257,) and we are able to get its index.
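For anyone who wants to reproduce the failure outside of whisper, here is a minimal sketch using only the values posted above. Note that find_sot_index is a hypothetical helper made up for illustration, not something that exists in whisper; it just wraps the same tuple.index lookup as the failing line and reports both values when they disagree.

```python
def find_sot_index(initial_tokens: tuple, sot: int) -> int:
    """Hypothetical helper (NOT part of whisper).

    Performs the same lookup as the failing line,
    `self.initial_tokens.index(tokenizer.sot)`, but raises an error
    message that shows both values so the mismatch is visible.
    """
    try:
        return initial_tokens.index(sot)
    except ValueError as exc:
        raise ValueError(
            f"tokenizer.sot={sot} is not in initial_tokens={initial_tokens}"
        ) from exc


# Working case reported in the thread: sot is in the tuple, lookup succeeds.
ok_index = find_sot_index((50257,), 50257)

# Failing case reported in the thread: sot is 50333, the tuple only holds 50257.
try:
    find_sot_index((50257,), 50333)
    failed = False
    message = ""
except ValueError as err:
    failed = True
    message = str(err)

print(ok_index, failed, message)
```

This does not fix anything by itself. It only shows that the error is an ordinary tuple lookup failing because tokenizer.sot and initial_tokens disagree, so the thing to track down is where the two values are built.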