Blair-Johnson / batch-whisper

Batch Support for OpenAI Whisper
MIT License

ValueError: tuple.index(x): x not in tuple #14

Open saadnaeem-dev opened 11 months ago

saadnaeem-dev commented 11 months ago

Error in decoding.py

This line in decoding.py is causing the issue:

self.sot_index: int = self.initial_tokens.index(tokenizer.sot)

It fails with:

ValueError: tuple.index(x): x not in tuple

self.initial_tokens        Out[10]: (50257,)
type(self.initial_tokens)  Out[11]: tuple
type(tokenizer.sot)        Out[12]: int
tokenizer.sot              Out[13]: 50333

Since tokenizer.sot (50333) is not in self.initial_tokens, which is (50257,), the index() call raises ValueError: tuple.index(x): x not in tuple.
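The failure can be reproduced without Whisper at all, using the values from the IPython session above. This is only a minimal sketch of the same tuple lookup; the variable names mirror the failing line, but nothing here is taken from the batch-whisper code itself.

```python
# Values copied from the OP's IPython session (failing case).
initial_tokens = (50257,)  # self.initial_tokens
sot = 50333                # tokenizer.sot

try:
    # Same call as the failing line in decoding.py.
    sot_index = initial_tokens.index(sot)
    error_message = None
except ValueError as exc:
    # tuple.index raises ValueError when the value is not in the tuple.
    sot_index = None
    error_message = str(exc)

print(sot_index, error_message)
```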

In the working case (using the latest openai-whisper), the values are:

self.initial_tokens        Out[5]: (50257,)
type(self.initial_tokens)  Out[6]: tuple
tokenizer.sot              Out[7]: 50257
type(tokenizer.sot)        Out[8]: int

This works because 50257 is in the tuple (50257,), so its index can be found.
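A guard around the lookup does not fix the underlying mismatch, but it can make the failure explain itself. The sketch below uses a hypothetical helper, find_sot_index, which is not part of batch-whisper or openai-whisper; it performs the same lookup as the failing line and, when the token is missing, reports both values instead of the generic tuple.index message.

```python
def find_sot_index(initial_tokens, sot):
    """Return the position of the start-of-transcript token in initial_tokens.

    Hypothetical helper (not from batch-whisper or openai-whisper). It does the
    same lookup as the failing line but, on a mismatch, raises a ValueError that
    names both values so the tokenizer/decoder disagreement is visible.
    """
    tokens = tuple(initial_tokens)
    if sot not in tokens:
        raise ValueError(
            f"tokenizer.sot={sot} is not in initial_tokens={tokens}; "
            "check which tokenizer produced each value"
        )
    return tokens.index(sot)


# Working case from the thread: sot == 50257 is present.
print(find_sot_index((50257,), 50257))  # prints 0

# Failing case from the thread: sot == 50333 is absent.
try:
    find_sot_index((50257,), 50333)
except ValueError as exc:
    print(exc)
```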

vidalfer commented 11 months ago

same here

marcoyang1998 commented 11 months ago

same here

XuJingye2022 commented 11 months ago

same here

cglackin commented 10 months ago

Same here. Has anyone found a workaround yet?

constan1 commented 9 months ago

I found a minor workaround, but I don't know exactly why it works. It can't find the 50334 token, and I don't see where that token is being set. If anyone knows, please let me know.

In main.py: [screenshot]

And in decoding.py: [screenshot]

It seems to work with batches of 3 per GPU, but not more. The original code below gives the error from the OP:

[Screenshots: original code and the resulting error, line 487 of decoding.py]

The [50258] token is stored in the tokenizer's sot_sequence from get_tokenizer, but then [50334] appears out of nowhere, and it can't be found in the original list.
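Comparing the two reports in this thread shows that in both cases the ID the decoder looks for is exactly 76 above the ID actually present (50333 vs 50257, and 50334 vs 50258). The short calculation below only checks that arithmetic using the numbers quoted in the comments; whether the offset points to a cause is not established anywhere in the thread, and lookup_offset is a hypothetical name used for illustration.

```python
def lookup_offset(present: int, looked_up: int) -> int:
    """Difference between the ID the decoder searches for and the ID present."""
    return looked_up - present


# (ID present in the token tuple, ID the decoder looks for), as quoted in this thread.
reports = {
    "original report": (50257, 50333),
    "constan1": (50258, 50334),
}

for who, (present, looked_up) in reports.items():
    print(f"{who}: {looked_up} - {present} = {lookup_offset(present, looked_up)}")
```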


mohith7548 commented 8 months ago

Any fix for this?