KoljaB / RealtimeTTS

Converts text to speech in realtime

"Broken Pipe" eerror running the coqui_test.py #85


rtadewald commented 1 month ago

Hello. First of all, thank you so much for sharing this amazing library with all of us.

I've tried to run the Coqui engine on my MacBook Pro (M1 Max), using the coqui_test.py file in the test directory, but received the error shown below. Do you know what it could be?

Thank you in advance.
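
For reference, the test does roughly the following (a minimal sketch from memory; the actual coqui_test.py may differ). The full error log follows.

    from RealtimeTTS import TextToAudioStream, CoquiEngine

    # Load the local XTTS model (synthesis runs in a worker process)
    engine = CoquiEngine()

    # Feed text and play it back as it is synthesized
    stream = TextToAudioStream(engine)
    stream.feed("Hey guys! These here are realtime spoken sentences based on local text synthesis.")
    stream.play()

    engine.shutdown()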

/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
 > Using model: xtts
Starting to play stream
Opening stream
XTTS Synthesizing: Hey guys! These here are realtime spoken sentences based on local text synthesis.
/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/TTS/tts/layers/xtts/stream_generator.py:138: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
CoquiEngine: General synthesis error: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
 * (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out)
 occured in synthesize worker thread of coqui engine.
ERROR:root:Error synthesizing text: Hey guys! These here are realtime spoken sentences based on local text synthesis.
Traceback: Traceback (most recent call last):
  File "/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/RealtimeTTS/engines/coqui_engine.py", line 591, in _synthesize_worker
    for i, chunk in enumerate(chunks):
  File "/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/TTS/tts/models/xtts.py", line 652, in inference_stream
    gpt_generator = self.gpt.get_generator(
  File "/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/TTS/tts/layers/xtts/gpt.py", line 603, in get_generator
    return self.gpt_inference.generate_stream(
  File "/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/TTS/tts/layers/xtts/stream_generator.py", line 186, in generate
    model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
  File "/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/transformers/generation/utils.py", line 473, in _prepare_attention_mask_for_generation
    torch.isin(elements=inputs, test_elements=pad_token_id).any()
TypeError: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
 * (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out)

ERROR:root:Error: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
 * (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out)

WARNING:root:engine coqui failed to synthesize sentence "Hey guys! These here are realtime spoken sentences based on local text synthesis.", unknown error
Error: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
 * (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out)

WARNING:root:engine coqui is the only engine available, can't switch to another engine
WARNING:root:engine coqui failed to synthesize sentence "With a local, neuronal, cloned voice." with error: 
Traceback: Traceback (most recent call last):
  File "/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/RealtimeTTS/text_to_stream.py", line 343, in synthesize_worker
    success = self.engine.synthesize(sentence)
  File "/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/RealtimeTTS/engines/coqui_engine.py", line 793, in synthesize
    status, result = self.parent_synthesize_pipe.recv()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError

Error: 
WARNING:root:engine coqui is the only engine available, can't switch to another engine
WARNING:root:engine coqui failed to synthesize sentence "So every spoken sentence sounds unique." with error: [Errno 32] Broken pipe
Traceback: Traceback (most recent call last):
  File "/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/RealtimeTTS/text_to_stream.py", line 343, in synthesize_worker
    success = self.engine.synthesize(sentence)
  File "/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/RealtimeTTS/engines/coqui_engine.py", line 791, in synthesize
    self.send_command('synthesize', data)
  File "/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/RealtimeTTS/engines/coqui_engine.py", line 662, in send_command
    self.parent_synthesize_pipe.send(message)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
    self._send(header + buf)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

Error: [Errno 32] Broken pipe
WARNING:root:engine coqui is the only engine available, can't switch to another engine
Traceback (most recent call last):
  File "/Users/rtadewald/Library/Mobile Documents/com~apple~CloudDocs/Projetos/LangChain/Isaac Obsidian/fast_tts.py", line 45, in <module>
    engine.shutdown()
  File "/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/RealtimeTTS/engines/coqui_engine.py", line 912, in shutdown
    self.send_command('shutdown', {})
  File "/Users/rtadewald/Library/Python/3.9/lib/python/site-packages/RealtimeTTS/engines/coqui_engine.py", line 662, in send_command
    self.parent_synthesize_pipe.send(message)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
    self._send(header + buf)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
KoljaB commented 1 month ago

Also ran into this today. It goes back to a change in the transformers library.

The quickest fix is to go back to an older transformers version: pip install transformers==4.38.2
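
To confirm which version is actually active in the environment running the test (a trivial check, nothing RealtimeTTS-specific):

    import transformers

    # The isin() TypeError shows up with newer releases; 4.38.2 is known good here
    print(transformers.__version__)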

This is not a good long-term solution, of course. The problem is that the error occurs at the boundary between Coqui TTS, torch, and the transformers library. I guess either torch or Coqui TTS would need to adjust to the change. If it's torch, they will surely address it. If it's Coqui TTS, we have a problem, since it isn't maintained anymore and has transformers>=4.33.0 in its requirements.
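
The type mismatch itself is easy to reproduce in isolation (a minimal sketch, not code from either library):

    import torch

    inputs = torch.tensor([[5, 5, 1, 2]])  # stand-in for the tokenized input ids
    pad_token_id = 5                       # reaches torch.isin as a plain int

    # Mirrors the failing call from the traceback; with these keyword names
    # an int test_elements matches none of the isin() overloads:
    # torch.isin(elements=inputs, test_elements=pad_token_id)  # TypeError

    # Wrapping the id in a tensor selects the (Tensor, Tensor) overload,
    # which is what the source change described below does:
    print(torch.isin(elements=inputs, test_elements=torch.tensor(pad_token_id)).any())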

If you want to work with the latest version of transformers, you need to change its source code. Open transformers/generation/utils.py, go to the _prepare_attention_mask_for_generation method, and replace this code:

        is_pad_token_in_inputs = (pad_token_id is not None) and (
            torch.isin(elements=inputs, test_elements=pad_token_id).any()
        )
        is_pad_token_not_equal_to_eos_token_id = (eos_token_id is None) or ~(
            torch.isin(elements=eos_token_id, test_elements=pad_token_id).any()
        )

with this code:

        pad_token_id_tensor = torch.tensor(pad_token_id, device=inputs.device)
        if eos_token_id is not None:
            eos_token_id_tensor = torch.tensor(eos_token_id, device=inputs.device)
        else:
            eos_token_id_tensor = None

        is_pad_token_in_inputs = (pad_token_id is not None) and (
            torch.isin(elements=inputs, test_elements=pad_token_id_tensor).any()
        )
        is_pad_token_not_equal_to_eos_token_id = (eos_token_id is None) or ~(
            torch.isin(elements=eos_token_id_tensor, test_elements=pad_token_id_tensor).any()
        )

The new tensor-conversion lines go in front of the line starting with is_pad_token_in_inputs =. This will also fix the error and allow you to use the latest transformers version. It is still not a very satisfying solution, because no one wants to change the transformers source code for every new release, and I'm not sure whether this fix has any side effects.
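
If you'd rather not edit the installed package, the same conversion could be applied as a small runtime monkey-patch (just a sketch of the idea, untested, assuming the method signature shown above):

    import torch
    from transformers.generation.utils import GenerationMixin

    _original = GenerationMixin._prepare_attention_mask_for_generation

    def _patched(self, inputs, pad_token_id, eos_token_id):
        # Hand torch.isin tensors instead of plain ints, as in the source edit
        if pad_token_id is not None and not isinstance(pad_token_id, torch.Tensor):
            pad_token_id = torch.tensor(pad_token_id, device=inputs.device)
        if eos_token_id is not None and not isinstance(eos_token_id, torch.Tensor):
            eos_token_id = torch.tensor(eos_token_id, device=inputs.device)
        return _original(self, inputs, pad_token_id, eos_token_id)

    GenerationMixin._prepare_attention_mask_for_generation = _patched

One caveat: CoquiEngine synthesizes in a child process, so this only helps if the patch also runs there (it is inherited with a fork start method, but not with spawn). Editing the source as above avoids that question entirely.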

arjundussa65 commented 1 month ago

Thank you for this work. Can you provide the modified function with the temp fix applied, so it can be used with mps? This is the function as I have it, with the mps check commented out:

    def _prepare_attention_mask_for_generation(
        self,
        inputs: torch.Tensor,
        pad_token_id: Optional[torch.Tensor],
        eos_token_id: Optional[torch.Tensor],
    ) -> torch.LongTensor:
        # No information for attention mask inference -> return default attention mask
        default_attention_mask = torch.ones(inputs.shape[:2], dtype=torch.long, device=inputs.device)
        if pad_token_id is None:
            return default_attention_mask

        is_input_ids = len(inputs.shape) == 2 and inputs.dtype in [torch.int, torch.long]
        if not is_input_ids:
            return default_attention_mask

        # Otherwise we may have information -> try to infer the attention mask
        """if inputs.device.type == "mps":
            # mps does not support torch.isin (https://github.com/pytorch/pytorch/issues/77764)
            raise ValueError(
                "Can't infer missing attention mask on `mps` device. Please provide an `attention_mask` or use a different device."
            )"""

        is_pad_token_in_inputs = (pad_token_id is not None) and (
            torch.isin(elements=inputs, test_elements=pad_token_id).any()
        )
        is_pad_token_not_equal_to_eos_token_id = (eos_token_id is None) or ~(
            torch.isin(elements=eos_token_id, test_elements=pad_token_id).any()
        )
        can_infer_attention_mask = is_pad_token_in_inputs * is_pad_token_not_equal_to_eos_token_id
        attention_mask_from_padding = inputs.ne(pad_token_id).long()

        attention_mask = (
            attention_mask_from_padding * can_infer_attention_mask + default_attention_mask * ~can_infer_attention_mask
        )
        return attention_mask
KoljaB commented 1 month ago

This is the full fixed method:

    def _prepare_attention_mask_for_generation(
        self,
        inputs: torch.Tensor,
        pad_token_id: Optional[torch.Tensor],
        eos_token_id: Optional[torch.Tensor],
    ) -> torch.LongTensor:
        # No information for attention mask inference -> return default attention mask
        default_attention_mask = torch.ones(inputs.shape[:2], dtype=torch.long, device=inputs.device)
        if pad_token_id is None:
            return default_attention_mask

        is_input_ids = len(inputs.shape) == 2 and inputs.dtype in [torch.int, torch.long]
        if not is_input_ids:
            return default_attention_mask

        # Otherwise we may have information -> try to infer the attention mask
        if inputs.device.type == "mps":
            # mps does not support torch.isin (https://github.com/pytorch/pytorch/issues/77764)
            raise ValueError(
                "Can't infer missing attention mask on `mps` device. Please provide an `attention_mask` or use a different device."
            )

        # Convert plain token ids to tensors so torch.isin gets valid argument types
        pad_token_id_tensor = torch.tensor(pad_token_id, device=inputs.device)
        if eos_token_id is not None:
            eos_token_id_tensor = torch.tensor(eos_token_id, device=inputs.device)
        else:
            eos_token_id_tensor = None

        is_pad_token_in_inputs = (pad_token_id is not None) and (
            torch.isin(elements=inputs, test_elements=pad_token_id_tensor).any()
        )
        is_pad_token_not_equal_to_eos_token_id = (eos_token_id is None) or ~(
            torch.isin(elements=eos_token_id_tensor, test_elements=pad_token_id_tensor).any()
        )
        can_infer_attention_mask = is_pad_token_in_inputs * is_pad_token_not_equal_to_eos_token_id
        attention_mask_from_padding = inputs.ne(pad_token_id).long()

        attention_mask = (
            attention_mask_from_padding * can_infer_attention_mask + default_attention_mask * ~can_infer_attention_mask
        )
        return attention_mask

Looking at the code that says "mps does not support torch.isin", I guess this will still not work on mps. That part is out of my scope and needs to be addressed by the PyTorch team.
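
For anyone who wants to check whether their torch build has this limitation, here is a quick probe (just an illustration, not part of any library):

    import torch

    # Probe: does this torch build implement isin on the mps backend?
    if torch.backends.mps.is_available():
        t = torch.tensor([1, 2, 3], device="mps")
        try:
            torch.isin(t, torch.tensor([2], device="mps"))
            print("torch.isin works on mps here")
        except (RuntimeError, NotImplementedError) as exc:
            print(f"torch.isin unsupported on mps: {exc}")
    else:
        print("no mps device available")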