[REQ] intermediate results

0xlws commented 1 year ago

@JadeCopet @adefossez A while ago I experimented with getting intermediate results by using `yield`/`yield from` inside `lm.generate()` and in the functions that call it (this part works).

It resulted in CUDA errors, I think because the model is generating tokens while I am also trying to turn those tokens into audio for a preview. Is there a way to prevent these errors, or could you point me in the right direction? It is possible to return the intermediate tokens as they are generated; I just can't seem to process them while the loop is still running.

In short, I can't process the tokens for a preview (compression model / `generate_audio`) while the model is still in the `while` loop generating the next token. I'm looking for a workaround or solution. Any advice?
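A minimal, torch-free sketch of the nested-generator pattern described above. `toy_generate_tokens`, `toy_model_generate_tokens` and `toy_decode` are hypothetical stand-ins for the LM sampling loop, the wrapper that delegates to it, and the audio decoding step; they are not audiocraft APIs. The inner generator yields a copy of the partial sequence every `frames_interval` steps, the middle layer delegates with `yield from`, and the consumer "decodes" each snapshot while the generator is suspended.

```python
import typing as tp


def toy_generate_tokens(n_steps: int, frames_interval: int) -> tp.Generator[tp.List[int], None, tp.List[int]]:
    """Hypothetical stand-in for the LM sampling loop."""
    sequence: tp.List[int] = []
    for offset in range(n_steps):
        sequence.append(offset)  # pretend this is the sampled token
        if frames_interval > 0 and (offset + 1) % frames_interval == 0:
            # Yield a copy so later appends don't change what the caller holds.
            yield list(sequence)
    return sequence  # the final tokens travel back via StopIteration.value


def toy_model_generate_tokens(n_steps: int, frames_interval: int):
    # Middle layer: `yield from` re-yields every snapshot and evaluates
    # to the inner generator's return value.
    final = yield from toy_generate_tokens(n_steps, frames_interval)
    return final


def toy_decode(tokens: tp.List[int]) -> str:
    """Hypothetical stand-in for decoding tokens to audio."""
    return "audio[" + ",".join(map(str, tokens)) + "]"


def preview(n_steps: int, frames_interval: int) -> tp.List[str]:
    previews = []
    for partial in toy_model_generate_tokens(n_steps, frames_interval):
        # The generator is paused here, so decoding never interleaves with sampling.
        previews.append(toy_decode(partial))
    return previews


print(preview(n_steps=6, frames_interval=2))
# ['audio[0,1]', 'audio[0,1,2,3]', 'audio[0,1,2,3,4,5]']
```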

adefossez commented 12 months ago

Strange, I don't see why this would result in a CUDA error. Can you provide more information on exactly what is going on and what error you are getting?

0xlws commented 11 months ago

Thanks for your support! I tried again yesterday and was able to solve the issue; I no longer encounter errors such as 'device-side assertion (CUDA)'.

I'm passing a parameter `frames_interval` to the generate function. More info:

musicgen.py

```python
def generate(self, descriptions: tp.List[str], progress: bool = False, return_tokens: bool = False,
             frames_interval: int = 0) -> tp.Union[torch.Tensor, tp.Tuple[torch.Tensor, torch.Tensor], tuple]:
    # NOTE: this body contains `yield`, so generate() always returns a generator;
    # the `return` values in the else branch end up in StopIteration.value.
    attributes, prompt_tokens = self._prepare_tokens_and_attributes(descriptions, None)
    assert prompt_tokens is None
    if frames_interval > 0:
        audio, tokens = None, None
        for intermediate_tokens in self._generate_tokens(attributes, prompt_tokens, progress,
                                                         frames_interval=frames_interval):
            tokens = intermediate_tokens
            audio = self.generate_audio(intermediate_tokens)  # decode a preview of the partial tokens
            yield audio
        if return_tokens:
            yield (audio, tokens)  # final (audio, tokens) pair, yielded as the last item
        return
    else:
        tokens = next(self._generate_tokens(attributes, prompt_tokens, progress), None)
        if return_tokens:
            return self.generate_audio(tokens), tokens
        return self.generate_audio(tokens)
```
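Because the body of `generate` contains `yield`, Python turns the whole function into a generator, even when `frames_interval == 0`. The `return` in the `else` branch therefore does not hand a value back directly; it ends up on `StopIteration.value`. A minimal sketch of that behaviour, with a hypothetical `toy_generate` that only mimics the control flow of the modified method:

```python
import inspect


def toy_generate(frames_interval: int = 0):
    """Hypothetical stand-in with the same control flow as the modified generate()."""
    if frames_interval > 0:
        for step in range(3):
            yield f"preview-{step}"
        yield ("final-audio", "final-tokens")
        return
    else:
        # Looks like a normal return, but this function is a generator.
        return ("final-audio", "final-tokens")


result = toy_generate(frames_interval=0)
print(inspect.isgenerator(result))  # True: nothing has run yet

try:
    next(result)
except StopIteration as stop:
    # The `return` value is carried by the StopIteration exception.
    print(stop.value)  # ('final-audio', 'final-tokens')
```

This is why the notebook example without intermediate results calls `next(output)` and reads `e.value`.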

lm.py

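```python
# Placed inside the sampling loop of LMModel.generate; `offset` is the loop index.
if frames_interval > 0 and offset % frames_interval == 0:
    _tmp = gen_sequence.clone()  # snapshot, so the caller never aliases the tensor being filled
    yield self.process_gen_sequence(_tmp, mask, pattern, unknown_token, max_gen_len, card=self.card, B=B)
    del _tmp
```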
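The `clone()` is a cheap safeguard: the loop keeps writing into `gen_sequence` after a snapshot is handed out, so nothing given to the caller should alias it. A torch-free sketch of the aliasing problem, using a plain Python list as a hypothetical stand-in for the tensor: without a copy, every snapshot the caller keeps is the same object and ends up showing the final state.

```python
def fill_sequence(length: int, frames_interval: int, copy_snapshots: bool):
    """Toy loop that writes one token per offset and yields every `frames_interval` offsets."""
    unknown_token = -1
    gen_sequence = [unknown_token] * length
    for offset in range(length):
        gen_sequence[offset] = offset  # pretend this is the sampled token
        if frames_interval > 0 and offset % frames_interval == 0:
            # copy_snapshots=True plays the role of gen_sequence.clone()
            yield list(gen_sequence) if copy_snapshots else gen_sequence


with_copy = list(fill_sequence(4, 2, copy_snapshots=True))
without_copy = list(fill_sequence(4, 2, copy_snapshots=False))
print(with_copy)     # [[0, -1, -1, -1], [0, 1, 2, -1]]
print(without_copy)  # [[0, 1, 2, 3], [0, 1, 2, 3]]
```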

I made the sanity checks and post-processing reusable as a static method in the same class:

```python
@staticmethod
def process_gen_sequence(
    gen_sequence,
    mask,
    pattern,
    unknown_token,
    max_gen_len,
    card=2048,
    start_offset=0,
    remove_prompts=False,
    B=1,
):
    """Process a (possibly partial) generated sequence tensor into codes.

    Args:
        gen_sequence (torch.Tensor): The generated sequence tensor, shape [B, K, S].
        mask (torch.Tensor): Mask for the sequence tensor.
        pattern: The pattern used for the sequence generation.
        unknown_token: Token marking a not-yet-generated element in the sequence.
        max_gen_len (int): Maximum generation length.
        card (int, optional): Codebook cardinality, also used as the special token id. Default=2048.
        start_offset (int, optional): The start offset for the sequence. Default=0.
        remove_prompts (bool, optional): Whether to trim the prompt from the output. Default=False.
        B (int, optional): Batch size for expanding the mask. Default=1.

    Returns:
        out_codes (torch.Tensor): Transformed sequence tensor.
    """
    replace_token = 0

    # find the first sequence step that has not been generated yet (batch 0, codebook 0)
    unknown_indices = (gen_sequence == unknown_token).nonzero()
    first_unknown_index = int(unknown_indices[0][2]) if unknown_indices.nelement() > 0 else None

    # fill not-yet-generated steps with a valid code so the pattern can be reverted;
    # those timesteps are trimmed off below
    gen_sequence = torch.where(gen_sequence == unknown_token, replace_token, gen_sequence)

    # ensure gen_sequence pattern and mask are matching
    # which means the gen_sequence is valid according to the pattern
    assert (
        gen_sequence
        == torch.where(mask[None, ...].expand(B, -1, -1), gen_sequence, card)
    ).all()

    # get back the codes, trimming the prompt if needed and cutting potentially incomplete timesteps
    out_codes, out_indexes, out_mask = pattern.revert_pattern_sequence(
        gen_sequence, special_token=unknown_token
    )

    # sanity checks over the returned codes and corresponding masks
    assert (out_codes[..., :max_gen_len] != unknown_token).all()
    assert (out_mask[..., :max_gen_len] == 1).all()

    out_start_offset = start_offset if remove_prompts else 0
    end = max_gen_len
    if first_unknown_index is not None:
        # with MusicGen's delays [0, 1, 2, 3] plus the initial empty step, the last codebook
        # lags 4 steps behind, so only timesteps before first_unknown_index - 4 are complete
        end = max(0, min(end, first_unknown_index - 4))
    out_codes = out_codes[..., out_start_offset:end]

    # ensure the returned codes are all valid
    assert (out_codes >= 0).all() and (out_codes <= card).all()

    return out_codes
```
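The `- 4` in the trim comes from MusicGen's delay interleaving. A worked sketch, assuming a delay pattern with delays `[0, 1, ..., n_q - 1]` and one initial empty step, so that codebook `k` of timestep `t` sits at sequence step `t + k + 1`. If steps `0 .. s0 - 1` are filled, timestep `t` is complete only when its last codebook is, i.e. `t + n_q < s0`, which gives `s0 - n_q` complete timesteps (`s0 - 4` for 4 codebooks). The helper `complete_timesteps` is hypothetical and simply counts by brute force to check that arithmetic.

```python
def complete_timesteps(first_unknown_step: int, n_q: int, total_timesteps: int) -> int:
    """Count timesteps whose codes are filled in every codebook.

    Assumes delays [0, 1, ..., n_q - 1] and one initial empty step, so codebook k
    of timestep t lives at sequence step t + k + 1. Steps 0 .. first_unknown_step - 1
    are assumed to be already generated.
    """
    count = 0
    for t in range(total_timesteps):
        # timestep t is complete once its last codebook (k = n_q - 1) has been generated
        last_step = t + (n_q - 1) + 1
        if last_step < first_unknown_step:
            count += 1
    return count


# 4 codebooks, steps 0..49 generated (first unknown step 50): timesteps 0..45 are complete.
print(complete_timesteps(first_unknown_step=50, n_q=4, total_timesteps=150))  # 46
```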

demo.ipynb

With intermediate results:


```python
from audiocraft.utils.notebook import display_audio

model.set_generation_params(use_sampling=True, top_k=250, duration=3)

final_values = None
for result in model.generate(
    descriptions=['drum and bass beat with intense percussions'],
    progress=True,
    return_tokens=True,
    frames_interval=50,
):
    if isinstance(result, tuple):
        # the last item yielded is the final (audio, tokens) pair
        final_values = result
    else:
        # intermediate audio preview
        display_audio(result, sample_rate=32000)
```

<img width="309" alt="Screenshot 2023-09-01 at 07 41 39" src="https://github.com/facebookresearch/audiocraft/assets/87901794/d7e354a6-0d09-4ab0-871f-8de52edf28e0">
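Callers that only want the final result can drain the generator and get back an ordinary return value. A minimal sketch, assuming the modified `generate` behaves as shown in this thread: previews are yielded as plain items, and the final `(audio, tokens)` pair is either yielded last as a tuple or, when `frames_interval == 0`, returned via `StopIteration.value`. `run_generation`, `on_preview` and `toy_two_path_generate` are hypothetical names, and the sketch is exercised with a toy generator rather than MusicGen.

```python
import typing as tp


def run_generation(gen: tp.Generator, on_preview: tp.Optional[tp.Callable[[tp.Any], None]] = None):
    """Drain `gen`, forwarding non-tuple items to `on_preview`.

    Returns the generator's return value if it has one, else the last yielded item.
    """
    last_item = None
    while True:
        try:
            item = next(gen)
        except StopIteration as stop:
            return stop.value if stop.value is not None else last_item
        if not isinstance(item, tuple) and on_preview is not None:
            # previews are plain items; the final (audio, tokens) pair is a tuple
            on_preview(item)
        last_item = item


def toy_two_path_generate(frames_interval: int):
    """Toy generator mimicking the two code paths of the modified generate()."""
    if frames_interval > 0:
        for step in range(2):
            yield f"preview-{step}"
        yield ("audio", "tokens")
        return
    return ("audio", "tokens")


seen = []
print(run_generation(toy_two_path_generate(1), on_preview=seen.append), seen)
# ('audio', 'tokens') ['preview-0', 'preview-1']
print(run_generation(toy_two_path_generate(0)))  # ('audio', 'tokens')
```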

- Return values as usual (no intermediate results):
```python
from audiocraft.utils.notebook import display_audio

model.set_generation_params(
    use_sampling=True,
    top_k=250,
    duration=2
)

output = model.generate(
    descriptions=[
        'drum and bass beat with intense percussions'
    ],
    progress=True, return_tokens=True
)

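# generate() is a generator here, so the usual return value arrives via StopIteration.value
try:
    output = next(output)
except StopIteration as e:
    output = e.value

display_audio(output[0], sample_rate=32000)
# USE_DIFFUSION_DECODER and mbd are defined earlier in the demo notebook
if USE_DIFFUSION_DECODER:
    out_diffusion = mbd.tokens_to_wav(output[1])
    display_audio(out_diffusion, sample_rate=32000)
```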

I was going to try to wrap it so that it behaves as usual, abstracting away the generator logic.

Edit: added the musicgen.py code snippet.

Edit 2: noticed I need to correct the musicgen.py return values under the yield condition ([decoded_tokens, tokens]).

facebookresearch / audiocraft

[REQ] intermediate results #252