THUDM / CogVLM

A state-of-the-art open visual language model | multimodal pretrained model
Apache License 2.0

generate multiple captions #65

Closed by rahimentezari 11 months ago

rahimentezari commented 1 year ago

I am trying to generate multiple captions with the chat function, using BeamSearchStrategy instead of BaseStrategy. Is this the right way?

strategy = BeamSearchStrategy(
    num_beams=5,
    length_penalty=1.0,
    temperature=1.0,
    top_k=top_k,
    top_p=top_p,
    consider_end=True,
    end_tokens=[text_processor.tokenizer.eos_token_id],
    invalid_slices=invalid_slices,
    no_repeat_ngram_size=0,
    min_tgt_length=0,
    repetition_penalty=repetition_penalty,
    prefer_min_length=5,
    prefer_max_length=100,
    stop_n_iter_unchanged=10,
)

I ask because I am getting this error:

File "/miniconda3/envs/env_name/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
File "/miniconda3/envs/env_name/lib/python3.10/site-packages/sat/model/position_embedding/triton_rotary_embeddings.py", line 25, in forward
    out = apply_rotary(
File "/miniconda3/envs/env_name/lib/python3.10/site-packages/sat/model/position_embedding/triton_rotary.py", line 174, in apply_rotary
    assert batch == batch_ro
AssertionError

mactavish91 commented 1 year ago

This issue is caused by an anomaly in the RoPE position encoding when the beam size is greater than 1. We are currently addressing the problem. In the meantime, you can temporarily set the beam size to 1 and adjust the top_k and top_p values to make the output more varied.
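The workaround above leans on top-k/top-p (nucleus) sampling instead of beam search: top_k keeps only the k most likely tokens, and top_p keeps the smallest set whose cumulative probability exceeds p, so raising either widens the pool the sampler draws from. Below is a minimal self-contained sketch of that filtering logic for intuition only; the function names are hypothetical and this is not SAT's actual implementation.

```python
import math
import random

def top_k_top_p_filter(logits, top_k=0, top_p=1.0):
    """Return the indices of tokens that survive top-k then top-p filtering.

    Hypothetical helper for illustration; not part of the SAT library.
    """
    # Sort token indices by logit, highest first.
    indexed = sorted(enumerate(logits), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        indexed = indexed[:top_k]          # keep only the k most likely tokens
    # Softmax over the remaining logits (shift by max for numerical stability).
    m = max(v for _, v in indexed)
    exps = [(i, math.exp(v - m)) for i, v in indexed]
    total = sum(e for _, e in exps)
    kept, cum = [], 0.0
    for i, e in exps:
        kept.append(i)
        cum += e / total
        if cum >= top_p:                   # nucleus reached: stop adding tokens
            break
    return kept

def sample_token(logits, top_k=0, top_p=1.0, temperature=1.0, rng=random):
    """Sample one token index from the filtered, temperature-scaled distribution."""
    scaled = [v / temperature for v in logits]
    kept = top_k_top_p_filter(scaled, top_k, top_p)
    m = max(scaled[i] for i in kept)
    weights = [math.exp(scaled[i] - m) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

With beam size 1, generating several captions then amounts to sampling repeatedly with a reasonably large top_k/top_p and collecting the different outputs.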

1049451037 commented 11 months ago

This problem has been solved. You can update to the latest SAT via:

git clone https://github.com/THUDM/SwissArmyTransformer
cd SwissArmyTransformer
pip install . --no-deps