Open pipijiev12 opened 1 year ago
When setting the decoding stop condition for the beam search algorithm in `outputs = self.generate(inputs, **gen_kwargs)`, if `max_length` is used to limit the decoding output length, the problems described in the issue above will inevitably occur. For such problems, a stop token can be specified in a custom beam search algorithm. I see that the `generate` function in HF transformers has a parameter `stopping_criteria` (`StoppingCriteriaList`, optional):
Custom stopping criteria that complement the default stopping criteria built from arguments and a generation config. If a stopping criteria is passed that is already created with the arguments or a generation config an error is thrown. This feature is intended for advanced users.
We can try to use this parameter to set better stop-decoding conditions. It is also worth considering rebuilding the training set with concise reply sentences and fine-tuning the model on it. Alpaca's replies are accurate and concise, while both Vicuna and GPT-3.5 seem a bit long-winded.
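A minimal sketch of how such a custom criterion could look, assuming the standard Hugging Face `StoppingCriteria` API. The stop id used below is a placeholder, not chatglm2's real id; in practice it should come from the tokenizer (e.g. `tokenizer.eos_token_id`):

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList


class StopOnTokens(StoppingCriteria):
    """Stop generation as soon as the newest generated token is a stop id."""

    def __init__(self, stop_ids):
        self.stop_ids = set(stop_ids)

    def __call__(self, input_ids: torch.LongTensor, scores, **kwargs) -> bool:
        # input_ids has shape (batch, seq_len); inspect the last token only.
        return int(input_ids[0, -1]) in self.stop_ids


# Hypothetical stop id 2 for illustration; fetch the real one from the tokenizer.
stopping_criteria = StoppingCriteriaList([StopOnTokens([2])])
# outputs = self.generate(inputs, stopping_criteria=stopping_criteria, **gen_kwargs)
```

Passing the list via `stopping_criteria=` complements (does not replace) the default length-based criteria, so `max_length` can still serve as a safety net.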
Currently I am importing the chatglm2 model through the transformers library; how should I go about modifying the beam search algorithm you mentioned?
modeling_chatglm.py:

```python
gen_kwargs = {"max_length": max_length, "num_beams": num_beams, "do_sample": do_sample,
              "top_p": top_p, "temperature": temperature,
              "logits_processor": logits_processor, **kwargs}
inputs = self.build_inputs(tokenizer, query, history=history)
outputs = self.generate(inputs, **gen_kwargs)
```
You can also modify the transformers source directly, but there is no need to change the beam search algorithm itself. I have the same question: how to add a suitable stop token id. It could also be solved by retraining on better data, but I don't have a good approach right now.
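As an interim idea, a stop id can also be pinned directly in `gen_kwargs` via `generate()`'s `eos_token_id` parameter, keeping `max_length` only as a safety net. The id and sampling values below are placeholders for illustration, not chatglm2's real defaults; look the id up from the tokenizer:

```python
# Sketch only: STOP_TOKEN_ID is an assumed placeholder, not chatglm2's real id.
# In practice fetch it from the tokenizer, e.g. tokenizer.eos_token_id.
STOP_TOKEN_ID = 2

gen_kwargs = {
    "max_length": 2048,             # hard upper bound, kept as a safety net
    "do_sample": True,
    "top_p": 0.8,
    "temperature": 0.8,
    "eos_token_id": STOP_TOKEN_ID,  # generate() ends a sequence at this id
}
# outputs = model.generate(**inputs, **gen_kwargs)
```

With `eos_token_id` set, generation stops naturally at the end of an answer instead of being cut off mid-sentence at the length limit.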
Is there an existing issue for this?
Current Behavior
Currently, I found a problem: after I feed a question to the large language model and control the answer length with the parameter max_token or max_length, the results are not good. The following situations exist: for previously asked questions, the answer retrieved from history gets truncated at the word limit; for previously unasked questions, the generated answers sometimes contain too many words. How should this be solved?
Expected Behavior
I want to be able to control the length of the generated text through parameters, so that the generated answers read naturally and are not truncated mid-sentence.
Steps To Reproduce
Same as described in Current Behavior: feed a question to the model with max_token or max_length set, and observe that answers are either truncated at the limit or run too long.
Environment
Anything else?
No response