KhoomeiK / LlamaGym

Fine-tune LLM agents with online reinforcement learning
MIT License

Fix the unexpected action of llm #5

Open yuxiaooye opened 6 months ago

yuxiaooye commented 6 months ago

In agent.py, in the Agent.llm() method, I'm wondering whether we should add two statements before and after self.model.generate(), like so:

context_len = inputs['attention_mask'].size(1)  # add: number of prompt tokens
generate_ids = self.model.generate(...)
generate_ids = generate_ids[:, context_len:]  # add: keep only the newly generated tokens

Outputs before the change:

>>> outputs
['[INST] <<SYS>>\nYou are an expert blackjack player. Every turn, you\'ll see your current sum, the dealer\'s showing card value, and whether you have a usable ace. Win by exceeding the dealer\'s hand but not exceeding 21.\nDecide whether to stay with your current sum by writing "Action: 0" or accept another card by writing "Action: 1". Accept a card unless very close to 21.\n<</SYS>>\n\nYou: 15. Dealer: 5. You have no ace. [/INST]  Action: 0']

Outputs after the change:

>>> outputs
[' Action: 0']

By truncating the beginning of the LLM output, which is identical to the LLM's input prompt, we get just the expected action.
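
For clarity, here is a minimal standalone sketch of the same slicing outside of LlamaGym; the model name, prompt, and generation settings below are illustrative assumptions, not the actual Agent.llm() code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model; any Hugging Face causal LM behaves the same way.
model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "[INST] You: 15. Dealer: 5. You have no ace. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt")

# Number of prompt tokens fed to the model.
context_len = inputs["attention_mask"].size(1)

with torch.no_grad():
    generate_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=16,
    )

# For decoder-only models, generate() returns prompt + completion,
# so slice off the prompt tokens to keep only the new ones.
generate_ids = generate_ids[:, context_len:]

outputs = tokenizer.batch_decode(generate_ids, skip_special_tokens=True)
print(outputs)  # e.g. [' Action: 0']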

KhoomeiK commented 6 months ago

Thanks! This is definitely better and more standard than my .split("[/INST]") hack. Would appreciate it if you could open a PR!
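
For reference, the string-based hack mentioned above presumably looks roughly like the following hypothetical reconstruction (not the actual LlamaGym code); it depends on the Llama chat template, whereas slicing by prompt token count does not:

def extract_completion_by_split(decoded_text: str) -> str:
    # Hypothetical reconstruction: keep only the text after the last
    # "[/INST]" tag of the decoded (prompt + completion) string.
    return decoded_text.split("[/INST]")[-1]

# Usage with a decoded output like the one shown earlier in this issue:
decoded = "[INST] <<SYS>>\n...system prompt...\n<</SYS>>\n\nYou: 15. Dealer: 5. You have no ace. [/INST]  Action: 0"
print(extract_completion_by_split(decoded))  # '  Action: 0'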