OpenAGI: When LLM Meets Domain Experts

Error when I run "bash run_openagi.sh" in fine-tune mode #30

Closed · bf-yang closed this 6 months ago

bf-yang commented 10 months ago

Error when I run "bash run_openagi.sh" in fine-tune mode:

```
Traceback (most recent call last):
  File "/home/bufang/OpenAGI/openagi_benchmark.py", line 110, in <module>
    main()
  File "/home/bufang/OpenAGI/openagi_benchmark.py", line 94, in main
    run_finetune_flan_t5(args)
  File "/home/bufang/OpenAGI/benchmark_tasks/finetune/finetune_schema_flan_t5.py", line 109, in run_finetune_flan_t5
    generated_module_seq, log_prob = seqGen.generate_sequence([test_tasks[i]],\
  File "/home/bufang/OpenAGI/benchmark_tasks/generate_model_seq.py", line 332, in generate_sequence
    output = self.model.generate_with_grad(
TypeError: clone() got an unexpected keyword argument 'max_length'
```

I have set TASK="finetune" and LLM_NAME="flan_t5" in run_openagi.sh.

TobyGE commented 10 months ago

Hello, I've executed the code on my end, and it seems the issue might be due to an incompatible torch version. Kindly ensure you have installed torch version "1.13.0+cu117".
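
For reference, a pinned install along these lines should pull the matching build (this assumes a CUDA 11.7 environment; adjust the index URL if your CUDA version differs):

```bash
# Check which torch version is currently installed
python -c "import torch; print(torch.__version__)"

# Install the pinned build the repo was tested against (CUDA 11.7 wheels)
pip install torch==1.13.0+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
```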

bf-yang commented 10 months ago

> Hello, I've executed the code on my end, and it seems the issue might be due to an incompatible torch version. Kindly ensure you have installed torch version "1.13.0+cu117".

It works for me now. Thanks a lot! By the way, I have two more questions:

  1. When I run "python benchmark_tasks/finetune/flan_t5_finetune.py", it always reports "accuracy is 0.0" during the fine-tuning process. Is this result reasonable? I use the train_model_sequence.txt and train_task_description.txt files during fine-tuning.

  2. When I evaluate the RLTF-based Flan-T5-Large, it reports an out-of-memory error. My machine has a single 3090 with 24 GB of VRAM. Is there any way I can run the code?

TobyGE commented 10 months ago

Q1: That's understandable, since the model needs to produce the exact same model sequence for a prediction to count as correct. It might be worth checking whether the accuracy increases at later epochs, around 180 or 190.

Q2: If you're working with just one 3090, consider using LoRA to reduce GPU memory usage. We're also gearing up to launch a LoRA version of Flan-T5. Stay tuned to our repository for the latest updates.
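
To make Q1 concrete: under exact-match scoring, a near-miss counts the same as a completely wrong answer, so accuracy can sit at 0.0 for many epochs and then jump once the model reproduces whole sequences verbatim. A minimal sketch of this kind of metric (illustrative only; `predictions` and `references` are hypothetical names, not the repo's actual variables):

```python
def exact_match_accuracy(predictions, references):
    """Fraction of generated sequences that match the reference string exactly."""
    assert len(predictions) == len(references)
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)

print(exact_match_accuracy(["A->B->C"], ["A->B->C"]))  # 1.0
print(exact_match_accuracy(["A->B"], ["A->B->C"]))     # 0.0 (near-miss scores zero)
```

For Q2, a minimal sketch of wrapping Flan-T5-Large with LoRA via Hugging Face's peft library; this is a generic peft example under assumed hyperparameters (rank, alpha, and the `["q", "v"]` target modules are common defaults for T5), not the forthcoming OpenAGI LoRA version:

```python
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model, TaskType

base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

# Low-rank adapters on the attention projections; only these small
# matrices receive gradients, which sharply cuts optimizer and
# gradient memory compared to full fine-tuning.
config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5 attention query/value projections
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of parameters
```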

bf-yang commented 10 months ago

> Q1: That's understandable, since the model needs to produce the exact same model sequence for a prediction to count as correct. It might be worth checking whether the accuracy increases at later epochs, around 180 or 190.
>
> Q2: If you're working with just one 3090, consider using LoRA to reduce GPU memory usage. We're also gearing up to launch a LoRA version of Flan-T5. Stay tuned to our repository for the latest updates.

I see. Thanks a lot for your reply.

TobyGE commented 10 months ago

You're welcome.