axolotl-ai-cloud / axolotl

https://axolotl-ai-cloud.github.io/axolotl/
Apache License 2.0

Unable to run using Amazon Sagemaker Studio #281

Closed ashercn97 closed 1 year ago

ashercn97 commented 1 year ago

I am trying to use axolotl in Amazon SageMaker Studio, but I cannot figure it out. I am using Python 3.9 and running the quickstart code in the README, but it doesn't work. I am super new to this, so if someone could help me that'd be great! (Sorry if this is a stupid question.)

ashercn97 commented 1 year ago

Note: I was using examples/falcon/config-7b-lora.yml instead of the llama-3b one!

NanoCode012 commented 1 year ago

Please provide more details (errors, etc.). Try running the default open llama one.

ashercn97 commented 1 year ago

@NanoCode012 Okay. I will run the openllama one and then copy and paste the error. Will try right now.

ashercn97 commented 1 year ago

@NanoCode012 Can it be with a CPU runtime, or should I only do GPU?

ashercn97 commented 1 year ago

@NanoCode012 I got this error using the Llama one (I was using a CPU runtime):

```
Traceback (most recent call last):
  File "/home/studio-lab-user/.conda/envs/python39/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 979, in launch_command
    simple_launcher(args)
  File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/studio-lab-user/.conda/envs/python39/bin/python3.9', 'scripts/finetune.py', 'examples/openllama-3b/lora.yml']' died with <Signals.SIGKILL: 9>.
```

ashercn97 commented 1 year ago

@NanoCode012 Now I got it to work with the default, but when I try to use my own config file, which is the same as the llama one except that I changed the dataset, I get this error:

```
    component = fn(*varargs, **kwargs)
  File "/home/studio-lab-user/axolotl/scripts/finetune.py", line 226, in train
    train_dataset, eval_dataset = load_prepare_datasets(
  File "/home/studio-lab-user/axolotl/src/axolotl/utils/data.py", line 393, in load_prepare_datasets
    dataset = load_tokenized_prepared_datasets(
  File "/home/studio-lab-user/axolotl/src/axolotl/utils/data.py", line 268, in load_tokenized_prepared_datasets
    samples = samples + list(d)
  File "/home/studio-lab-user/axolotl/src/axolotl/datasets.py", line 42, in __iter__
    yield self.prompt_tokenizer.tokenize_prompt(example)
  File "/home/studio-lab-user/axolotl/src/axolotl/prompt_tokenizers.py", line 116, in tokenize_prompt
    tokenized_res_prompt = self._tokenize(
  File "/home/studio-lab-user/axolotl/src/axolotl/prompt_tokenizers.py", line 64, in _tokenize
    result = self.tokenizer(
  File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2571, in __call__
    raise ValueError("You need to specify either text or text_target.")
ValueError: You need to specify either text or text_target.
```

NanoCode012 commented 1 year ago

> died with <Signals.SIGKILL: 9>.

With a CPU runtime, that usually means the process ran out of RAM and was killed (RAM OOM).

ashercn97 commented 1 year ago

@NanoCode012 Thank you!

ashercn97 commented 1 year ago

Figured out how to make it work :)

NanoCode012 commented 1 year ago

@ashercn97 , could you please detail it for any future individual?

ashercn97 commented 1 year ago

@NanoCode012 Yes, of course! I will write a step-by-step guide for how I did it.

  1. I was using an Amazon SageMaker Studio Lab notebook (the free one) on a **GPU** instance. From my experience it ONLY works on a GPU instance.
  2. I then cloned the GitHub repo into the outermost directory, not inside the SageMaker Studio notebook folder.
  3. Next, I made a conda environment and set it to use Python 3.9 (ALSO A SUPER IMPORTANT STEP).
  4. Then I ran `cd axolotl`, making sure it was the clone in the outer directory.
  5. In the terminal, I installed PyTorch with the `pip3 install torch ...` command (you can find it on the PyTorch "install locally" page), then ran `pip3 install -e .` and installed `peft` from the huggingface GitHub repo.
  6. Next, I had to make a working config file. Originally I used the openllama-3b LoRA file, but then wanted to spice it up, so I tried Falcon AND IT NEVER WORKED. So far I have only used OpenLLaMA.
  7. To make the config, I first needed a dataset. I tried both the `alpaca` and `alpaca_chat:load_open_orca` types, but only `alpaca` worked, so I have been using the alpaca dataset format.
  8. I found a dataset that fit, and also formatted my own dataset into alpaca format (it's a small, alpaca-ized version of the Open-Orca dataset on my Hugging Face: ashercn97).
  9. In the config, I put in the dataset name and set the type to `alpaca`, then left everything else the same, except I changed the output folder each time.
  10. Finally, I ran `accelerate launch scripts/finetune.py` followed by the path to my config, from the axolotl directory.
  11. To do inference, I used the provided command.
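
The config edits described in the steps above can be sketched roughly like this (a hypothetical fragment, assuming `examples/openllama-3b/lora.yml` as the base; the dataset path is a placeholder, substitute your own Hugging Face dataset):

```yaml
# Sketch of the edits described above, starting from examples/openllama-3b/lora.yml.
# Only the fields shown here change; everything else stays as in the example file.
datasets:
  - path: your-username/your-alpaca-dataset  # placeholder -- must be in alpaca format
    type: alpaca
output_dir: ./lora-out-run1  # change this for each new run
```

Then launch it from the axolotl directory with `accelerate launch scripts/finetune.py path/to/your-config.yml`.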

SOME TIPS:

  - Use the alpaca dataset format.
  - It won't work if you have missing values, so get rid of those first.
  - To stop a run, just press Ctrl+C in the terminal; it will save, and you can start it back up.
  - USE A GPU RUNTIME.
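
The "missing values" tip matters because a row with a `None` or empty field ends up passing nothing to the tokenizer, which is what produces the `ValueError: You need to specify either text or text_target` shown earlier in this thread. A minimal, hypothetical pre-filtering sketch for alpaca-format rows (field names follow the alpaca convention; this is not part of axolotl itself):

```python
# Drop alpaca-format rows with missing or empty required fields before training.
# Illustrative pre-processing sketch, not axolotl code.

def is_complete(row):
    """A row is usable if 'instruction' and 'output' are non-empty strings.
    'input' is optional in the alpaca format, but if present it must be a string."""
    for key in ("instruction", "output"):
        if not isinstance(row.get(key), str) or not row[key].strip():
            return False
    return row.get("input") is None or isinstance(row["input"], str)

rows = [
    {"instruction": "Say hi", "input": "", "output": "Hi!"},
    {"instruction": "Broken row", "input": "", "output": None},    # dropped: None output
    {"instruction": "", "input": "", "output": "No instruction"},  # dropped: empty instruction
]
clean = [r for r in rows if is_complete(r)]
print(len(clean))  # 1
```

Filtering once up front is simpler than debugging a tokenizer error mid-run.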

Hope this helped!