weidotwisc opened 7 months ago
train.json: the attached file is the training data.
Hi, I think I found the problem:
I used the flash attention flag during training, so in the inference code, we would need the extra flag
_attn_implementation="flash_attention_2"
when loading the model.
Thanks!
I feel it is probably worth noting this in the inference documentation, so that consumers of this model are aware of this particular flash attention flag, as it requires changes to downstream inference pipeline code.
@weidotwisc on second thought I'm a little puzzled. FA2 is a drop-in replacement for the attention layer: it only speeds up the compute and should not affect the result. As such, regardless of whether you trained with FA2, it should not matter whether you set FA2 during inference.
@fabianlim
From what I have observed, if we use the SFT trainer code and specify --use_flash_attn False, then the inference model loading code should be:
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, torch_dtype=torch.bfloat16)
If we use the SFT trainer code and specify --use_flash_attn True, then the inference model loading code should be:
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, torch_dtype=torch.bfloat16, _attn_implementation="flash_attention_2")
Otherwise the model will predict mostly garbage tokens as I have documented above.
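For illustration, a minimal sketch of how downstream loading code could branch on this (the use_flash_attn parameter name here is illustrative, not from the library; recent transformers versions expose the public attn_implementation kwarg, of which the underscore-prefixed spelling above is the private variant):

import torch
from transformers import AutoModelForCausalLM

def load_base_model(path, use_flash_attn):
    # Illustrative helper: mirror the training-time attention
    # setting when loading the model for inference.
    kwargs = {"torch_dtype": torch.bfloat16}
    if use_flash_attn:
        # Match a checkpoint that was tuned with --use_flash_attn True
        kwargs["attn_implementation"] = "flash_attention_2"
    return AutoModelForCausalLM.from_pretrained(path, **kwargs)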
@weidotwisc could you also share your test set and evaluation script if you have them?
EDIT: I now see the train.json in the above comment. I missed it the first time around.
Trying to reproduce this issue using the platform testing environment DevStage. This process is being documented here.
@weidotwisc I don't have access to the data set you used to tune (export DATA_PATH=/dccstor/weiz/peft/train.json). How can I get it? Your directory on CCC is not accessible with my CCC credentials. Can you temporarily copy it to one of the public /dccstor/ai4code-ansible directories? I'll delete it as soon as I get it.
Thanks @olson-ibm ! @Ssukriti -- The evaluation pipeline is a little tough to set up: it requires artifact credentials to install some specific Ansible-related libraries, plus some post-processing setup before calling into a pytest-based environment. However, to reproduce this flash attention issue, you can pick a sample from the input dataset and let the PT-tuned model generate output; the results will look like gibberish, e.g.
::::::......:..............,.............,..............,.,..:.......:.........2.2......................
instead of any reasonable yaml-like output such as
community.docker.docker_login:
registry: "{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}"
username: "{{ registry_username }}"
password: "{{ registry_password }}"
@olson-ibm thank you! Some of the things we need to try:
a. 1e-5 and 0.05
b. if it results in OOM, then reduce max_seq_length to 1024
@Ssukriti could we please assign this to someone if it is being worked on by the platform team? Or should we track this elsewhere?
@weidotwisc Neither I nor @olson-ibm were able to reproduce your low-quality inference results, although we are running inference differently than you. You can see Joe's configuration and inference run in the issue.
Here are the configurations I used:
{
"model_name_or_path": "granite-20b-code-all-yaml-2k-v1.1",
"tokenizer_name_or_path": "granite-20b-code-all-yaml-2k-v1.1",
"training_data_path": "train.json",
"num_train_epochs": 5.0,
"per_device_train_batch_size": 2,
"gradient_accumulation_steps": 1,
"learning_rate": 0.03,
"response_template": "\n### Response:",
"dataset_text_field": "output",
"num_virtual_tokens": 100,
"prompt_tuning_init": "RANDOM",
"peft_method": "pt",
"logging_steps": 1,
"include_tokens_per_second": true,
"max_seq_length": 512
}
This matches the configurations you used above. I prompt tuned on 1 GPU with flash-attention using the accelerate_launch.py script, which calls the sft_trainer.py script with the above configurations. This uses the same training data you linked to above.
Here is the loss for the tuning:
{'loss': 0.9866, 'grad_norm': 0.013051297515630722, 'learning_rate': 0.024, 'epoch': 1.0}
{'loss': 0.5044, 'grad_norm': 0.02981734648346901, 'learning_rate': 0.018, 'epoch': 2.0}
{'loss': 0.4916, 'grad_norm': 0.011600751429796219, 'learning_rate': 0.012, 'epoch': 3.0}
{'loss': 0.4751, 'grad_norm': 0.0, 'learning_rate': 0.006, 'epoch': 4.0}
{'loss': 0.4658, 'grad_norm': 0.010685372166335583, 'learning_rate': 0.0, 'epoch': 5.0}
{'train_runtime': 2525.3289, 'train_samples_per_second': 4.538, 'train_steps_per_second': 2.269, 'train_tokens_per_second': 2025.542, 'train_loss': 0.5846864923340696, 'epoch': 5.0}
I then ran inference using the run_inference.py script with changes I have in a PR to run with or without flash-attention.
Because the sequence length is so low, I got 1368 warning messages of "This instance will be ignored in loss calculation. Note, if this happens often, consider increasing the max_seq_length."
I tried to increase the max_seq_length to 1024 and 2048 but ran into OOM failures.
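As a quick way to gauge how aggressive a given max_seq_length is, a sketch like the following counts the examples that would be truncated (it assumes train.json is a JSON array of records whose text lives in the "output" field, matching dataset_text_field above, and uses the model path from the earlier config):

import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("granite-20b-code-all-yaml-2k-v1.1")
with open("train.json") as f:
    data = json.load(f)

max_seq_length = 512
# Count records whose tokenized length exceeds the training cutoff
too_long = sum(len(tokenizer(rec["output"]).input_ids) > max_seq_length for rec in data)
print(f"{too_long}/{len(data)} examples exceed {max_seq_length} tokens")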
An inference I ran with flash-attention:
{
"formatted input": "### Input: \n- name: Ensure pip\n community.general.easy_install:\n name: pip\n state: present\n\n- name: Ensure docker for python\n ansible.builtin.pip:\n name: docker\n version: 2.0.0\n\n- name: Fetch registry certificate\n ansible.builtin.fetch:\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\n dest: /tmp/docker_registry.crt\n flat: true\n run_once: true\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\n\n- name: Create registry certificate directory\n ansible.builtin.file:\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\n state: directory\n mode: 493\n\n- name: Copy registry certificate file\n ansible.builtin.copy:\n src: /tmp/docker_registry.crt\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\n\n- name: Enable service docker, restart to pickup registry certificate\n ansible.builtin.systemd:\n name: docker\n enabled: true\n state: restarted\n\n- name: Login to the registry\n\n### Response:",
"predicted target": [
"community.docker.docker_login:\n registry: \"{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\"\n username: \"{{ registry_username }}\"\n password: \"{{ registry_password }}\"\n email: \"{{ registry_email }}\"\n reauthorize: true\n\n- name: Pull the image\n community.docker.docker_image:\n name: \"{{ image_name }}\"\n tag: \"{{ image_tag }}\"\n source: pull"
],
"ref target": [
"community.docker.docker_login:\n password: \"{{ registry_pass }}\"\n username: \"{{ registry_user }}\"\n registry_url: \"{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\"\n reauthorize: true"
]
},
Same without flash-attention enabled at inference time:
{
"formatted input": "### Input: \n- name: Ensure pip\n community.general.easy_install:\n name: pip\n state: present\n\n- name: Ensure docker for python\n ansible.builtin.pip:\n name: docker\n version: 2.0.0\n\n- name: Fetch registry certificate\n ansible.builtin.fetch:\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\n dest: /tmp/docker_registry.crt\n flat: true\n run_once: true\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\n\n- name: Create registry certificate directory\n ansible.builtin.file:\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\n state: directory\n mode: 493\n\n- name: Copy registry certificate file\n ansible.builtin.copy:\n src: /tmp/docker_registry.crt\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\n\n- name: Enable service docker, restart to pickup registry certificate\n ansible.builtin.systemd:\n name: docker\n enabled: true\n state: restarted\n\n- name: Login to the registry\n\n### Response:",
"predicted target": [
"community.docker.docker_login:\n registry: \"{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\"\n username: \"{{ registry_username }}\"\n password: \"{{ registry_password }}\"\n email: \"{{ registry_email }}\"\n reauthorize: true\n\n- name: Pull the image\n community.docker.docker_image:\n name: \"{{ image_name }}\"\n tag: \"{{ image_tag }}\"\n source: pull"
],
"ref target": [
"community.docker.docker_login:\n password: \"{{ registry_pass }}\"\n username: \"{{ registry_user }}\"\n registry_url: \"{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\"\n reauthorize: true"
]
}
As you can see, the outputs are the same and both are formatted as correct Ansible.
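As an aside, a quick way to machine-check that a prediction at least parses as YAML (a sketch; it checks syntax only, not Ansible semantics):

import yaml  # pip install pyyaml

def is_valid_yaml(text):
    # True if the model output parses as YAML at all
    try:
        yaml.safe_load(text)
        return True
    except yaml.YAMLError:
        return False

print(is_valid_yaml('community.docker.docker_login:\n  registry: "host:5000"'))  # True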
Here is another one where the results are slightly different but both still look good: With flash-attn:
{
"formatted input": "### Input: \n- name: Create {{ docker_compose_dir }} directory\n ansible.builtin.file:\n path: \"{{ docker_compose_dir }}\"\n state: directory\n\n- name: Create Redis socket directory\n ansible.builtin.file:\n path: \"{{ docker_compose_dir }}/redis_socket\"\n state: directory\n mode: 511\n\n- name: Create Docker Compose Configuration\n ansible.builtin.template:\n src: \"{{ item.file }}.j2\"\n dest: \"{{ docker_compose_dir }}/{{ item.file }}\"\n mode: \"{{ item.mode }}\"\n loop:\n - file: environment.sh\n mode: \"0600\"\n - file: credentials.py\n mode: \"0600\"\n - file: docker-compose.yml\n mode: \"0600\"\n - file: nginx.conf\n mode: \"0600\"\n - file: redis.conf\n mode: \"0664\"\n register: awx_compose_config\n\n- name: Render SECRET_KEY file\n ansible.builtin.copy:\n content: \"{{ secret_key }}\"\n dest: \"{{ docker_compose_dir }}/SECRET_KEY\"\n mode: 384\n register: awx_secret_key\n\n- name: Remove AWX containers before migrating postgres so that the old postgres container does not get used\n community.docker.docker_compose:\n project_src: \"{{ docker_compose_dir }}\"\n state: absent\n ignore_errors: true\n\n- name: Start the containers\n\n### Response:",
"predicted target": [
"community.docker.docker_compose:\n project_src: \"{{ docker_compose_dir }}\"\n state: present\n\n- name: Wait for AWX to start\n ansible.builtin.wait_for:\n port: 8052\n delay: 10\n\n- name: Migrate the database\n ansible.builtin.shell: docker exec -i awx_task awx-manage migrate --noinput\n register: migrate_result\n until: migrate"
],
"ref target": [
"community.docker.docker_compose:\n project_src: \"{{ docker_compose_dir }}\"\n restarted: \"{{ awx_compose_config is changed or awx_secret_key is changed }}\"\n register: awx_compose_start"
]
}
Without flash-attn:
{
"formatted input": "### Input: \n- name: Create {{ docker_compose_dir }} directory\n ansible.builtin.file:\n path: \"{{ docker_compose_dir }}\"\n state: directory\n\n- name: Create Redis socket directory\n ansible.builtin.file:\n path: \"{{ docker_compose_dir }}/redis_socket\"\n state: directory\n mode: 511\n\n- name: Create Docker Compose Configuration\n ansible.builtin.template:\n src: \"{{ item.file }}.j2\"\n dest: \"{{ docker_compose_dir }}/{{ item.file }}\"\n mode: \"{{ item.mode }}\"\n loop:\n - file: environment.sh\n mode: \"0600\"\n - file: credentials.py\n mode: \"0600\"\n - file: docker-compose.yml\n mode: \"0600\"\n - file: nginx.conf\n mode: \"0600\"\n - file: redis.conf\n mode: \"0664\"\n register: awx_compose_config\n\n- name: Render SECRET_KEY file\n ansible.builtin.copy:\n content: \"{{ secret_key }}\"\n dest: \"{{ docker_compose_dir }}/SECRET_KEY\"\n mode: 384\n register: awx_secret_key\n\n- name: Remove AWX containers before migrating postgres so that the old postgres container does not get used\n community.docker.docker_compose:\n project_src: \"{{ docker_compose_dir }}\"\n state: absent\n ignore_errors: true\n\n- name: Start the containers\n\n### Response:",
"predicted target": [
"community.docker.docker_compose:\n project_src: \"{{ docker_compose_dir }}\"\n state: present\n\n- name: Wait for AWX to start\n ansible.builtin.wait_for:\n host: \"{{ awx_host }}\"\n port: 8052\n delay: 10\n\n- name: Migrate the database\n ansible.builtin.shell: docker exec -i awx_task awx-manage migrate --noinput"
],
"ref target": [
"community.docker.docker_compose:\n project_src: \"{{ docker_compose_dir }}\"\n restarted: \"{{ awx_compose_config is changed or awx_secret_key is changed }}\"\n register: awx_compose_start"
]
}
@anhuong @olson-ibm Thanks Anh and Joe! To reproduce what I have observed, I am attaching two files here: (1) an input JSON file (train_tiny.json) and (2) my inference script (ansible_pt_inference_shared.py).
To run the script, first change two lines in ansible_pt_inference_shared.py: line 6 to point to your checkpoint path (from a run with the SFT flash_attn flag enabled) and line 7 to point to your train_tiny.json.
Then run "python ansible_pt_inference_shared.py". In my case, I see:
output_str:
.,.,.,.,.,.,.,.,.,.,........,.,.,.,.,...,.,.....,..,...,......,:.: -.:.:.:.:.:.:.............,........,..........,.:..........,.........,::::::::::::::::::::::......:..............,.............,..............,.,..:.......:.........2.2......................
Thanks!
Wei
ansible_pt_inference_shared.txt
P.S.: for some reason I cannot attach a .py file, so I renamed ansible_pt_inference_shared.py to ansible_pt_inference_shared.txt; you will need to rename it back.
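For readers without the attachment, the core of such a script is roughly the following (a minimal sketch, not Wei's exact file; the placeholder paths and the assumption that each record's "output" field holds the full "### Input: ... ### Response: ..." text are mine):

import json
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

checkpoint_path = "<path-to-PT-checkpoint>"  # line 6 in the shared script
data_path = "<path-to-train_tiny.json>"      # line 7 in the shared script

# Load the base model plus the prompt-tuning adapter. Note that no
# attn_implementation is passed, which is the reported failure mode
# for checkpoints tuned with flash attention.
model = AutoPeftModelForCausalLM.from_pretrained(
    checkpoint_path, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)

record = json.load(open(data_path))[0]
prompt = record["output"].split("### Response:")[0] + "### Response:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:]))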
Also wanted to note some testing I did with different max sequence lengths: I saw the warnings "This instance will be ignored in loss calculation." and "Could not find response key", with epoch=5 final loss=0.4658.
Thanks, Anh! But the problem that I was reporting, in my opinion, has less to do with training. When I ran inference (see the ansible_pt_inference_shared.py that I shared previously) without the attn flag against a flash-attn model, it returned gibberish results. I wonder, could you try running the inference script I attached against one of your flash-attention-enabled trained models and see whether you hit the same problem? Thanks!
The official inference stack from the platform is TGIS. We could not reproduce any such problem with TGIS, hence there is no bug for the platform to debug.
We give, as an example, local inferencing using HF APIs in our tuning library, which is the official way from HF to infer models; we did not see any issue with that either!
So @weidotwisc I would suggest you try the inference script in our repo and debug your script accordingly. While Anh can take a look at your script in a timeboxed manner, it is really not part of the platform stack and not a bug for us to dig into.
The issue reported was that quality is poor; we have demonstrated quality is not poor using the platform stack.
@Ssukriti @anhuong Could you show me the right process to use TGIS for inference on one test example? I can try to follow it and see whether I get the desired result.
Thanks! Wei
@anhuong You mentioned that "I then ran inference using the run_inference.py script with changes I have in a PR to run with or without flash-attention." --> What is the right process to run it, and where is the documentation for it?
Thanks!
Wei
@weidotwisc Are you able to access this issue that walks through running the model on TGIS?
This is what I ran for inference. To run without flash-attn:
# given file of input data
$ python scripts/run_inference.py --model <path-to-model> --text_file <path-to-text-file> --max_new_tokens 100
# single text input
$ python scripts/run_inference.py --model <path-to-model> --text "### Input: - name: Ensure pip\n community.general.easy_install:\n name: pip\n state: present\n\n- name: Ensure docker for python\n ansible.builtin.pip:\n name: docker\n version: 2.0.0\n\n- name: Fetch registry certificate\n ansible.builtin.fetch:\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\n dest: /tmp/docker_registry.crt\n flat: true\n run_once: true\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\n\n- name: Create registry certificate directory\n ansible.builtin.file:\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\n state: directory\n mode: 493\n\n- name: Copy registry certificate file\n ansible.builtin.copy:\n src: /tmp/docker_registry.crt\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\n\n- name: Enable service docker, restart to pickup registry certificate\n ansible.builtin.systemd:\n name: docker\n enabled: true\n state: restarted\n\n- name: Login to the registry ### Response:" --max_new_tokens 100
I also ran the script with the training dataset you have attached and was not able to get the same garbled output you mentioned. Running against my prompt tuned model, this is the output I got:
[root@sft-trainer-test-anh-eval app]# python ansible_pt_inference_shared.py
Loading checkpoint shards: 100%|██████████| 9/9 [2:32:08<00:00, 1014.23s/it]
- name: Ensure pip
community.general.easy_install:
name: pip
state: present
- name: Ensure docker for python
ansible.builtin.pip:
name: docker
version: 2.0.0
- name: Fetch registry certificate
ansible.builtin.fetch:
src: "{{ docker_storage_dir }}/certs/docker.crt"
dest: /tmp/docker_registry.crt
flat: true
run_once: true
delegate_to: "{{ groups['docker_registry'][0] }}"
- name: Create registry certificate directory
ansible.builtin.file:
path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}
state: directory
mode: 493
- name: Copy registry certificate file
ansible.builtin.copy:
src: /tmp/docker_registry.crt
dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt
- name: Enable service docker, restart to pickup registry certificate
ansible.builtin.systemd:
name: docker
enabled: true
state: restarted
- name: Login to the registry
input_len: 284
/usr/local/lib/python3.11/site-packages/peft/peft_model.py:1232: UserWarning: Position ids are not supported for parameter efficient tuning. Ignoring position ids.
warnings.warn("Position ids are not supported for parameter efficient tuning. Ignoring position ids.")
len_outputs: 540
output_str:
community.docker.docker_login:
registry: "{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}"
username: "{{ registry_username }}"
password: "{{ registry_password }}"
email: "{{ registry_email }}"
reauthorize: true
- name: Install packages
ansible.builtin.apt:
name: "{{ item }}"
state: present
with_items:
- git
- vim
- curl
- htop
- tree
- tmux
- python-software-properties
- software-properties-common
- build-essential
- libssl-dev
- libffi-dev
- python-dev
- python-pip
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-dev
- python-
You can see the input string printed out as well as the output string, which, although it has - python-dev repeated at the end, provides Ansible-formatted results.
@anhuong Thanks a lot for the help! I wonder, when you ran ansible_pt_inference_shared.py, did you run it against a flash_attn-enabled prompt-tuned model?
Thanks! Wei
Yes, I ran against the same code model, enabling flash_attn at tuning time, and then did not use flash_attn via your script at inference time.
@anhuong I used the single text input example that you listed, by running:
model_path=/dccstor/weiz/bikinie/peft/pt/sft_debugging/fms-hf-tuning/pt_ckpts/checkpoint-5730
python scripts/run_inference.py --model $model_path --text "- name: Ensure pip\n community.general.easy_install:\n name: pip\n state: present\n\n- name: Ensure docker for python\n ansible.builtin.pip:\n name: docker\n version: 2.0.0\n\n- name: Fetch registry certificate\n ansible.builtin.fetch:\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\n dest: /tmp/docker_registry.crt\n flat: true\n run_once: true\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\n\n- name: Create registry certificate directory\n ansible.builtin.file:\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\n state: directory\n mode: 493\n\n- name: Copy registry certificate file\n ansible.builtin.copy:\n src: /tmp/docker_registry.crt\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\n\n- name: Enable service docker, restart to pickup registry certificate\n ansible.builtin.systemd:\n name: docker\n enabled: true\n state: restarted\n\n- name: Login to the registry" --max_new_tokens 100
Then I get the inference_result.json that looks like this:
[
{
"input": "- name: Ensure pip\\n community.general.easy_install:\\n name: pip\\n state: present\\n\\n- name: Ensure docker for python\\n ansible.builtin.pip:\\n name: docker\\n version: 2.0.0\\n\\n- name: Fetch registry certificate\\n ansible.builtin.fetch:\\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\\n dest: /tmp/docker_registry.crt\\n flat: true\\n run_once: true\\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\\n\\n- name: Create registry certificate directory\\n ansible.builtin.file:\\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\\n state: directory\\n mode: 493\\n\\n- name: Copy registry certificate file\\n ansible.builtin.copy:\\n src: /tmp/docker_registry.crt\\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\\n\\n- name: Enable service docker, restart to pickup registry certificate\\n ansible.builtin.systemd:\\n name: docker\\n enabled: true\\n state: restarted\\n\\n- name: Login to the registry",
"output": "- name: Ensure pip\\n community.general.easy_install:\\n name: pip\\n state: present\\n\\n- name: Ensure docker for python\\n ansible.builtin.pip:\\n name: docker\\n version: 2.0.0\\n\\n- name: Fetch registry certificate\\n ansible.builtin.fetch:\\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\\n dest: /tmp/docker_registry.crt\\n flat: true\\n run_once: true\\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\\n\\n- name: Create registry certificate directory\\n ansible.builtin.file:\\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\\n state: directory\\n mode: 493\\n\\n- name: Copy registry certificate file\\n ansible.builtin.copy:\\n src: /tmp/docker_registry.crt\\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\\n\\n- name: Enable service docker, restart to pickup registry certificate\\n ansible.builtin.systemd:\\n name: docker\\n enabled: true\\n state: restarted\\n\\n- name: Login to the registry. -... -.2.2. -.:.:.:.:.:.:.............,.,.,...,..,.,.,.,.,.,.,....,.,..,::::::::::::::::.:.:........."
}
]
If you scroll down to the end of the inference result, you can see my flash-attn-enabled PT-tuned model generates gibberish results.
I have two questions: (1) What was your PT training launching script that enabled flash attention? (2) The model that I trained was on March 26, 2024 -- do you think there could be some changes between then and now in the PT training process that could have altered the results (i.e., how the model was checkpointed in the end)?
Thanks!
Wei
(1) What was your PT training launching script that enabled flash attention ?
The inference script that uses flash attention is on a branch here: https://github.com/anhuong/fms-hf-tuning/blob/flash-attn-inference/scripts/run_inference.py and requires adding the flag --use_flash_attn. This is currently in PR #132.
I also updated my above comment, as the input text used for inference should include the alpaca formatting with ### Input: and ### Response:; please try it with that.
(2) The model that I trained was on March 26, 2024 -- do you think there could be some changes between then and now in the PT training process that could have altered the results (i.e., how the model was checkpoint-ed in the end) ?
I'm unsure; this seems to be the most relevant change, but if the previous use_flash_attention_2 flag is not fully deprecated yet, then the functionality should be the same.
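For reference, the two transformers spellings in question, which should behave identically while the older flag remains supported (a sketch with a placeholder model path):

import torch
from transformers import AutoModelForCausalLM

model_path = "<path-to-model>"  # placeholder

# Older flag, deprecated in favor of attn_implementation:
model_old = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, use_flash_attention_2=True
)

# Current replacement:
model_new = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, attn_implementation="flash_attention_2"
)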
@anhuong Thanks! I was asking "What was your PT training launching script that enabled flash attention?" --> the training script, not the inference script.
Thanks!
Wei
I used the script in this repo, sft_trainer.py, but triggered it with the build scripts.
@anhuong Thanks!
This is the script that I used to launch the PT training job
export MODEL_PATH=/dccstor/ai4code-ansible/shared/ckpt/granite-20b-code-all-yaml-2k/
export DATA_PATH=/dccstor/weiz/peft/train.json
export OUTPUT_PATH=pt_ckpts
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=.
python tuning/sft_trainer.py \
--model_name_or_path $MODEL_PATH \
--training_data_path $DATA_PATH \
--output_dir $OUTPUT_PATH \
--peft_method "pt" \
--tokenizer_name_or_path $MODEL_PATH \
--num_train_epochs 5 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "epoch" \
--learning_rate 3e-2 \
--weight_decay 0. \
--warmup_ratio 0.0 \
--lr_scheduler_type "linear" \
--logging_steps 1 \
--include_tokens_per_second \
--packing False \
--response_template "\n### Response:" \
--dataset_text_field "output" \
--use_flash_attn True \
--torch_dtype "bfloat16" \
--num_virtual_tokens 100 \
--max_seq_length 512 \
--prompt_tuning_init RANDOM
as mentioned at the top of this issue thread.
I followed the README at https://github.com/foundation-model-stack/fms-hf-tuning#prompt-tuning- and modified it accordingly (in particular, to enable flash attention with --use_flash_attn True).
I wonder how I should run your training script in a similar fashion?
Thanks! Wei
As noted above, I ran with the same configurations you had set. This required creating the JSON config, setting the env var SFT_TRAINER_CONFIG_JSON_PATH to point to the config, and then running python accelerate_launch.py on a Pod with a single GPU.
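In code, that flow is roughly (a sketch; the config path is a placeholder, and the JSON contents are abbreviated from the full configuration shown earlier):

import json
import os
import subprocess

config_path = "/tmp/sft_config.json"  # placeholder
with open(config_path, "w") as f:
    json.dump({"model_name_or_path": "granite-20b-code-all-yaml-2k-v1.1",
               "peft_method": "pt", "num_virtual_tokens": 100,
               "max_seq_length": 512}, f)  # abbreviated; full config above

# accelerate_launch.py reads the config from this env var and then
# invokes sft_trainer.py with those parameters.
os.environ["SFT_TRAINER_CONFIG_JSON_PATH"] = config_path
subprocess.run(["python", "accelerate_launch.py"], check=True)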
By default when running this script, use_flash_attn is set to true. You can see the full set of params that were used:
INFO:root:Parameters used to launch training:
ModelArguments(model_name_or_path='granite-20b-code', use_flash_attn=True, torch_dtype=torch.bfloat16),
DataArguments(training_data_path='wisdom-train.json', response_template='\n### Response:', dataset_text_field='output', validation_data_path=None),
TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
cache_dir=None,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=True,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.03,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/runs/Apr24_18-02-41_sft-trainer-test-anh-wisdom-match-params,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1,
logging_strategy=IntervalStrategy.EPOCH,
lr_scheduler_kwargs={},
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_seq_length=512,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=5.0,
optim=OptimizerNames.ADAMW_TORCH,
optim_args=None,
optim_target_modules=None,
output_dir=granite-20b-code-tuned,
overwrite_output_dir=False,
packing=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=2,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=granite-20b-code,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=500,
save_strategy=IntervalStrategy.EPOCH,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=None,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
),
PromptTuningConfig(prompt_tuning_init='RANDOM', num_virtual_tokens=100, prompt_tuning_init_text='Classify if the tweet is a complaint or not:', tokenizer_name_or_path='granite-20b-code')
@anhuong
To reproduce the bug, this is what I did:
(1) checkout the latest fms-hf-tuning at commit c2f2f8c2179f38f1754d621d4bb13cd4c1eaca52 (Apr 29,2024)
(2) Run the training launching script, as documented in https://github.com/foundation-model-stack/fms-hf-tuning#prompt-tuning-, with the following modifications (in particular, --use_flash_attn True), the same as documented at the top of this issue thread:
export MODEL_PATH=/dccstor/ai4code-ansible/shared/ckpt/granite-20b-code-all-yaml-2k/
export DATA_PATH=/dccstor/weiz/irene/fms-hf-tuning/examples/prompt_tuning_ans/containers_infra-ent-train-fms.json
export OUTPUT_PATH=pt_ckpts
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=.
python tuning/sft_trainer.py \
--model_name_or_path $MODEL_PATH \
--training_data_path $DATA_PATH \
--output_dir $OUTPUT_PATH \
--peft_method "pt" \
--tokenizer_name_or_path $MODEL_PATH \
--num_train_epochs 5 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "epoch" \
--learning_rate 3e-2 \
--weight_decay 0. \
--warmup_ratio 0.0 \
--lr_scheduler_type "linear" \
--logging_steps 1 \
--include_tokens_per_second \
--packing False \
--response_template "\n### Response:" \
--dataset_text_field "output" \
--use_flash_attn True \
--torch_dtype "bfloat16" \
--num_virtual_tokens 100 \
--max_seq_length 512 \
--prompt_tuning_init RANDOM
(3) Run the single text example inference script as you proposed, pointing to the checkpoint from step (2)
model_path=/dccstor/weiz/bikinie/peft/pt/sft_debugging/fms-hf-tuning/pt_ckpts/checkpoint-5730
python scripts/run_inference.py --model $model_path --text "- name: Ensure pip\n community.general.easy_install:\n name: pip\n state: present\n\n- name: Ensure docker for python\n ansible.builtin.pip:\n name: docker\n version: 2.0.0\n\n- name: Fetch registry certificate\n ansible.builtin.fetch:\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\n dest: /tmp/docker_registry.crt\n flat: true\n run_once: true\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\n\n- name: Create registry certificate directory\n ansible.builtin.file:\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\n state: directory\n mode: 493\n\n- name: Copy registry certificate file\n ansible.builtin.copy:\n src: /tmp/docker_registry.crt\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\n\n- name: Enable service docker, restart to pickup registry certificate\n ansible.builtin.systemd:\n name: docker\n enabled: true\n state: restarted\n\n- name: Login to the registry" --max_new_tokens 100
The inference output (i.e., the part after the echoed input) of the model, stored in inference_result.json (attached below), looks like this:
.0:::: '0: - '0: - 0: - 1: '0.0.0.0.0: '0: '0: '0: '0: - '0: - '0: - '0: - '0:0:0:0:0:0 -0:8000:1.0.0.0.00:0.0.0.0:0:00
Thanks!
Wei
@weidotwisc instead of me tuning on an old commit, you should tune the model using the latest main branch of the repo. In addition, I noted to try inference with the ### Input: and ### Response: alpaca formatting, to see if this affects the prediction result. Please make these changes.
@anhuong
(1) It is the latest commit that I tested against -- please note the commit hash!
(2) I have updated my inference script to be:
python scripts/run_inference.py --model $model_path --text "### Input: - name: Ensure pip\n community.general.easy_install:\n name: pip\n state: present\n\n- name: Ensure docker for python\n ansible.builtin.pip:\n name: docker\n version: 2.0.0\n\n- name: Fetch registry certificate\n ansible.builtin.fetch:\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\n dest: /tmp/docker_registry.crt\n flat: true\n run_once: true\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\n\n- name: Create registry certificate directory\n ansible.builtin.file:\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\n state: directory\n mode: 493\n\n- name: Copy registry certificate file\n ansible.builtin.copy:\n src: /tmp/docker_registry.crt\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\n\n- name: Enable service docker, restart to pickup registry certificate\n ansible.builtin.systemd:\n name: docker\n enabled: true\n state: restarted\n\n- name: Login to the registry ### Response:" --max_new_tokens 100
I still get garbage towards the end; see the attached JSON file: inference_result.json
Thanks! Wei
Sorry for missing that it is the latest commit hash; thanks for tuning on the latest commit. Sorry for being pedantic, but could you try with text "### Input: - name: Ensure pip\n community.general.easy_install:\n name: pip\n state: present\n\n- name: Ensure docker for python\n ansible.builtin.pip:\n name: docker\n version: 2.0.0\n\n- name: Fetch registry certificate\n ansible.builtin.fetch:\n src: \"{{ docker_storage_dir }}/certs/docker.crt\"\n dest: /tmp/docker_registry.crt\n flat: true\n run_once: true\n delegate_to: \"{{ groups['docker_registry'][0] }}\"\n\n- name: Create registry certificate directory\n ansible.builtin.file:\n path: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}\n state: directory\n mode: 493\n\n- name: Copy registry certificate file\n ansible.builtin.copy:\n src: /tmp/docker_registry.crt\n dest: /etc/docker/certs.d/{{ hostvars[groups['docker_registry'][0]].ansible_host }}:{{ registry_port }}/ca.crt\n\n- name: Enable service docker, restart to pickup registry certificate\n ansible.builtin.systemd:\n name: docker\n enabled: true\n state: restarted\n\n- name: Login to the registry\n\n### Response:" --max_new_tokens 100
I know it is a minor difference to include the newlines in \n\n### Response:, but I want to be sure the format is correct.
Also, it looks like we tuned the same way, so unless the base models we are using are different, I am not sure why the inference results differ. Also, if you are trying to use flash attention with the inference script, make sure you are using my fork. You should not pass --use_flash_attn True, only the bare flag --use_flash_attn.
Finally, since we are running on different devices with potentially different dependencies, this could also be causing problems. Can you please let me know what versions of these dependencies you have: transformers, accelerate, peft, trl (all retrievable with pip show) and finally cuda (retrievable with /usr/local/cuda/bin/nvcc --version)?
I have versions:
peft: 0.10.0
transformers: 4.39.3
trl: 0.8.1
accelerate: 0.29.2
cuda:
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
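For convenience, one way to dump all four library versions in one go (a sketch using only the standard library):

from importlib.metadata import version, PackageNotFoundError

for pkg in ("transformers", "accelerate", "peft", "trl"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")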
@anhuong Thanks!
(1) Attached is the inference_result.json that uses the "\n\n" before "### Response:". It is still gibberish towards the end.
(2) I am not sure I understand your comments about (i) your fork and (ii) passing the --use_flash_attn flag when using the inference script. When I used the inference script, (i) I used scripts/run_inference.py on the main branch and (ii) I didn't pass in any flag; I was simply running python scripts/run_inference.py --model $model_path --text ... --max_new_tokens 100, where ... is the text field.
(3) My environment is the following:
peft: 0.8.2
transformers: 4.37.2
trl: 0.7.10
accelerate: 0.27.0
cuda: nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0
Thanks!
Wei
Thanks Wei, my change has been merged, so you can now use the latest main branch and add the flag --use_flash_attn to see if that changes your results with your given model. You can run python scripts/run_inference.py --model $model_path --text ... --max_new_tokens 100 --use_flash_attn.
I do wonder if the different dependency versions are making a difference in the tuning and inference, since you are on slightly older dependency versions but a newer cuda version than me, as we are not able to reproduce your results.
Thanks, Anh! Yeah, now it works :) Attached is the inference_result.json, which appears normal at the end.
I see your solution is also to ask the user to pass a flag indicating that it is a flash-attention-enabled trained model, so that the inference code loads the model with the option attn_implementation="flash_attention_2". This looks like the same solution that I proposed at https://github.com/foundation-model-stack/fms-hf-tuning/issues/103#issuecomment-2036003946
Now, two things are on my mind: (1) Maybe have folks like @fabianlim take a look (see the sketch at the end of this comment), as our solution suggests that SDPA (the granite default attention mechanism) and Flash Attention are not equivalent mechanisms and thus cannot be swapped transparently, at least for our granite models. That seems contrary to the conventional wisdom, per Fabian?
(2) If (1) is indeed the case, then consumers of platform-SFT-trained models need to be aware of this, unless you control how all users are going to use the model. For example, our group (AI4Ansible)'s evaluation pipeline didn't include the attn_implementation="flash_attention_2" flag, because we didn't use the flash attention mechanism to train our model. My feeling is that research teams using the platform team's SFT pipeline would need to modify your inference code somehow to fit their existing scenarios.
Thanks!
Wei
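Regarding point (1) above, one direct way to probe the equivalence claim is to compare logits from the same checkpoint under both attention implementations (a sketch with a placeholder model path; small bf16 numerical differences are expected even if the implementations are interchangeable):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "<path-to-base-or-merged-model>"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
inputs = tokenizer("### Input: ...\n\n### Response:", return_tensors="pt")

logits = {}
for impl in ("sdpa", "flash_attention_2"):
    model = AutoModelForCausalLM.from_pretrained(
        model_path, torch_dtype=torch.bfloat16, attn_implementation=impl
    ).cuda().eval()
    with torch.no_grad():
        out = model(**{k: v.cuda() for k, v in inputs.items()})
    logits[impl] = out.logits.float().cpu()
    del model
    torch.cuda.empty_cache()

# If the two attention paths were truly interchangeable, this difference
# should be tiny numerical noise rather than something that flips tokens.
print(torch.max(torch.abs(logits["sdpa"] - logits["flash_attention_2"])))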
Describe the bug
Prompt Tuning model generates low-quality output
Platform
Please provide details about the environment you are using, including the following:
Sample Code
My launching script is
To reproduce, one needs to set MODEL_PATH to the yaml-2k path (which is shared via a COS bucket) and DATA_PATH to the training data.
Expected behavior
Using the first training data item as the input to the model should show something close to the ground truth, i.e.,
Observed behavior
When running through inference, I got
Additional context
The SFT training loss looked absolutely normal, though.
I was able to adapt the HF PT tutorial at https://huggingface.co/docs/peft/main/en/task_guides/clm-prompt-tuning to run our task and get reasonable results.