RishitLunia3108 opened 1 week ago
Hi, a question about inference after fine-tuning:
Okay, thanks.
@RishitLunia3108 Have you been able to fine-tune the model on your dataset? Could you share the process with me? I would be very grateful.
Sure, please find the code below.
```
!pwd
!ls -R

!git clone https://github.com/modelscope/ms-swift.git
%cd ms-swift
!pip install -e .[llm]
%cd ..
```
```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Kaggle typically provides a single GPU

!pip install verovio
```
```python
import os
import json

import pandas as pd

file_path = '/kaggle/input/hindi-ocr-synthetic-line-image-text-pair/data_80k/data.csv'
df = pd.read_csv(file_path, encoding='utf-8')

image_base_path = '/kaggle/input/hindi-ocr-synthetic-line-image-text-pair/data_80k/output_images/'

json_data = []
for index, row in df.iterrows():
    full_image_path = os.path.join(image_base_path, row['image_file'])
    json_obj = {
        "query": "<image>Transcribe the text in this image",  # Fixed query text
        "response": row['text'],                              # Expected output from the CSV
        "images": [full_image_path]                           # Full image path
    }
    json_data.append(json_obj)

json_output = json.dumps(json_data, indent=4, ensure_ascii=False)

output_file_path = '/kaggle/working/output_data.json'
with open(output_file_path, 'w', encoding='utf-8') as f:
    f.write(json_output)

print("JSON file created successfully!")
```
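A training run can fail partway through if any image path written into the JSON is wrong, so it is worth a quick sanity check before launching `swift sft`. A minimal sketch (`find_missing_images` is my own helper name, not part of ms-swift):

```python
import json
import os

def find_missing_images(records):
    """Return every image path referenced by the records that does not exist on disk."""
    missing = []
    for record in records:
        for path in record.get("images", []):
            if not os.path.isfile(path):
                missing.append(path)
    return missing

# Usage: load the file generated above and report broken paths before training.
# with open('/kaggle/working/output_data.json', encoding='utf-8') as f:
#     records = json.load(f)
# print(find_missing_images(records))
```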
```
!swift sft \
    --model_type got-ocr2 \
    --model_id_or_path stepfun-ai/GOT-OCR2_0 \
    --sft_type lora \
    --dataset /kaggle/working/output_data.json \
    --output_dir /kaggle/working/hindi_got_model_3 \
    --num_train_epochs 1 \
    --max_steps 1000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --learning_rate 2e-5 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout 0.05 \
    --evaluation_strategy steps \
    --eval_steps 200 \
    --save_strategy steps \
    --save_steps 200
```
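For reference, with these flags the effective batch size is `per_device_train_batch_size × gradient_accumulation_steps`, and `--max_steps 1000` caps how much of the ~80k-row dataset the run actually sees. A quick back-of-the-envelope check:

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 1000

# One optimizer step consumes this many samples.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

# Total samples consumed over the whole run.
samples_seen = effective_batch_size * max_steps

print(effective_batch_size)  # 4
print(samples_seen)          # 4000, i.e. only ~5% of the 80k rows (well under one epoch)
```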
```python
import os
import json

image_base_path = '/kaggle/input/hindi-ocr-synthetic-line-image-text-pair/data_80k/TestSamples/'

json_data = []
for index, row in df.iterrows():
    if index >= 10:
        break  # Stop after processing 10 files
    # Construct the full image path by combining the base path with the filename
    full_image_path = os.path.join(image_base_path, row['image_file'])
    json_obj = {
        "query": "<image>Transcribe the text in this image",  # Fixed query text
        "response": row['text'],                              # Expected output from the CSV
        "images": [full_image_path]                           # Full image path
    }
    json_data.append(json_obj)

json_output = json.dumps(json_data, indent=4, ensure_ascii=False)

output_file_path = '/kaggle/working/test1.json'
with open(output_file_path, 'w', encoding='utf-8') as f:
    f.write(json_output)

print("JSON file created successfully!")
```
```
!CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir /kaggle/working/hindi_got_model_3/got-ocr2/v0-20240930-060444/checkpoint-1000 \
    --dataset /kaggle/working/test1.json \
    --load_dataset_config true
```
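Once inference finishes, you will want to score the transcriptions against the references from `test1.json`. Assuming you can collect predictions and references into two parallel lists (how you gather them from the inference output is up to you), character error rate (CER) is the standard metric for line-level OCR. A minimal Levenshtein-based sketch:

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(references, predictions):
    """Total edit distance divided by total reference length."""
    total_edits = sum(levenshtein(r, p) for r, p in zip(references, predictions))
    total_chars = sum(len(r) for r in references)
    return total_edits / max(total_chars, 1)

# A perfect transcription gives CER 0.0; lower is better.
print(cer(["नमस्ते"], ["नमस्ते"]))
```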
@RishitLunia3108 Can you explain the data format below to me? I would be very grateful.

```
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}
```
1) Basically, this data is used to fine-tune the GOT-OCR 2.0 model for different languages.
2) The query contains what you want from the image and tells the model what kind of input you are giving it. For example, here we give the model an image, so the query field in every JSON record contains:
=> `"query": "<image>Transcribe the text in this image"`
3) The `history` field holds the query/response pairs from previously trained turns, and `images` contains the path(s) of the image(s) you want to extract text from. I have not used the `history` part in my fine-tuning code.
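To illustrate the difference between the format variants above: earlier turns go into `history` as `[query, response]` pairs, while the current query/response stay at the top level. A sketch of how one might assemble both kinds of record (all paths and strings here are made-up placeholders):

```python
import json

# A single-turn record, as used in the fine-tuning script above.
single_turn = {
    "query": "<image>Transcribe the text in this image",
    "response": "some transcription",
    "images": ["/path/to/line1.png"],
}

# A multi-turn record: the earlier turn is stored in `history`,
# and `images` lists the images referenced across the conversation.
multi_turn = {
    "query": "Now transcribe the second image",
    "response": "another transcription",
    "history": [["<image>Transcribe the text in this image", "some transcription"]],
    "images": ["/path/to/line1.png", "/path/to/line2.png"],
}

print(json.dumps(multi_turn, ensure_ascii=False))
```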
```
!swift sft \
    --model_type got-ocr2 \
    --model_id_or_path stepfun-ai/GOT-OCR2_0 \
    --sft_type lora \
    --dataset /kaggle/working/output_data.json \
    --output_dir /kaggle/working/hindi_got_model_3 \
    --num_train_epochs 1 \
    --max_steps 100 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --learning_rate 2e-5 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout 0.05 \
    --evaluation_strategy steps \
    --eval_steps 20 \
    --save_strategy steps \
    --save_steps 20
```
I used this to fine-tune the model. Now, please guide me on how to load it and use it to extract text.