Ucas-HaoranWei / GOT-OCR2.0

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

How to Load the fine tuned model #98

Open RishitLunia3108 opened 1 week ago

RishitLunia3108 commented 1 week ago

!swift sft \
    --model_type got-ocr2 \
    --model_id_or_path stepfun-ai/GOT-OCR2_0 \
    --sft_type lora \
    --dataset /kaggle/working/output_data.json \
    --output_dir /kaggle/working/hindi_got_model_3 \
    --num_train_epochs 1 \
    --max_steps 100 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --learning_rate 2e-5 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout 0.05 \
    --evaluation_strategy steps \
    --eval_steps 20 \
    --save_strategy steps \
    --save_steps 20

I used this command to fine-tune the model. Please guide me on how to load the fine-tuned model and use it to extract text.

Ucas-HaoranWei commented 1 week ago

Hi, inference after fine-tuning: [screenshot of the inference command]
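The screenshot is not reproduced here, but judging from the inference command posted later in this thread, loading the fine-tuned LoRA checkpoint with ms-swift looks roughly like this (a sketch; the checkpoint path is a placeholder for whatever directory swift sft wrote under your --output_dir):

CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir /kaggle/working/hindi_got_model_3/got-ocr2/<run-name>/checkpoint-<step> \
    --dataset /kaggle/working/test1.json \
    --load_dataset_config true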

RishitLunia3108 commented 1 week ago

Okay, thanks.

minhduc01168 commented 1 week ago

@RishitLunia3108 Have you been able to fine-tune the model on your dataset? Can you share the process with me? I would be very grateful.

RishitLunia3108 commented 2 days ago

Sure, please find the code below.

%%

!pwd
!ls -R

%%

!git clone https://github.com/modelscope/ms-swift.git
%cd ms-swift

%%

!pip install -e .[llm]

%%

%cd ..

%%

import os

# Set environment variables
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Kaggle typically provides a single GPU

%%

! pip install verovio

%%

import pandas as pd
import json

# Load the CSV data
file_path = '/kaggle/input/hindi-ocr-synthetic-line-image-text-pair/data_80k/data.csv'
df = pd.read_csv(file_path, encoding='utf-8')

%%

import os
import json

# Base path for images
image_base_path = '/kaggle/input/hindi-ocr-synthetic-line-image-text-pair/data_80k/output_images/'

json_data = []

# Loop through the CSV data and create JSON objects
for index, row in df.iterrows():
    # Construct the full image path by combining the base path with the filename
    full_image_path = os.path.join(image_base_path, row['image_file'])

    json_obj = {
        "query": "<image>Transcribe the text in this image",  # Fixed query text
        "response": row['text'],                              # Expected output from the CSV
        "images": [full_image_path]                           # Full image path
    }
    json_data.append(json_obj)

# Convert the list of dictionaries to JSON format
json_output = json.dumps(json_data, indent=4, ensure_ascii=False)

# Save to a JSON file (optional)
output_file_path = '/kaggle/working/output_data.json'
with open(output_file_path, 'w', encoding='utf-8') as f:
    f.write(json_output)

print("JSON file created successfully!")

%%

!swift sft \
    --model_type got-ocr2 \
    --model_id_or_path stepfun-ai/GOT-OCR2_0 \
    --sft_type lora \
    --dataset /kaggle/working/output_data.json \
    --output_dir /kaggle/working/hindi_got_model_3 \
    --num_train_epochs 1 \
    --max_steps 1000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --learning_rate 2e-5 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout 0.05 \
    --evaluation_strategy steps \
    --eval_steps 200 \
    --save_strategy steps \
    --save_steps 200

%%

import os
import json

# Base path for images
image_base_path = '/kaggle/input/hindi-ocr-synthetic-line-image-text-pair/data_80k/TestSamples/'

json_data = []

# Loop through the CSV data and create JSON objects (limit to 10 files)
for index, row in df.iterrows():
    if index >= 10:
        break  # Stop after processing 10 files

    # Construct the full image path by combining the base path with the filename
    full_image_path = os.path.join(image_base_path, row['image_file'])

    json_obj = {
        "query": "<image>Transcribe the text in this image",  # Fixed query text
        "response": row['text'],                              # Expected output from the CSV
        "images": [full_image_path]                           # Full image path
    }
    json_data.append(json_obj)

# Convert the list of dictionaries to JSON format
json_output = json.dumps(json_data, indent=4, ensure_ascii=False)

# Save to a JSON file (optional)
output_file_path = '/kaggle/working/test1.json'
with open(output_file_path, 'w', encoding='utf-8') as f:
    f.write(json_output)

print("JSON file created successfully!")

%%

!CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir /kaggle/working/hindi_got_model_3/got-ocr2/v0-20240930-060444/checkpoint-1000 \
    --dataset /kaggle/working/test1.json \
    --load_dataset_config true

minhduc01168 commented 2 days ago

@RishitLunia3108 Can you explain the data format below to me? I would be very grateful.

{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}

RishitLunia3108 commented 20 hours ago

1) Basically, this data is used to fine-tune the GOT-OCR 2.0 model for different languages.
2) The query contains what you want from the image and what type of data you are giving to the model. For example, here we give the model an image, so the query field of every JSON record contains "query": "<image>Transcribe the text in this image".
3) The response contains the expected text you want extracted from the image. This is used during training; during testing it is left empty, and the model itself fills the response with the extracted text.

The format of the JSON during training:

{
    "query": "<image>Transcribe the text in this image",
    "response": "गर्भनिरोध के लिए महिलाएं क्यों कराती हैं नसबंदी",
    "images": [
        "path_to_image"
    ]
}

The format of the JSON during testing:

{
    "query": "<image>Transcribe the text in this image",
    "response": "",
    "images": [
        "path_to_image"
    ]
}

The history field holds the query and response of the previously trained image (the earlier turn), and image_path contains the path of the image from which you want to extract the text. Currently I have not used the history part in my fine-tuning code.
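For completeness, a multi-turn record using the history field quoted above could be built along these lines (a minimal sketch; the texts and image paths are placeholders, and the exact pairing of <image> tags with entries in "images" follows ms-swift's custom dataset conventions):

import json

# Hypothetical multi-turn record: "history" holds the earlier (query, response) pair,
# "images" lists the image paths referenced by the conversation (placeholders here).
record = {
    "query": "<image>Transcribe the text in this image",
    "response": "expected text for the current image",
    "history": [
        ["<image>Transcribe the text in this image", "expected text for the previous image"]
    ],
    "images": [
        "/path/to/previous_image.jpg",
        "/path/to/current_image.jpg"
    ]
}

# Save as a one-element dataset file, in the same style as the fine-tuning cells above
with open("/kaggle/working/history_example.json", "w", encoding="utf-8") as f:
    json.dump([record], f, ensure_ascii=False, indent=4)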