amazon-science / polygon-transformer


learning with my datasets #15

Closed masudaryuto closed 6 months ago

masudaryuto commented 1 year ago

Hello !!

I would like to run the full training pipeline and evaluate the REC task on my own dataset.

I pre-trained on my REC dataset.

Then I want to convert my "refs(unc).p" and "instances.json" with "create_finetuning_data.py" for finetuning. However, because of the "img_base64", "annot_base64", and "pts_string_interpolated" columns, the output becomes very large and data creation is difficult.

How large was the "refcoco+g_train_shuffled.tsv" file produced by converting the RefCOCO, RefCOCO+, and RefCOCOg datasets?

Also, is there a better conversion method?

Thank you !!

joellliu commented 1 year ago

Hi, if you only want to train on the REC task, you don't need to store annot_base64 and pts_string_interpolated, since those columns are only used for segmentation masks; dropping them will reduce the file size.
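
For example, the row written in create_finetuning_data.py could be reduced to something like this (a sketch only: the helper is hypothetical, and the column order just has to match the selected_cols your dataloader expects):

def write_rec_row(writer, uniq_id, img_id, sent, box_string, pts_string, img_base64):
    # REC-only TSV row: the annot_base64 and pts_string_interpolated
    # columns (segmentation masks) are simply dropped.
    row = '\t'.join([uniq_id, str(img_id), sent, box_string, pts_string, img_base64])
    writer.write(row + '\n')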

masudaryuto commented 1 year ago

Thank you so much !!

Could you be more specific about which parts of my code should change?

Can I just exclude "annot_base64" and "pts_string_interpolated"?

This is the code in "create_finetuning_data.py".

from refer.refer import REFER
import numpy as np
from PIL import Image
import random
import os
from tqdm import tqdm

import pickle
from poly_utils import is_clockwise, revert_direction, check_length, reorder_points, \
    approximate_polygons, interpolate_polygons, image_to_base64, polygons_to_string

max_length = 400

data_root = './refer/data'
datasets = ['refcoco', 'refcoco+', 'refcocog']

image_dir = './datasets/images/mscoco/train2014'
val_test_files = pickle.load(open("data/val_test_files.p", "rb"))

combined_train_data = []

for dataset in datasets:
    if dataset == 'refcoco':
        splits = ['train', 'val', 'testA', 'testB']
        splitBy = 'unc'
    elif dataset == 'refcoco+':
        splits = ['train', 'val', 'testA', 'testB']
        splitBy = 'unc'
    elif dataset == 'refcocog':
        splits = ['train', 'val']
        splitBy = 'umd'

    save_dir = f'datasets/finetune/{dataset}'
    os.makedirs(save_dir, exist_ok=True)
    for split in splits:
        num_pts = []
        max_num_pts = 0
        file_name = os.path.join(save_dir, f"{dataset}_{split}.tsv")
        print("creating ", file_name)

        uniq_ids = []
        image_ids = []
        sents = []
        coeffs_strings = []
        img_strings = []

        writer = open(file_name, 'w')
        refer = REFER(data_root, dataset, splitBy)

        ref_ids = refer.getRefIds(split=split)

        for this_ref_id in tqdm(ref_ids):
            this_img_id = refer.getImgIds(this_ref_id)
            this_img = refer.Imgs[this_img_id[0]]
            fn = this_img['file_name']
            img_id = fn.split(".")[0].split("_")[-1]

            # load image
            img = Image.open(os.path.join(image_dir, this_img['file_name'])).convert("RGB")

            # convert image to string
            img_base64 = image_to_base64(img, format='jpeg')

            # load mask
            ref = refer.loadRefs(this_ref_id)
            ref_mask = np.array(refer.getMask(ref[0])['mask'])
            annot = np.zeros(ref_mask.shape)
            annot[ref_mask == 1] = 1  # 255
            annot_img = Image.fromarray(annot.astype(np.uint8), mode="P")
            annot_base64 = image_to_base64(annot_img, format='png')

            polygons = refer.getPolygon(ref[0])['polygon']

            polygons_processed = []
            for polygon in polygons:
                # make the polygon clockwise
                if not is_clockwise(polygon):
                    polygon = revert_direction(polygon)

                # reorder the polygon so that the first vertex is the one closest to image origin
                polygon = reorder_points(polygon)
                polygons_processed.append(polygon)

            polygons = sorted(polygons_processed, key=lambda x: (x[0] ** 2 + x[1] ** 2, x[0], x[1]))
            polygons_interpolated = interpolate_polygons(polygons)

            polygons = approximate_polygons(polygons, 5, max_length)

            pts_string = polygons_to_string(polygons)
            pts_string_interpolated = polygons_to_string(polygons_interpolated)

            # load box
            box = refer.getRefBox(this_ref_id)  # x,y,w,h
            x, y, w, h = box
            box_string = f'{x},{y},{x + w},{y + h}'

            max_num_pts = max(max_num_pts, check_length(polygons))

            num_pts.append(check_length(polygons))
            # load text
            ref_sent = refer.Refs[this_ref_id]
            for i, (sent, sent_id) in enumerate(zip(ref_sent['sentences'], ref_sent['sent_ids'])):
                uniq_id = f"{this_ref_id}_{i}"
                instance = '\t'.join(
                    [uniq_id, str(this_img_id[0]), sent['sent'], box_string, pts_string, img_base64, annot_base64,
                     pts_string_interpolated]) + '\n'
                writer.write(instance)

                if img_id not in val_test_files and split == 'train':  # filtered out val/test files
                    combined_train_data.append(instance)
        writer.close()

random.shuffle(combined_train_data)
file_name = os.path.join("datasets/finetune/refcoco+g_train_shuffled.tsv")
print("creating ", file_name)
writer = open(file_name, 'w')
writer.writelines(combined_train_data)
writer.close()
joellliu commented 1 year ago

You can remove the load-mask part, but you will likely also need to modify other parts of the code, such as data loading and training. I think the easiest way is to use the pretraining code, since it is the same task (REC), and generate data according to the pretraining format.
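
As a rough sketch, reusing the variables from your script above, a pretraining-format row would be something like the following; the column order here is an assumption (inferred from the selected_cols the pretraining task uses), so verify it against create_pretraining_data.py:

# Detection-only row: referring expression, box, and base64 image, with no
# mask columns. Column order is an assumption; check
# create_pretraining_data.py for the authoritative layout.
instance = '\t'.join([uniq_id, sent['sent'], box_string, img_base64]) + '\n'
writer.write(instance)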

masudaryuto commented 1 year ago

I can use "create_pretraining_data.py", and the data conversion is easy.

Thank you so much !!

After conversion, should I use "train_polyformer_b.sh" for finetuning on REC?

Can you please tell me what I should change in "train_polyformer_b.sh"?

Thank you !!

This is "train_polyformer_b.sh".

#!/usr/bin/env bash

# The port for communication. Note that if you want to run multiple tasks on the same machine,
# you need to specify different port numbers.
export MASTER_PORT=6061

det_weight=0.1
cls_weight=0.0005
num_bins=64
log_dir=./polyformer_b_logs
save_dir=./polyformer_b_checkpoints
mkdir -p $log_dir $save_dir

bpe_dir=../../utils/BPE
user_dir=../../polyformer_module

data_dir=../../datasets/finetune
data=${data_dir}/refcoco+g_train_shuffled.tsv,${data_dir}/refcoco/refcoco_val.tsv
selected_cols=0,5,6,2,4,3,7
restore_file=../../weights/polyformer_b_pretrain.pt

task=refcoco
arch=polyformer_b
criterion=adjust_label_smoothed_cross_entropy
label_smoothing=0.1
lr=3e-5
max_epoch=5
warmup_ratio=0.06
batch_size=16
update_freq=8
resnet_drop_path_rate=0.0
encoder_drop_path_rate=0.1
decoder_drop_path_rate=0.1
dropout=0.1
attention_dropout=0.0
max_src_length=80
max_tgt_length=420

patch_image_size=512

for max_epoch in 100; do
  echo "max_epoch "${max_epoch}
  for lr in 5e-5; do
    echo "lr "${lr}
    for patch_image_size in 512; do
      echo "patch_image_size "${patch_image_size}

      log_file=${log_dir}/${max_epoch}"_"${lr}"_"${patch_image_size}".log"
      save_path=${save_dir}/${max_epoch}"_"${lr}"_"${patch_image_size}
      mkdir -p $save_path

      CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m torch.distributed.launch --nproc_per_node=8 --master_port=${MASTER_PORT} ../../train.py \
          $data \
          --selected-cols=${selected_cols} \
          --bpe-dir=${bpe_dir} \
          --user-dir=${user_dir} \
          --reset-optimizer --reset-dataloader --reset-meters \
          --save-dir=${save_path} \
          --task=${task} \
          --arch=${arch} \
          --criterion=${criterion} \
          --label-smoothing=${label_smoothing} \
          --batch-size=${batch_size} \
          --update-freq=${update_freq} \
          --encoder-normalize-before \
          --restore-file=${restore_file} \
          --decoder-normalize-before \
          --share-decoder-input-output-embed \
          --share-all-embeddings \
          --layernorm-embedding \
          --patch-layernorm-embedding \
          --code-layernorm-embedding \
          --resnet-drop-path-rate=${resnet_drop_path_rate} \
          --encoder-drop-path-rate=${encoder_drop_path_rate} \
          --decoder-drop-path-rate=${decoder_drop_path_rate} \
          --dropout=${dropout} \
          --attention-dropout=${attention_dropout} \
          --weight-decay=0.01 --optimizer=adam --adam-betas="(0.9,0.999)" --adam-eps=1e-08 --clip-norm=1.0 \
          --lr-scheduler=polynomial_decay --lr=${lr} \
          --max-epoch=${max_epoch} --warmup-ratio=${warmup_ratio} \
          --log-format=simple --log-interval=10 \
          --fixed-validation-seed=7 \
          --no-epoch-checkpoints --keep-best-checkpoints=1 \
          --save-interval=1 --validate-interval=1 \
          --save-interval-updates=500 --validate-interval-updates=500 \
          --eval-acc \
          --eval-args='{"beam":5,"min_len":2,"max_len_a":0,"max_len_b":2}' \
          --best-checkpoint-metric=score --maximize-best-checkpoint-metric \
          --max-src-length=${max_src_length} \
          --max-tgt-length=${max_tgt_length} \
          --find-unused-parameters \
          --add-type-embedding \
          --scale-attn \
          --scale-fc \
          --scale-heads \
          --disable-entangle \
          --num-bins=${num_bins} \
          --patch-image-size=${patch_image_size} \
          --fp16 \
          --fp16-scale-window=512 \
          --det_weight=${det_weight} \
          --cls_weight=${cls_weight} \
          --num-workers=0 > ${log_file} 2>&1
    done
  done
done
joellliu commented 1 year ago

You should use pretrain_polyformer_b.sh for REC, but add the restore_file argument, pointing it to your pretrained checkpoint.
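
Something like this, as a sketch (the checkpoint path is a placeholder; the flag itself already exists in train_polyformer_b.sh, so you can copy it from there):

# Hypothetical path: point this at your own REC pretraining checkpoint.
restore_file=../../weights/my_rec_pretrain.pt

# Then add the flag to the train.py invocation in pretrain_polyformer_b.sh,
# exactly as train_polyformer_b.sh does:
#   --restore-file=${restore_file} \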

masudaryuto commented 1 year ago

Thank you so much !!

Is there anything to add or change in "pretrain_polyformer_b.sh" other than "restore_file"? Is it OK to leave "task=refcoco_pretrain" as it is?

joellliu commented 1 year ago

It should be OK to leave "task=refcoco_pretrain" as it is.

masudaryuto commented 1 year ago

I understood. Thank you for sharing so many solutions !!

joellliu commented 1 year ago

You are welcome!

masudaryuto commented 1 year ago

Hello !!

Is there a way to evaluate only the pretrained REC model? I would like to evaluate only the REC task.

Sorry for repeating the question.

Thank you !!

joellliu commented 1 year ago

Hi, you can try to use the current evaluation code and ignore the metrics for the segmentation task.

masudaryuto commented 1 year ago

Thank you !!

Could you tell me what exactly you mean by ignoring the metrics for the segmentation task, and how exactly I should make that change?

This is the code in "evaluate_polyformer_b_refcoco.sh".

I would like to use the REC pretraining checkpoint, not a finetuned checkpoint.

However, when using the REC pretraining checkpoint, the task is "refcoco_pretrain", and the evaluation does not run properly.

I would appreciate it if you could tell me.

#!/bin/bash

# The port for communication. Note that if you want to run multiple tasks on the same machine,
# you need to specify different port numbers.
export MASTER_PORT=6092
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export GPUS_PER_NODE=8

########################## Evaluate Refcoco+ ##########################
user_dir=../../polyformer_module
bpe_dir=../../utils/BPE
selected_cols=0,5,6,2,4,3

model='polyformer_b'
num_bins=64
batch_size=16

dataset='refcoco'
ckpt_path=../../weights/polyformer_b_refcoco.pt

for split in 'refcoco_val' 'refcoco_testA' 'refcoco_testB'
do
data=../../datasets/finetune/${dataset}/${split}.tsv
result_path=../../results_${model}/${dataset}/
vis_dir=${result_path}/vis/${split}
result_dir=${result_path}/result/${split}
python3 -m torch.distributed.launch --nproc_per_node=${GPUS_PER_NODE} --master_port=${MASTER_PORT} ../../evaluate.py \
    ${data} \
    --path=${ckpt_path} \
    --user-dir=${user_dir} \
    --task=refcoco \
    --batch-size=${batch_size} \
    --log-format=simple --log-interval=10 \
    --seed=7 \
    --gen-subset=${split} \
    --results-path=${result_path} \
    --no-repeat-ngram-size=3 \
    --fp16 \
    --num-workers=0 \
    --num-bins=${num_bins} \
    --vis_dir=${vis_dir} \
    --result_dir=${result_dir} \
    --model-overrides="{\"data\":\"${data}\",\"bpe_dir\":\"${bpe_dir}\",\"selected_cols\":\"${selected_cols}\"}"
done

The following is the error produced when the checkpoint used is the REC pretraining checkpoint, not a finetuned checkpoint.

File "../../evaluate.py", line 158, in main
    result, scores, f_scores, ap_scores, cum_I, cum_U = eval_step(task, generator, models, sample, **kwargs)
  File "/home/other_transformer_REC_model/polygon-transformer/utils/eval_utils.py", line 214, in eval_step
    raise NotImplementedError
NotImplementedError

This is the code at utils/eval_utils.py, line 214, in eval_step:

def eval_step(task, generator, models, sample, **kwargs):

    if task.cfg._name == 'refcoco':
        return eval_refcoco(task, generator, models, sample, **kwargs)
    else:
        raise NotImplementedError
joellliu commented 1 year ago

Hi, you can try to modify utils/eval_utils.py from

if task.cfg._name == 'refcoco':
    return eval_refcoco(task, generator, models, sample, **kwargs)
else:
    raise NotImplementedError

to

 return eval_refcoco(task, generator, models, sample, **kwargs)

and in the script change selected_cols accordingly.
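
For the pretraining-format TSV (which has fewer columns than the finetuning one), that means something like:

# In evaluate_polyformer_b_refcoco.sh: index the pretraining TSV's columns
# instead of the finetuning layout. The exact mapping depends on how you
# generated your data.
selected_cols=0,3,1,2   # was selected_cols=0,5,6,2,4,3 for the finetuning TSVs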

masudaryuto commented 1 year ago

Hi! Thank you !!

I changed to

 return eval_refcoco(task, generator, models, sample, **kwargs)

and

selected_cols=0,3,1,2

However, I got the following key error.

Traceback (most recent call last):
  File "../../evaluate.py", line 192, in <module>
    cli_main()
  File "../../evaluate.py", line 187, in cli_main
    vis_dir=args.vis_dir, vis=args.vis, result_dir=args.result_dir
  File "/root/.asdf/installs/python/3.7.4/lib/python3.7/site-packages/fairseq/distributed/utils.py", line 374, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/root/.asdf/installs/python/3.7.4/lib/python3.7/site-packages/fairseq/distributed/utils.py", line 348, in distributed_main
    main(cfg, **kwargs)
  File "../../evaluate.py", line 158, in main
    result, scores, f_scores, ap_scores, cum_I, cum_U = eval_step(task, generator, models, sample, **kwargs)
  File "/home/other_transformer_REC_model/polygon-transformer/utils/eval_utils.py", line 212, in eval_step
    return eval_refcoco(task, generator, models, sample, **kwargs)
  File "/home/other_transformer_REC_model/polygon-transformer/utils/eval_utils.py", line 163, in eval_refcoco
    gen_out_i_det[::2] *= sample['w'][i].cpu().numpy()
KeyError: 'w'

The checkpoint I am using is the REC pretraining checkpoint, so the task becomes refcoco_pretrain and "data/refcoco_pretrain_dataset.py" is executed.

If the task were refcoco and refcoco_dataset.py were executed, the sample would contain the 'w' key. However, since I am using the REC pretraining checkpoint, the sample does not include the 'w' key.

I would like to perform the evaluation with the REC pretraining checkpoint.

How should this be corrected?

I would appreciate it if you could tell me.

joellliu commented 1 year ago

Hi, in that case you can add 'w' in the dataloader ("w": w):

https://github.com/amazon-science/polygon-transformer/blob/69fc728b2ec6a2b3595ec34db64074badcb19151/data/refcoco_pretrain_dataset.py#L148C43-L148C43
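
A minimal sketch of the change, assuming the original image width and height are available as w and h in __getitem__ (as in data/refcoco_dataset.py); the collate function also has to pass them through so that sample['w'][i] exists at eval time:

import torch

# In __getitem__ of data/refcoco_pretrain_dataset.py, next to the existing keys:
example["w"] = torch.tensor(w, dtype=torch.float32)
example["h"] = torch.tensor(h, dtype=torch.float32)

# In the collate function, so eval_refcoco's sample['w'][i] lookup works:
batch["w"] = torch.stack([s["w"] for s in samples])
batch["h"] = torch.stack([s["h"] for s in samples])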

masudaryuto commented 1 year ago

I could evaluate the REC task at my pretrain checkpoint. Thank you!!

LFUSST commented 10 months ago

Hi, I made the modification as described above, but I ran into a new issue. I got the following key error.

Traceback (most recent call last):
  File "../../evaluate.py", line 185, in <module>
    cli_main()
  File "../../evaluate.py", line 180, in cli_main
    vis_dir=args.vis_dir, vis=args.vis, result_dir=args.result_dir
  File "//mnt/sda/lf/polyformer/fairseq/fairseq/distributed/utils.py", line 372, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "//mnt/sda/lf/polyformer/fairseq/fairseq/distributed/utils.py", line 346, in distributed_main
    main(cfg, **kwargs)
  File "../../evaluate.py", line 151, in main
    result, scores, f_scores, ap_scores, cum_I, cum_U = eval_step(task, generator, models, sample, **kwargs)
  File "/mnt/sda/lf/polyformer/utils/eval_utils.py", line 210, in eval_step
    return eval_refcoco(task, generator, models, sample, **kwargs)
  File "/mnt/sda/lf/polyformer/utils/eval_utils.py", line 187, in eval_refcoco
    gt = sample['label']
KeyError: 'label'

masudaryuto commented 10 months ago

Hi !!

I did this by adding the "label" key to the example and batch in data/refcoco_pretrain_dataset.py, assigning the extra keys to the example and batch in the same way as data/refcoco_dataset.py does.
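
As a sketch of the mechanics (mirroring data/refcoco_dataset.py; the label value itself is a placeholder here, since where the mask comes from depends on your annotations):

# In __getitem__ of data/refcoco_pretrain_dataset.py: 'label' holds the
# ground-truth mask that eval_refcoco reads via sample['label']. The value
# below is a placeholder; obtain the mask from your own annotations.
example["label"] = label

# In the collate function, pass it through to the batch (match the collate
# in data/refcoco_dataset.py; a plain list is shown here as a placeholder):
batch["label"] = [s["label"] for s in samples]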

LFUSST commented 10 months ago

Hello, and thanks for your reply. I looked at your steps carefully, but how do you define "label" in data/refcoco_pretrain_dataset.py? Does the validation set you use contain segmentation data, or only detection data?

vvuonghn commented 10 months ago

Hi! @joellliu @masudaryuto @LFUSST

Do you think it is possible to train on the RIS task with our dataset (~30k image-text pairs, where each image contains more than one segmented object)? Also, can the model be trained on a single GPU?

Thank you so much for your support!

ustczhouyu commented 9 months ago

When I run bash evaluate_polyformer_l_refcoco.sh, the following error occurs:

File "/data/zhou/polygon-transformer/bert/configuration_utils.py", line 201, in from_pretrained config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs) File "/data/zhou/polygon-transformer/bert/configuration_utils.py", line 252, in get_config_dict raise EnvironmentError(msg) OSError: Can't load config for 'bert-base-uncased'. Make sure that:

'bert-base-uncased' is a correct model identifier listed on 'https://huggingface.co/models'

or 'bert-base-uncased' is the correct path to a directory containing a config.json file

How should I solve this problem? Please help me, thank you. @vvuonghn @joellliu @masudaryuto @LFUSST @hyandell