Hi, if you only want to train on the REC task, you don't need to store annot_base64 and pts_string_interpolated, since they are only used for segmentation masks; dropping them will reduce the file size.
Thank you so much!!
Could you be more specific and tell me which parts should be corrected?
Can I exclude "annot_base64" and "pts_string_interpolated"?
This is the code in "create_finetuning_data.py".
from refer.refer import REFER
import numpy as np
from PIL import Image
import random
import os
from tqdm import tqdm
import pickle
from poly_utils import is_clockwise, revert_direction, check_length, reorder_points, \
    approximate_polygons, interpolate_polygons, image_to_base64, polygons_to_string

max_length = 400

data_root = './refer/data'
datasets = ['refcoco', 'refcoco+', 'refcocog']
image_dir = './datasets/images/mscoco/train2014'
val_test_files = pickle.load(open("data/val_test_files.p", "rb"))

combined_train_data = []

for dataset in datasets:
    if dataset == 'refcoco':
        splits = ['train', 'val', 'testA', 'testB']
        splitBy = 'unc'
    elif dataset == 'refcoco+':
        splits = ['train', 'val', 'testA', 'testB']
        splitBy = 'unc'
    elif dataset == 'refcocog':
        splits = ['train', 'val']
        splitBy = 'umd'

    save_dir = f'datasets/finetune/{dataset}'
    os.makedirs(save_dir, exist_ok=True)

    for split in splits:
        num_pts = []
        max_num_pts = 0
        file_name = os.path.join(save_dir, f"{dataset}_{split}.tsv")
        print("creating ", file_name)

        uniq_ids = []
        image_ids = []
        sents = []
        coeffs_strings = []
        img_strings = []

        writer = open(file_name, 'w')
        refer = REFER(data_root, dataset, splitBy)
        ref_ids = refer.getRefIds(split=split)

        for this_ref_id in tqdm(ref_ids):
            this_img_id = refer.getImgIds(this_ref_id)
            this_img = refer.Imgs[this_img_id[0]]
            fn = this_img['file_name']
            img_id = fn.split(".")[0].split("_")[-1]

            # load image
            img = Image.open(os.path.join(image_dir, this_img['file_name'])).convert("RGB")
            # convert image to string
            img_base64 = image_to_base64(img, format='jpeg')

            # load mask
            ref = refer.loadRefs(this_ref_id)
            ref_mask = np.array(refer.getMask(ref[0])['mask'])
            annot = np.zeros(ref_mask.shape)
            annot[ref_mask == 1] = 1  # 255
            annot_img = Image.fromarray(annot.astype(np.uint8), mode="P")
            annot_base64 = image_to_base64(annot_img, format='png')

            polygons = refer.getPolygon(ref[0])['polygon']

            polygons_processed = []
            for polygon in polygons:
                # make the polygon clockwise
                if not is_clockwise(polygon):
                    polygon = revert_direction(polygon)
                # reorder the polygon so that the first vertex is the one closest to image origin
                polygon = reorder_points(polygon)
                polygons_processed.append(polygon)

            polygons = sorted(polygons_processed, key=lambda x: (x[0] ** 2 + x[1] ** 2, x[0], x[1]))
            polygons_interpolated = interpolate_polygons(polygons)

            polygons = approximate_polygons(polygons, 5, max_length)

            pts_string = polygons_to_string(polygons)
            pts_string_interpolated = polygons_to_string(polygons_interpolated)

            # load box
            box = refer.getRefBox(this_ref_id)  # x,y,w,h
            x, y, w, h = box
            box_string = f'{x},{y},{x + w},{y + h}'

            max_num_pts = max(max_num_pts, check_length(polygons))
            num_pts.append(check_length(polygons))

            # load text
            ref_sent = refer.Refs[this_ref_id]
            for i, (sent, sent_id) in enumerate(zip(ref_sent['sentences'], ref_sent['sent_ids'])):
                uniq_id = f"{this_ref_id}_{i}"
                instance = '\t'.join(
                    [uniq_id, str(this_img_id[0]), sent['sent'], box_string, pts_string, img_base64,
                     annot_base64, pts_string_interpolated]) + '\n'
                writer.write(instance)

                if img_id not in val_test_files and split == 'train':  # filtered out val/test files
                    combined_train_data.append(instance)

        writer.close()

random.shuffle(combined_train_data)
file_name = os.path.join("datasets/finetune/refcoco+g_train_shuffled.tsv")
print("creating ", file_name)
writer = open(file_name, 'w')
writer.writelines(combined_train_data)
writer.close()
You can remove the "load mask" part, but you will likely need to modify other parts of the code as well, such as data loading and training. I think the easiest way is to use the pretraining code, since it is the same task (REC), and generate data according to the pretraining format.
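For illustration, if you stay with the finetuning format, a minimal REC-only variant of the write step in the script above could simply drop the mask-related columns (just a sketch; whatever layout you choose must match the selected_cols you later pass to training):

    # sketch: keep only the columns REC needs (id, image id, sentence, box, image)
    instance = '\t'.join(
        [uniq_id, str(this_img_id[0]), sent['sent'], box_string, img_base64]) + '\n'
    writer.write(instance)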
I can use "create_pretraining_data.py", and the data conversion is easy.
Thank you so much!!
After conversion, should I use "train_polyformer_b.sh" for finetuning on REC?
Could you please tell me what I should change in "train_polyformer_b.sh"?
Thank you!!
This is "train_polyformer_b.sh":
#!/usr/bin/env bash
# The port for communication. Note that if you want to run multiple tasks on the same machine,
# you need to specify different port numbers.
export MASTER_PORT=6061

det_weight=0.1
cls_weight=0.0005
num_bins=64

log_dir=./polyformer_b_logs
save_dir=./polyformer_b_checkpoints
mkdir -p $log_dir $save_dir

bpe_dir=../../utils/BPE
user_dir=../../polyformer_module

data_dir=../../datasets/finetune
data=${data_dir}/refcoco+g_train_shuffled.tsv,${data_dir}/refcoco/refcoco_val.tsv
selected_cols=0,5,6,2,4,3,7

restore_file=../../weights/polyformer_b_pretrain.pt

task=refcoco
arch=polyformer_b
criterion=adjust_label_smoothed_cross_entropy
label_smoothing=0.1
lr=3e-5
max_epoch=5
warmup_ratio=0.06
batch_size=16
update_freq=8
resnet_drop_path_rate=0.0
encoder_drop_path_rate=0.1
decoder_drop_path_rate=0.1
dropout=0.1
attention_dropout=0.0
max_src_length=80
max_tgt_length=420
patch_image_size=512

for max_epoch in 100; do
  echo "max_epoch "${max_epoch}
  for lr in 5e-5; do
    echo "lr "${lr}
    for patch_image_size in 512; do
      echo "patch_image_size "${patch_image_size}

      log_file=${log_dir}/${max_epoch}"_"${lr}"_"${patch_image_size}".log"
      save_path=${save_dir}/${max_epoch}"_"${lr}"_"${patch_image_size}
      mkdir -p $save_path

      CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m torch.distributed.launch --nproc_per_node=8 --master_port=${MASTER_PORT} ../../train.py \
        $data \
        --selected-cols=${selected_cols} \
        --bpe-dir=${bpe_dir} \
        --user-dir=${user_dir} \
        --reset-optimizer --reset-dataloader --reset-meters \
        --save-dir=${save_path} \
        --task=${task} \
        --arch=${arch} \
        --criterion=${criterion} \
        --label-smoothing=${label_smoothing} \
        --batch-size=${batch_size} \
        --update-freq=${update_freq} \
        --encoder-normalize-before \
        --restore-file=${restore_file} \
        --decoder-normalize-before \
        --share-decoder-input-output-embed \
        --share-all-embeddings \
        --layernorm-embedding \
        --patch-layernorm-embedding \
        --code-layernorm-embedding \
        --resnet-drop-path-rate=${resnet_drop_path_rate} \
        --encoder-drop-path-rate=${encoder_drop_path_rate} \
        --decoder-drop-path-rate=${decoder_drop_path_rate} \
        --dropout=${dropout} \
        --attention-dropout=${attention_dropout} \
        --weight-decay=0.01 --optimizer=adam --adam-betas="(0.9,0.999)" --adam-eps=1e-08 --clip-norm=1.0 \
        --lr-scheduler=polynomial_decay --lr=${lr} \
        --max-epoch=${max_epoch} --warmup-ratio=${warmup_ratio} \
        --log-format=simple --log-interval=10 \
        --fixed-validation-seed=7 \
        --no-epoch-checkpoints --keep-best-checkpoints=1 \
        --save-interval=1 --validate-interval=1 \
        --save-interval-updates=500 --validate-interval-updates=500 \
        --eval-acc \
        --eval-args='{"beam":5,"min_len":2,"max_len_a":0,"max_len_b":2}' \
        --best-checkpoint-metric=score --maximize-best-checkpoint-metric \
        --max-src-length=${max_src_length} \
        --max-tgt-length=${max_tgt_length} \
        --find-unused-parameters \
        --add-type-embedding \
        --scale-attn \
        --scale-fc \
        --scale-heads \
        --disable-entangle \
        --num-bins=${num_bins} \
        --patch-image-size=${patch_image_size} \
        --fp16 \
        --fp16-scale-window=512 \
        --det_weight=${det_weight} \
        --cls_weight=${cls_weight} \
        --num-workers=0 > ${log_file} 2>&1
    done
  done
done
You should use pretrain_polyformer_b.sh for REC, but add the restore_file argument, which should point to your pretrained checkpoint.
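For example, a sketch (pretrain_polyformer_b.sh has the same structure as train_polyformer_b.sh above, but may not define restore_file yet; the checkpoint path below is a placeholder):

    # in pretrain_polyformer_b.sh: point restore_file at your own pretrained checkpoint
    restore_file=../../weights/my_rec_pretrain.pt   # placeholder path
    # and add it to the train.py arguments:
    #   --restore-file=${restore_file} \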
Thank you so much!!
Is there anything to add or change in "pretrain_polyformer_b.sh" other than "restore_file"? Is it OK to leave "task=refcoco_pretrain" as it is?
It should be OK to leave "task=refcoco_pretrain" as it is.
I understand. Thank you for sharing so many solutions!!
You are welcome!
Hello!!
Is there a way to evaluate only the pretrained REC model? I would like to evaluate only the REC task.
Sorry for repeating the question.
Thank you!!
Hi, you can try to use the current evaluation code and ignore the metrics for the segmentation task.
Thank you!!
Could you tell me what exactly you mean by ignoring the metrics for the segmentation task, and how exactly I should make the change?
This is "evaluate_polyformer_b_refcoco.sh".
I would like to use the REC pretrain checkpoint, not the finetuning checkpoints.
However, when using the REC pretrain checkpoint, the task is "refcoco_pretrain", which the evaluation code does not handle.
I would appreciate it if you could tell me.
#!/bin/bash

# The port for communication. Note that if you want to run multiple tasks on the same machine,
# you need to specify different port numbers.
export MASTER_PORT=6092
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export GPUS_PER_NODE=8

########################## Evaluate Refcoco ##########################
user_dir=../../polyformer_module
bpe_dir=../../utils/BPE
selected_cols=0,5,6,2,4,3

model='polyformer_b'
num_bins=64
batch_size=16

dataset='refcoco'
ckpt_path=../../weights/polyformer_b_refcoco.pt

for split in 'refcoco_val' 'refcoco_testA' 'refcoco_testB'
do
  data=../../datasets/finetune/${dataset}/${split}.tsv
  result_path=../../results_${model}/${dataset}/
  vis_dir=${result_path}/vis/${split}
  result_dir=${result_path}/result/${split}

  python3 -m torch.distributed.launch --nproc_per_node=${GPUS_PER_NODE} --master_port=${MASTER_PORT} ../../evaluate.py \
    ${data} \
    --path=${ckpt_path} \
    --user-dir=${user_dir} \
    --task=refcoco \
    --batch-size=${batch_size} \
    --log-format=simple --log-interval=10 \
    --seed=7 \
    --gen-subset=${split} \
    --results-path=${result_path} \
    --no-repeat-ngram-size=3 \
    --fp16 \
    --num-workers=0 \
    --num-bins=${num_bins} \
    --vis_dir=${vis_dir} \
    --result_dir=${result_dir} \
    --model-overrides="{\"data\":\"${data}\",\"bpe_dir\":\"${bpe_dir}\",\"selected_cols\":\"${selected_cols}\"}"
done
The following is the error output when the checkpoint used is the REC pretrain checkpoint rather than a finetuning checkpoint.
File "../../evaluate.py", line 158, in main
result, scores, f_scores, ap_scores, cum_I, cum_U = eval_step(task, generator, models, sample, **kwargs)
File "/home/other_transformer_REC_model/polygon-transformer/utils/eval_utils.py", line 214, in eval_step
raise NotImplementedError
NotImplementedError
This is the code at "utils/eval_utils.py", line 214, in eval_step:
def eval_step(task, generator, models, sample, **kwargs):
    if task.cfg._name == 'refcoco':
        return eval_refcoco(task, generator, models, sample, **kwargs)
    else:
        raise NotImplementedError
Hi, you can try to modify utils/eval_utils.py from

    if task.cfg._name == 'refcoco':
        return eval_refcoco(task, generator, models, sample, **kwargs)
    else:
        raise NotImplementedError

to

    return eval_refcoco(task, generator, models, sample, **kwargs)

and in the script change selected_cols accordingly.
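For reference, the finetuning TSV written by create_finetuning_data.py above has the layout below, which is where selected_cols=0,5,6,2,4,3,7 in the training script comes from; a pretrain-format TSV has fewer columns, so remap the indices to whatever your conversion script writes:

    # finetuning TSV columns (from create_finetuning_data.py):
    #   0 uniq_id   1 image_id   2 sentence   3 box   4 polygon
    #   5 img_base64   6 annot_base64   7 pts_string_interpolated
    # selected_cols=0,5,6,2,4,3,7 -> id, image, mask, text, polygon, box, interpolated polygon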
Hi! Thank you!!
I changed it to

    return eval_refcoco(task, generator, models, sample, **kwargs)

and

    selected_cols=0,3,1,2

However, I got the following key error.
Traceback (most recent call last):
  File "../../evaluate.py", line 192, in <module>
    cli_main()
  File "../../evaluate.py", line 187, in cli_main
    vis_dir=args.vis_dir, vis=args.vis, result_dir=args.result_dir
  File "/root/.asdf/installs/python/3.7.4/lib/python3.7/site-packages/fairseq/distributed/utils.py", line 374, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/root/.asdf/installs/python/3.7.4/lib/python3.7/site-packages/fairseq/distributed/utils.py", line 348, in distributed_main
    main(cfg, **kwargs)
  File "../../evaluate.py", line 158, in main
    result, scores, f_scores, ap_scores, cum_I, cum_U = eval_step(task, generator, models, sample, **kwargs)
  File "/home/other_transformer_REC_model/polygon-transformer/utils/eval_utils.py", line 212, in eval_step
    return eval_refcoco(task, generator, models, sample, **kwargs)
  File "/home/other_transformer_REC_model/polygon-transformer/utils/eval_utils.py", line 163, in eval_refcoco
    gen_out_i_det[::2] *= sample['w'][i].cpu().numpy()
KeyError: 'w'
The checkpoint I am using is the REC pretrain checkpoint, so the task becomes "refcoco_pretrain" and "data/refcoco_pretrain_dataset.py" is executed.
If the task were "refcoco" and refcoco_dataset.py were executed, the sample would contain the 'w' key. However, since I am using the REC pretrain checkpoint, the sample does not include the 'w' key.
I would like to use the REC pretrain checkpoint to perform the evaluation.
How should this be corrected?
I would appreciate it if you could tell me.
Hi, in that case you can add 'w' to the sample in the dataloader, i.e. "w": w.
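A minimal sketch of that change, assuming data/refcoco_pretrain_dataset.py builds example and batch dicts the same way data/refcoco_dataset.py does (the surrounding names here are assumptions, so adapt them to your copy; torch is already imported in the dataset module):

    # in __getitem__, before the image is resized to patch_image_size:
    w, h = image.size                       # original image size
    example["w"] = torch.tensor(float(w))   # eval_refcoco rescales predicted boxes with these
    example["h"] = torch.tensor(float(h))

    # in the collate function that assembles the batch:
    batch["w"] = torch.stack([s["w"] for s in samples])
    batch["h"] = torch.stack([s["h"] for s in samples])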
I was able to evaluate the REC task with my pretrain checkpoint. Thank you!!
Hi, I made the modification described above, but I ran into a new issue. I got the following key error.
Traceback (most recent call last):
  File "../../evaluate.py", line 185, in <module>
    cli_main()
  File "../../evaluate.py", line 180, in cli_main
    vis_dir=args.vis_dir, vis=args.vis, result_dir=args.result_dir
  File "//mnt/sda/lf/polyformer/fairseq/fairseq/distributed/utils.py", line 372, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "//mnt/sda/lf/polyformer/fairseq/fairseq/distributed/utils.py", line 346, in distributed_main
    main(cfg, **kwargs)
  File "../../evaluate.py", line 151, in main
    result, scores, f_scores, ap_scores, cum_I, cum_U = eval_step(task, generator, models, sample, **kwargs)
  File "/mnt/sda/lf/polyformer/utils/eval_utils.py", line 210, in eval_step
    return eval_refcoco(task, generator, models, sample, **kwargs)
  File "/mnt/sda/lf/polyformer/utils/eval_utils.py", line 187, in eval_refcoco
    gt = sample['label']
KeyError: 'label'
Hi!!
I did this by adding the key "label" to example and batch in data/refcoco_pretrain_dataset.py. I solved my problem by assigning the additional keys to example and batch in the same way data/refcoco_dataset.py does.
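For later readers, a rough sketch of what such a change can look like, mirroring how data/refcoco_dataset.py decodes its mask column. It assumes your validation TSV still carries an annot_base64 column; the field names and the base64 variant depend on your copy:

    import base64
    from io import BytesIO

    import numpy as np
    from PIL import Image

    # in __getitem__: decode the base64 PNG mask as the ground-truth label
    # (use the same base64 variant your image_to_base64 helper used)
    mask_img = Image.open(BytesIO(base64.b64decode(annot_base64)))
    example["label"] = np.asarray(mask_img)   # H x W ground-truth mask

    # in the collate function:
    batch["label"] = np.stack([s["label"] for s in samples])

If your data contains no segmentation masks at all, there is no real ground truth for the mask metrics; you would have to stub "label" out and ignore those numbers.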
Hello, and thanks for your reply. I looked at your steps carefully, but how do you define "label" in data/refcoco_pretrain_dataset.py? Does the validation set you use contain segmentation data, or only detection data?
Hi! @joellliu @masudaryuto @LFUSST
Do you think it is possible to train on the RIS task with our dataset (~30k image-text pairs, where each image contains more than one object segmentation)? Also, can the model be trained with a single GPU?
Thank you so much for your support!
When I run bash evaluate_polyformer_l_refcoco.sh, an error occurs:
File "/data/zhou/polygon-transformer/bert/configuration_utils.py", line 201, in from_pretrained config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs) File "/data/zhou/polygon-transformer/bert/configuration_utils.py", line 252, in get_config_dict raise EnvironmentError(msg) OSError: Can't load config for 'bert-base-uncased'. Make sure that:
'bert-base-uncased' is a correct model identifier listed on 'https://huggingface.co/models'
or 'bert-base-uncased' is the correct path to a directory containing a config.json file
How should I solve this problem? Please help me, thank you. @vvuonghn @joellliu @masudaryuto @LFUSST @hyandell
Hello!!
I would like to do all the training and evaluate the REC task on my own dataset.
I pre-trained on my REC dataset.
Then, I want to convert my "refs(unc).p" and "instances.json" with "create_finetuning_data.py" for finetuning. However, because of "img_base64, annot_base64, pts_string_interpolated", the data becomes very large and data creation is difficult.
What was the size of the "refcoco+g_train_shuffled.tsv" file produced by converting the RefCOCO, RefCOCO+, and RefCOCOg datasets?
Also, is there a better conversion method?
Thank you!!