machoangha opened this issue 1 week ago (status: Open)
Hi,
I am attempting to fine-tune a model on Vietnamese characters using the configuration provided below. I updated `DICT90` to include 219 characters, as follows:

```python
DICT90 = tuple('AÁÀẠẢÃĂẮẰẲẶẴÂẤẦẨẪẬBCDĐEÈÉẼẺẸÊẾỀỂỄỆFGHIÍÌỈĨỊJKLMNOPQRSTUVWXYZ'
               'aáàạảãăằắẳẵặâấầẩẫậbcdđeèéẻẽẹêếềểễệfghiíìỉĩịjklmnopqrstuvwxyz'
               '0123456789!"#$%&\'()*+,-./:;<=>?@[\\]_~')  # 219 characters
```
Here is my `CCD_vision_model_ARD.yaml` configuration:

```yaml
global:
  name: finetune_small_65536_1
  phase: train
  stage: train-supervised
  workdir: workdir
  seed: ~

output_dir: './saved_models/'

dataset:
  scheme: supervised
  type: ST
  train: {
    roots: [
      './Dino/data_lmdb/training',
    ],
    batch_size: 8,
  }
  valid: {
    roots: [
      './Dino/data_lmdb/validation'
    ],
    batch_size: 8
  }
  test: {
    roots: [
      './Dino/data_lmdb/evaluation'
    ],
    batch_size: 14
  }
  data_aug: True
  multiscales: False
  mask: False
  num_workers: 6
  augmentation_severity: 0
  charset_path: './Dino/data/charset_vi.txt'  # Vietnamese charset
  charset_type: 'DICT90'  # Changed to Vietnamese charset in base.py

training:
  epochs: 20
  start_iters: 0
  show_iters: 1000
  eval_iters: 1000
  save_iters: 1000

model:
  pretrain_checkpoint: 'saved_models/Small_ARD_checkpoint.pth'
  checkpoint:

decoder:
  type: 'NRTRDecoder'
  n_layers: 6
  d_embedding: 512
  n_head: 8
  d_model: 512
  d_inner: 256
  d_k: 64
  d_v: 64
  num_classes: 221  # 219 + 2
  max_seq_len: 25
  start_idx: 220  # 219 + 1
  padding_idx: 221  # 219 + 2

mp:
  num: 4

arch: 'vit_small'
patch_size: 4
out_dim: 65536
weight_decay: 0.05
clip_grad: ~
lr: 0.0005
warmup_epochs: 2
min_lr: 0.000001
optimizer: adamw
drop_path_rate: 0.1
seed: 0
num_workers: 8
```
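One quick sanity check on a setup like this is to confirm that the decoder indices agree with the character set that is actually loaded. The sketch below assumes the convention implied by the comments in the configuration (`num_classes = N + 2`, `start_idx = N + 1`, `padding_idx = N + 2` for a charset of `N` characters). `check_charset` is a hypothetical helper, not part of the CCD repository.

```python
import unicodedata


def check_charset(charset_str, num_classes, start_idx, padding_idx):
    """Return a list of problems found; an empty list means the values agree.

    Assumes the N+2 / N+1 / N+2 convention taken from the config comments.
    """
    problems = []

    # Accented letters typed in decomposed form (base letter + combining
    # marks) would be split into several entries by tuple(...).
    nfc = unicodedata.normalize("NFC", charset_str)
    if nfc != charset_str:
        problems.append("charset is not NFC-normalized; accented letters may be split")

    chars = tuple(nfc)
    n = len(chars)

    # Duplicate entries would map two indices to the same character.
    dupes = sorted({c for c in chars if chars.count(c) > 1})
    if dupes:
        problems.append("duplicate characters: " + "".join(dupes))

    expected = {"num_classes": n + 2, "start_idx": n + 1, "padding_idx": n + 2}
    actual = {"num_classes": num_classes, "start_idx": start_idx, "padding_idx": padding_idx}
    for key, want in expected.items():
        if actual[key] != want:
            problems.append(f"{key} is {actual[key]}, expected {want} for {n} characters")

    return problems
```

Calling `check_charset("".join(DICT90), 221, 220, 221)` would report any disagreement between the tuple and the decoder settings. It seems worth running, because the string as posted appears to list no accented variants of O, U, or Y, so its length may differ from 219.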
After running finetune.py, the model's accuracy is 0. I am unsure whether there is an error in the configuration or whether something else is wrong.
Could you please help me identify if there is any mistake in the setup or configuration? Any guidance or suggestions would be greatly appreciated.
Thanks.
I think you should pay more attention to the details of the evaluation. For example: https://github.com/TongkunGuan/CCD/blob/543109a1e1d9acd15080abb3e4e72d68588ba493/Dino/metric/eval_acc.py#L38
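As an illustration of the kind of evaluation detail that can matter here, the sketch below shows how a text filter applied during scoring changes word accuracy. It is an assumption about the class of problem, not a reproduction of the linked code: `word_accuracy` and `ascii_alnum_lower` are hypothetical names, and the ASCII-only filter stands in for the sort of normalization commonly used on Latin-script benchmarks.

```python
import re


def ascii_alnum_lower(text):
    """Hypothetical Latin-benchmark filter: lowercase, keep only a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def word_accuracy(preds, labels, pred_filter=None, label_filter=None):
    """Fraction of samples whose (optionally filtered) prediction equals the label."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not labels:
        return 0.0
    if pred_filter is not None:
        preds = [pred_filter(p) for p in preds]
    if label_filter is not None:
        labels = [label_filter(t) for t in labels]
    correct = sum(p == t for p, t in zip(preds, labels))
    return correct / len(labels)
```

With `preds = labels = ["việt", "nam"]`, exact matching gives 1.0, while passing `pred_filter=ascii_alnum_lower` alone gives 0.5, because the filter strips the accented letter from the prediction but not from the label. A filter of this kind applied to Vietnamese text can therefore lower accuracy even when the model output is correct.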
eval model
iteration:1000--> train loss:2.2958712577819824
eval model
iteration:2000--> train loss:1.863906741142273
eval model
iteration:3000--> train loss:1.80726158618927
eval model
iteration:4000--> train loss:1.7889494895935059
eval model
iteration:5000--> train loss:1.7705328464508057
eval model
This is after 12 epochs of training; the model predicts only a single word for every test case.
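To quantify this symptom, one option is to count how many distinct strings the model decodes on the validation set. A minimal sketch, assuming `preds` is the list of decoded strings; `prediction_collapse` is a hypothetical helper rather than anything in the repository.

```python
from collections import Counter


def prediction_collapse(preds):
    """Return (most common prediction, its share of samples, distinct count)."""
    if not preds:
        raise ValueError("preds must not be empty")
    counts = Counter(preds)
    top, freq = counts.most_common(1)[0]
    return top, freq / len(preds), len(counts)
```

A share close to 1.0 with a single distinct prediction would indicate that the decoder ignores the image content, which points toward the label encoding or the loaded weights rather than toward the accuracy metric alone.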