Meituan-AutoML / MobileVLM

Strong and Open Vision Language Assistant for Mobile Devices
Apache License 2.0

Problem of Finetune #31

Closed. QvQKing closed this issue 6 months ago.

QvQKing commented 7 months ago

```
Formatting inputs...Skip in lazy mode
Rank: 0 partition count [1, 1] and sizes[(1383208960, False), (25600, False)]
  0%| | 0/6542 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/hy-tmp/MobileVLM-main/mobilevlm/train/train_mem.py", line 13, in <module>
    train()
  File "/hy-tmp/MobileVLM-main/mobilevlm/train/train.py", line 893, in train
    trainer.train()
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train
    return inner_training_loop(
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/transformers/trainer.py", line 1813, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/accelerate/data_loader.py", line 381, in __iter__
    dataloader_iter = super().__iter__()
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 441, in __iter__
    return self._get_iterator()
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1084, in __init__
    self._reset(loader, first_iter=True)
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1117, in _reset
    self._try_put_index()
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1351, in _try_put_index
    index = self._next_index()
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 623, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 254, in __iter__
    for idx in self.sampler:
  File "/hy-tmp/MobileVLM-main/mobilevlm/train/trainer.py", line 106, in __iter__
    indices = get_modality_length_grouped_indices(self.lengths, self.batch_size, self.world_size, generator=self.generator)
  File "/hy-tmp/MobileVLM-main/mobilevlm/train/trainer.py", line 39, in get_modality_length_grouped_indices
    lang_indices, lang_lengths = zip(*[(i, -l) for i, l in enumerate(lengths) if l < 0])
ValueError: not enough values to unpack (expected 2, got 0)
  0%| | 0/6542 [00:01<?, ?it/s]
[2024-02-27 03:08:32,658] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 6092
[2024-02-27 03:08:32,658] [ERROR] [launch.py:321:sigkill_handler] ['/usr/local/miniconda3/envs/mobilevlm/bin/python', '-u', 'mobilevlm/train/train_mem.py', '--local_rank=0', '--deepspeed', 'scripts/deepspeed/zero2.json', '--model_name_or_path', './mtgv/MobileVLM_V2-1.7B', '--version', 'v1', '--data_path', 'data/eccv_train.json', '--image_folder', 'data/eccv_train', '--vision_tower', './mtgv/clip-vit-large-patch14-336', '--vision_tower_type', 'clip', '--pretrain_mm_mlp_adapter', './finetune-results/mobilevlm-1.pretrain/mm_projector.bin', '--mm_projector_type', 'ldpnet', '--mm_vision_select_layer', '-2', '--mm_use_im_start_end', 'False', '--mm_use_im_patch_token', 'False', '--image_aspect_ratio', 'pad', '--group_by_modality_length', 'True', '--bf16', 'True', '--output_dir', './finetune-results/mobilevlm-2.finetune', '--num_train_epochs', '1', '--per_device_train_batch_size', '16', '--per_device_eval_batch_size', '4', '--gradient_accumulation_steps', '1', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '50000', '--save_total_limit', '1', '--learning_rate', '2e-5', '--weight_decay', '0.', '--warmup_ratio', '0.03', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'True', '--model_max_length', '2048', '--gradient_checkpointing', 'True', '--dataloader_num_workers', '4', '--lazy_preprocess', 'True', '--report_to', 'none'] exits with return code = 1
Done.
```

I want to know how to solve this. Thanks!
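For context, the failing line in `mobilevlm/train/trainer.py` splits the dataset by modality: text-only samples are marked with negative lengths and multimodal samples with positive ones. If the finetuning data contains no text-only samples, the comprehension over `l < 0` is empty, so `zip(*[])` has nothing to unpack, which is exactly the `ValueError` above. Below is a minimal sketch of the failure together with a possible guard; the sample data is hypothetical and the guard is an assumption, not the repository's actual fix.

```python
# Minimal sketch of why get_modality_length_grouped_indices fails here.
# Convention in the trainer: multimodal samples carry positive lengths,
# text-only samples carry negative lengths. The data below is hypothetical
# and assumes every sample is multimodal.
lengths = [128, 256, 512]

try:
    # Same expression as trainer.py line 39: with no negative lengths,
    # the comprehension is empty and zip(*[]) yields nothing to unpack.
    lang_indices, lang_lengths = zip(*[(i, -l) for i, l in enumerate(lengths) if l < 0])
except ValueError as err:
    print(err)  # not enough values to unpack (expected 2, got 0)

# One possible guard (an assumption, not the upstream fix): only split by
# modality when both modalities are actually present.
mm = [(i, l) for i, l in enumerate(lengths) if l > 0]
lang = [(i, -l) for i, l in enumerate(lengths) if l < 0]
if not mm or not lang:
    print("Only one modality present; group by length alone instead.")
```

Another apparent workaround, given that the launch command passes `--group_by_modality_length True`, would be to set that flag to `False` so the modality-grouped sampler is not used at all.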

weifei7 commented 7 months ago

Please refer to issue #11

er-muyue commented 6 months ago

Hi, we are closing this issue due to inactivity. We hope your question has been resolved. If you have any further concerns, please feel free to re-open it or open a new issue. Thanks!