Open shiym2000 opened 3 weeks ago
Hi,
I appreciate your interest in our work. For the segmentation task, you need to preprocess the segmentation data in two steps. First, convert the nii.gz (or other format) files to .npy and generate a JSON split file, like here. Please note that this step may differ between segmentation datasets.
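As a rough illustration of the JSON split part of this first step, here is a minimal sketch. It is not the repository's script: the directory layout (one folder per case holding `image.npy` and `mask.npy`), the JSON keys `image` and `label`, the `train`/`test` section names, and the output name `dataset_split.json` are all assumptions made for the example.

```python
import json
import random
from pathlib import Path


def build_split(data_root, train_ratio=0.8, seed=0):
    """Pair image/mask .npy files per case and split them into train/test lists.

    Assumed (hypothetical) layout: <data_root>/<case>/image.npy and mask.npy.
    """
    root = Path(data_root)
    pairs = []
    for case_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        image = case_dir / "image.npy"
        mask = case_dir / "mask.npy"
        # Skip cases that are missing either file.
        if image.exists() and mask.exists():
            pairs.append({
                "image": image.relative_to(root).as_posix(),
                "label": mask.relative_to(root).as_posix(),
            })
    # Shuffle deterministically so the split is reproducible.
    random.Random(seed).shuffle(pairs)
    n_train = int(len(pairs) * train_ratio)
    return {"train": pairs[:n_train], "test": pairs[n_train:]}


def write_split(data_root, out_name="dataset_split.json", **kwargs):
    """Write the split next to the data and return the path of the JSON file."""
    split = build_split(data_root, **kwargs)
    out_path = Path(data_root) / out_name
    out_path.write_text(json.dumps(split, indent=2))
    return out_path
```

Paths are stored relative to the data root so the JSON file stays valid if the dataset folder is moved.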
Second, convert these NPY images and masks to a predefined format, like here. In this step, we resize the volumes, separate the mask channels, and generate a JSON file.
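The channel-separation part of this second step could look roughly like the sketch below. It assumes a multi-channel mask of shape (C, H, W, D) with one channel per class, and it uses a hypothetical file naming scheme `mask_<cls_id>.npy`; the repository's own script may name and store the files differently, and resizing is left out here.

```python
from pathlib import Path

import numpy as np


def split_mask_channels(mask, out_dir, prefix="mask"):
    """Save every non-empty channel of a (C, H, W, D) mask as its own file.

    Each saved array keeps a leading channel axis, i.e. shape (1, H, W, D).
    The file name pattern '<prefix>_<cls_id>.npy' is an assumption.
    """
    mask = np.asarray(mask)
    if mask.ndim != 4:
        raise ValueError(f"expected a (C, H, W, D) mask, got shape {mask.shape}")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    for cls_id in range(mask.shape[0]):
        channel = mask[cls_id:cls_id + 1]  # slice keeps the channel axis
        if not channel.any():
            continue  # class absent from this volume, nothing to save
        path = out_dir / f"{prefix}_{cls_id}.npy"
        np.save(path, channel.astype(np.uint8))
        saved.append(path)
    return saved
```

Skipping empty channels keeps the dataset free of image/mask pairs that contain no target at all; whether that is desirable depends on the training setup.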
Then we can use these preprocessed datasets in training. Each pair read from the JSON file gives a 3D NPY image and a one-channel mask.
If you have any questions, we are glad to help.
Thank you for your reply. Furthermore, if I want to take the finetuned model and finetune it further on my own dataset, how should I modify finetune_lora.sh and train.py?
Hi,
You should pass --pretrain_mllm in finetune_lora.sh. It would be better to convert our safetensors model to a bin model. Additionally, you should not set --pretrain_mm_mlp_adapter.
Then, you can continue to finetune this model from ours.
What about --model_name_or_path, --pretrain_vision_model and --pretrain_seg_module? Can you give an example of the new finetune_lora.sh? Thank you very much.
Hi,
Please try this:
accelerate launch LaMed/src/train/train.py \
--version v0 \
--model_name_or_path microsoft/Phi-3-mini-4k-instruct \
--model_type phi3 \
--lora_enable True \
--vision_tower vit3d \
--pretrain_vision_model ./LaMed/pretrained_model/M3D-CLIP/pretrained_ViT.bin \
--segmentation_module segvol \
--pretrain_seg_module ./LaMed/pretrained_model/SegVol/pytorch_model.bin \
--pretrain_mllm ./LaMed/output/LaMed-Phi3-4B/model.bin \
--bf16 True \
--output_dir ./LaMed/output/LaMed-Phi3-4B-finetune \
--num_train_epochs 5 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "steps" \
--eval_accumulation_steps 1 \
--eval_steps 0.04 \
--save_strategy "steps" \
--save_steps 1000 \
--save_total_limit 1 \
--learning_rate 1e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 0.001 \
--gradient_checkpointing False \
--dataloader_pin_memory True \
--dataloader_num_workers 8 \
--report_to tensorboard
I have converted the safetensors model to a bin model here, and you can use this bin model directly.
Thank you very much! I will give it a try.
When I finetuned the model, I ran into this problem. I have downloaded the segmentation dataset from Hugging Face and unzipped it. It seems that the segmentation dataset needs more processing. What should I do?
rank3: Traceback (most recent call last):
rank3:   File "/mnt/nvme_share/shiym/projects_3rd/M3D/LaMed/src/train/train.py", line 407, in <module>
rank3:   File "/mnt/nvme_share/shiym/projects_3rd/M3D/LaMed/src/train/train.py", line 394, in main
rank3:   File "/home/shiym/anaconda3/envs/m3d/lib/python3.10/site-packages/transformers/trainer.py", line 1932, in train
rank3:     return inner_training_loop(
rank3:   File "/home/shiym/anaconda3/envs/m3d/lib/python3.10/site-packages/transformers/trainer.py", line 2230, in _inner_training_loop
rank3:     for step, inputs in enumerate(epoch_iterator):
rank3:   File "/home/shiym/anaconda3/envs/m3d/lib/python3.10/site-packages/accelerate/data_loader.py", line 464, in __iter__
rank3:     next_batch = next(dataloader_iter)
rank3:   File "/home/shiym/anaconda3/envs/m3d/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
rank3:     data = self._next_data()
rank3:   File "/home/shiym/anaconda3/envs/m3d/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1326, in _next_data
rank3:     return self._process_data(data)
rank3:   File "/home/shiym/anaconda3/envs/m3d/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
rank3:   File "/home/shiym/anaconda3/envs/m3d/lib/python3.10/site-packages/torch/_utils.py", line 705, in reraise
rank3:     raise exception
rank3: ValueError: Caught ValueError in DataLoader worker process 1.
rank3: Original Traceback (most recent call last):
rank3:   File "/home/shiym/anaconda3/envs/m3d/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
rank3:     data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
rank3:   File "/home/shiym/anaconda3/envs/m3d/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
rank3:     data = [self.dataset[idx] for idx in possibly_batched_index]
rank3:   File "/home/shiym/anaconda3/envs/m3d/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
rank3:     data = [self.dataset[idx] for idx in possibly_batched_index]
rank3:   File "/mnt/nvme_share/shiym/projects_3rd/M3D/LaMed/src/dataset/multi_dataset.py", line 1197, in __getitem__
rank3:     return self.dataset[idx]
rank3:   File "/home/shiym/anaconda3/envs/m3d/lib/python3.10/site-packages/torch/utils/data/dataset.py", line 348, in __getitem__
rank3:     return self.datasets[dataset_idx][sample_idx]
rank3:   File "/mnt/nvme_share/shiym/projects_3rd/M3D/LaMed/src/dataset/multi_dataset.py", line 1121, in __getitem__
rank3:     return self.dataset[idx]
rank3:   File "/home/shiym/anaconda3/envs/m3d/lib/python3.10/site-packages/torch/utils/data/dataset.py", line 348, in __getitem__
rank3:     return self.datasets[dataset_idx][sample_idx]
rank3:   File "/mnt/nvme_share/shiym/projects_3rd/M3D/LaMed/src/dataset/multi_dataset.py", line 907, in __getitem__
rank3:     cls_id = int(os.path.basename(seg_path).split('_')[1].split('.')[0])
rank3: ValueError: invalid literal for int() with base 10: '(4, 512, 512, 103)'
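The last frame suggests that the loader reads a class id out of the mask file name, and that the file it found carries a shape tuple where an integer was expected. That would be consistent with the second preprocessing step (separating the mask channels) not having been run on the downloaded data, though the thread does not confirm it. The sketch below mirrors the parsing on the failing line to check file names before training; it assumes the separator is an underscore (the character appears to have been lost in the pasted traceback), and both function names and the example file names are hypothetical.

```python
import os


def parse_cls_id(seg_path):
    """Mirror the failing line: '<prefix>_<cls_id>.<ext>' -> integer class id.

    The underscore separator is an assumption about the original code.
    """
    return int(os.path.basename(seg_path).split('_')[1].split('.')[0])


def find_unparsable(paths):
    """Return the paths whose file name does not yield an integer class id."""
    bad = []
    for path in paths:
        try:
            parse_cls_id(path)
        except (ValueError, IndexError):
            # ValueError: the token is not an integer; IndexError: no '_' in the name.
            bad.append(path)
    return bad


if __name__ == "__main__":
    candidates = [
        "case_a/mask_3.npy",                    # parses to class id 3
        "case_a/mask_(4, 512, 512, 103).npz",   # reproduces the reported error
    ]
    for path in find_unparsable(candidates):
        print("cannot parse class id from:", path)
```

Running such a check over the mask paths listed in the dataset JSON would show whether any files still use the unprocessed naming.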