arcee-ai / DistillKit

An Open Source Toolkit For LLM Distillation
GNU Affero General Public License v3.0

AttributeError: 'DataParallel' object has no attribute 'device' #16

Open Wolfman1219 opened 1 week ago

Wolfman1219 commented 1 week ago

```
Using the latest cached version of the dataset since mlabonne/FineTome-100k couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at /home/deploy/.cache/huggingface/datasets/mlabonne_fine_tome-100k/default/0.0.0/c2343c1372ff31f51aa21248db18bffa3193efdb (last modified on Tue Oct 15 04:50:53 2024).
Preprocessing and tokenizing dataset...
Dataset preparation complete.
Loading models...
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 7.88it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 7.67it/s]
Spectrum configuration not found. All layers of the student model will be trainable.
/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': max_seq_length, dataset_text_field. Will not be supported from version '1.0.0'.
  Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
  warnings.warn(message, FutureWarning)
/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:283: UserWarning: You passed a max_seq_length argument to the SFTTrainer, the value you passed will override the one in the SFTConfig.
  warnings.warn(
/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:321: UserWarning: You passed a dataset_text_field argument to the SFTTrainer, the value you passed will override the one in the SFTConfig.
  warnings.warn(
/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:396: UserWarning: You passed a tokenizer with padding_side not equal to right to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding tokenizer.padding_side = 'right' to your code.
  warnings.warn(
  0%|          | 0/16875 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/deploy/second_disk/projects/DistillKit/distil_logits.py", line 189, in <module>
    trainer.train(resume_from_checkpoint=config["training"]["resume_from_checkpoint"])
  File "/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/trl/trainer/sft_trainer.py", line 434, in train
    output = super().train(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/transformers/trainer.py", line 2052, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/transformers/trainer.py", line 2388, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/transformers/trainer.py", line 3485, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deploy/second_disk/projects/DistillKit/distil_logits.py", line 140, in compute_loss
    print(model.device)
          ^^^^^^^^^^^^
  File "/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1729, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DataParallel' object has no attribute 'device'
  0%|          | 0/16875 [00:00<?, ?it/s]
```
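
The traceback points at the `print(model.device)` call in `compute_loss` (distil_logits.py, line 140): when more than one GPU is visible, the Trainer wraps the student model in `torch.nn.DataParallel`, and `nn.Module.__getattr__` only forwards registered submodules, parameters, and buffers, so the `device` property of the wrapped Hugging Face model is not reachable through the wrapper. A minimal sketch of a workaround, assuming the goal is only to read the device inside `compute_loss` (`get_model_device` is a hypothetical helper, not part of DistillKit):

```python
import torch
from torch import nn


def get_model_device(model: nn.Module) -> torch.device:
    """Return a model's device, tolerating (Distributed)DataParallel wrappers.

    DataParallel stores the real model in `.module` and does not forward the
    `device` property that Hugging Face's PreTrainedModel defines, so unwrap
    first and fall back to the first parameter's device for plain nn.Modules.
    """
    if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
        model = model.module
    # PreTrainedModel exposes `.device`; a generic nn.Module does not,
    # so use the device of its first parameter as a fallback.
    return getattr(model, "device", next(model.parameters()).device)
```

With this, `print(get_model_device(model))` works whether or not the Trainer has wrapped the model. Alternatively, restricting the run to a single GPU (e.g. `CUDA_VISIBLE_DEVICES=0 python distil_logits.py`) or launching with `accelerate launch` so DDP is used instead of `DataParallel` should sidestep the missing attribute entirely.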