Using the latest cached version of the dataset since mlabonne/FineTome-100k couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at /home/deploy/.cache/huggingface/datasets/mlabonne_fine_tome-100k/default/0.0.0/c2343c1372ff31f51aa21248db18bffa3193efdb (last modified on Tue Oct 15 04:50:53 2024).
Preprocessing and tokenizing dataset...
Dataset preparation complete. Loading models...
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 7.88it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 7.67it/s]
Spectrum configuration not found. All layers of the student model will be trainable.
/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': max_seq_length, dataset_text_field. Will not be supported from version '1.0.0'.
Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
warnings.warn(message, FutureWarning)
/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:283: UserWarning: You passed a max_seq_length argument to the SFTTrainer, the value you passed will override the one in the SFTConfig.
warnings.warn(
/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:321: UserWarning: You passed a dataset_text_field argument to the SFTTrainer, the value you passed will override the one in the SFTConfig.
warnings.warn(
/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:396: UserWarning: You passed a tokenizer with padding_side not equal to right to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding tokenizer.padding_side = 'right' to your code.
warnings.warn(
0%| | 0/16875 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/deploy/second_disk/projects/DistillKit/distil_logits.py", line 189, in <module>
trainer.train(resume_from_checkpoint=config["training"]["resume_from_checkpoint"])
File "/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/trl/trainer/sft_trainer.py", line 434, in train
output = super().train(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/transformers/trainer.py", line 2052, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/transformers/trainer.py", line 2388, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/transformers/trainer.py", line 3485, in training_step
loss = self.compute_loss(model, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/deploy/second_disk/projects/DistillKit/distil_logits.py", line 140, in compute_loss
print(model.device)
^^^^^^^^^^^^
File "/home/deploy/second_disk/projects/DistillKit/dist_virt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1729, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DataParallel' object has no attribute 'device'
0%| | 0/16875 [00:00<?, ?it/s]
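The final `AttributeError` is raised because the Trainer has wrapped the student model in `torch.nn.DataParallel`, which does not expose the wrapped model's `.device` attribute (attribute lookup falls through to `nn.Module.__getattr__` and fails). A minimal sketch of a workaround for the `compute_loss` override, assuming the model may or may not be wrapped; the `get_model_device` helper name is illustrative, not part of DistillKit:

```python
import torch
import torch.nn as nn


def get_model_device(model: nn.Module) -> torch.device:
    """Return the device of a model that may be wrapped by (D)DP."""
    # DataParallel and DistributedDataParallel keep the real model in
    # `.module`; the wrapper itself has no `.device` attribute.
    if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
        model = model.module
    # Fall back to the device of the first parameter, which works for any
    # nn.Module, not just Hugging Face PreTrainedModel instances.
    return next(model.parameters()).device


# Example: the wrapper reproduces the error, the helper does not.
wrapped = nn.DataParallel(nn.Linear(4, 2))
print(get_model_device(wrapped))  # e.g. device(type='cpu')
```

Inside `compute_loss`, replacing `print(model.device)` with `print(get_model_device(model))` (or `model.module.device` when the wrapper is known to be present) would avoid the crash; alternatively, restricting training to a single GPU via `CUDA_VISIBLE_DEVICES` prevents the `DataParallel` wrapping entirely.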