IcanDoItL opened this issue 1 month ago
```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers, MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    # Required parameter:
    output_dir=output_dir,
    # Optional training parameters:
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    warmup_ratio=0.1,
    fp16=True,  # Set to False if you get an error that your GPU can't run on FP16
    bf16=False,  # Set to True if you have a GPU that supports BF16
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # MultipleNegativesRankingLoss benefits from no duplicate samples in a batch
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.PROPORTIONAL,  # PROPORTIONAL or ROUND_ROBIN
    # Optional tracking/debugging parameters:
    eval_strategy="steps",
    eval_steps=250,
    save_strategy="steps",
    save_steps=250,
    save_total_limit=2,
    logging_steps=100,
    run_name="mnrl-cl-multi",  # Will be used in W&B if `wandb` is installed
)
```
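As a first sanity check (not part of the original issue), a minimal standalone script can confirm whether autocast, which the trainer enables when `fp16=True`, actually produces float16 activations on this GPU. The model and tensor shapes below are purely illustrative:

```python
import torch
import torch.nn as nn

# Minimal sketch, assuming a CUDA GPU is available: check that autocast
# really runs computations in fp16 on this device.
model = nn.Linear(128, 128).cuda()
x = torch.randn(4, 128, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)

print(y.dtype)                          # torch.float16 if autocast is active
print(next(model.parameters()).dtype)   # torch.float32: AMP keeps master weights in fp32
```

Note the second print: under mixed precision the parameters themselves stay in fp32 by design, so an active autocast and unchanged parameter memory can coexist.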
Platform: CentOS 7, pytorch==2.2.1, pytorch-cuda==12.1
On an RTX 3080 Ti, training runs with half_precision_backend='auto' and fp16=True, but mixed precision does not seem to take effect: GPU memory usage is not reduced. What could be the possible reasons?
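One general property of AMP worth ruling out (an assumption about this setup, not something confirmed in the issue): `fp16=True` enables *mixed* precision, which keeps fp32 master weights and fp32 optimizer states and only casts activations and matmuls to fp16. If parameters and optimizer state dominate the footprint (small batch, short sequences), peak memory barely moves even though autocast is working. The self-contained sketch below, with illustrative layer sizes and batch shape, makes that visible:

```python
import torch
import torch.nn as nn

# Rough memory-comparison sketch (illustrative names and sizes): measure peak
# CUDA memory for one forward/backward/step with and without autocast.
def peak_mem(use_amp: bool) -> int:
    model = nn.Sequential(*[nn.Linear(2048, 2048) for _ in range(8)]).cuda()
    opt = torch.optim.AdamW(model.parameters())
    x = torch.randn(64, 2048, device="cuda")
    torch.cuda.reset_peak_memory_stats()
    with torch.autocast(device_type="cuda", dtype=torch.float16, enabled=use_amp):
        loss = model(x).float().pow(2).mean()
    loss.backward()   # backward runs outside the autocast region, as recommended
    opt.step()
    return torch.cuda.max_memory_allocated()

print(f"fp32 peak: {peak_mem(False) / 1e6:.0f} MB")
print(f"amp  peak: {peak_mem(True) / 1e6:.0f} MB")
```

With these sizes the two peaks come out nearly identical, because fp32 weights plus AdamW state dwarf the activation memory; fp16 savings only become visible once activations are large (bigger batches or longer sequences). If that matches what you observe, fp16 is likely active but simply cannot shrink the parameter/optimizer portion of memory.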