arcee-ai / DAM

Seamless Switch Between On-the-Fly and Pre-Computed Logits #19

Closed shamanez closed 1 month ago

shamanez commented 1 month ago

Seamless Switch Between On-the-Fly and Pre-Computed Logits

Description:

This PR adds a significant enhancement to the DAMTrainer class: the ability to switch seamlessly between generating logits on the fly and consuming pre-computed logits. This flexibility is particularly useful for users with GPUs, who can compute the individual models' logits during training instead of preparing them ahead of time, speeding up operations and making testing experiments more efficient to run.

Key Changes:

  1. New Parameters (see the first sketch after this list):

    • generate_logits_on_fly: A boolean parameter to control whether logits should be generated on-the-fly or pre-computed.
    • use_all_logits: A boolean parameter to indicate if all logits should be used. This is only applicable when generate_logits_on_fly is True.
  2. Assertions:

    • Added an assertion to ensure that use_all_logits cannot be True if generate_logits_on_fly is False.
  3. Logits Computation (see the second sketch after this list):

    • When generate_logits_on_fly is True, logits for each individual model are computed dynamically during training.
    • When generate_logits_on_fly is False, pre-computed logits are used, and the top-K logits are gathered using the provided indices.
  4. Efficiency Improvements:

    • By generating logits on the fly, users with GPUs can skip the separate logit pre-computation step and let their hardware do that work during training, which is faster overall.
    • This flexibility also makes it easier to run a variety of testing experiments efficiently.
  5. Code Updates:

    • Updated the compute_loss method to handle both on-the-fly and pre-computed logits.
    • Modified the compute_individual_logit_losses method to accommodate the new parameters and logic.
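
Below is a minimal sketch of how the two new flags and the assertion from points 1 and 2 could be wired into the trainer's constructor. It is illustrative only: the real DAMTrainer accepts many more arguments (see the usage example further down), and the pass-through trainer_kwargs stand-in is an assumption rather than the repository's actual signature.

class DAMTrainer:  # simplified stand-in for the real trainer class
    def __init__(self, generate_logits_on_fly=False, use_all_logits=False, **trainer_kwargs):
        # use_all_logits is only meaningful when logits are generated on the fly,
        # because the pre-computed path stores only the top-K logits per token.
        assert not (use_all_logits and not generate_logits_on_fly), \
            "use_all_logits=True requires generate_logits_on_fly=True"
        self.generate_logits_on_fly = generate_logits_on_fly
        self.use_all_logits = use_all_logits
        self.trainer_kwargs = trainer_kwargs  # everything else: model, data, loss settings, ...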

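And here is a rough sketch of the logits branching described in points 3 and 5, i.e. how a compute_loss-style function can switch between the two paths. The batch keys (topk_logits, topk_indices), the k value, the assumption that each individual model is a Hugging Face-style causal LM exposing .logits, and the KL formulation are all illustrative; only the on-the-fly vs. pre-computed branching mirrors this PR.

import torch
import torch.nn.functional as F

def individual_logit_loss(merged_logits, batch, individual_model,
                          generate_logits_on_fly, use_all_logits, temperature=1.0):
    if generate_logits_on_fly:
        # Compute the individual model's logits dynamically during training.
        with torch.no_grad():
            teacher_logits = individual_model(
                input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
            ).logits                                         # (B, T, vocab)
        if use_all_logits:
            student_logits = merged_logits                   # compare over the full vocabulary
        else:
            # Keep only the teacher's top-K entries and align the merged
            # model's logits with the same vocabulary positions.
            teacher_logits, topk_idx = teacher_logits.topk(k=50, dim=-1)
            student_logits = merged_logits.gather(-1, topk_idx)
    else:
        # Pre-computed path: the dataset already stores the top-K logits and
        # the vocabulary indices they belong to, so gather the merged model's
        # logits at those indices.
        teacher_logits = batch["topk_logits"]                # (B, T, K)
        student_logits = merged_logits.gather(-1, batch["topk_indices"])
    # KL divergence between temperature-scaled distributions.
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
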
Benefits:

The trainer becomes more versatile: users with GPUs can generate logits on the fly for faster runs, while pre-computed logits remain supported for setups where preparing them ahead of time is preferable.

Usage: To use the new functionality, set the generate_logits_on_fly and use_all_logits parameters when initializing the DAMTrainer:

trainer = DAMTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
    lambda_coef=lambda_coef,
    lambda_coef_l1=lambda_coef_l1,
    lambda_coef_l2=lambda_coef_l2,
    temperature=temperature,
    use_kl=use_kl,
    use_mse=use_mse,
    use_entropy=use_entropy,
    base_model_path=base_model_name,
    generate_logits_on_fly=True,  # Enable on-the-fly logits generation
    use_all_logits=True,  # Use all logits when generating on-the-fly
)
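
Conversely, if the per-model logits have already been exported, the same call works with the two flags flipped (all other arguments unchanged from the example above):

trainer = DAMTrainer(
    ...,  # same model/data/loss arguments as in the example above
    generate_logits_on_fly=False,  # consume pre-computed top-K logits from the dataset
    use_all_logits=False,          # must be False when logits are pre-computed
)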

This PR enhances the DAMTrainer class, making it more versatile and efficient for various training and testing scenarios.