Integrate Qwen2.5 7B Model for Question Generation
Changes
Replaced the T5 model with Qwen2.5 7B for question generation
Removed answer generation, leaving the "Answer" field empty
Updated the prompt template to include a Chain-of-Thought example
Adjusted the tokenization and generation process to work with the Qwen model
Removed 8-bit quantization option as it's not needed for this implementation
Rationale
The Qwen2.5 7B model provides more advanced question generation capabilities compared to the previous T5 model. By focusing solely on question generation without answers, we streamline the process for scenarios where RAG is not being performed end-to-end.
How to Run
Ensure you have the required dependencies installed:
pip install transformers datasets torch
Place your knowledge_dataset.csv file in the same directory as the script. There's a mock one, so don't worry.
Adjust the batch_size and sample_size as needed. The output_dir specifies where the generated questions will be saved.
The script will process the dataset, generate questions, and save the results in the specified output directory.
Notes
The script now uses the Qwen2.5 7B model, which requires more computational resources. Ensure your system has sufficient GPU memory or adjust the batch size accordingly.
The "Answer" field in the output will be empty, as we're only generating questions.
The script includes a filter to remove malformed questions (those not ending with a question mark).
Integrate Qwen2.5 7B Model for Question Generation
Changes
Rationale
The Qwen2.5 7B model provides more advanced question generation capabilities compared to the previous T5 model. By focusing solely on question generation without answers, we streamline the process for scenarios where RAG is not being performed end-to-end.
How to Run
Ensure you have the required dependencies installed:
Place your
knowledge_dataset.csv
file in the same directory as the script. There's a mock one, so don't worry.Run the script with the following command:
Adjust the
batch_size
andsample_size
as needed. Theoutput_dir
specifies where the generated questions will be saved.The script will process the dataset, generate questions, and save the results in the specified output directory.
Notes