shamanez commented 2 weeks ago

Integrate Qwen2.5 7B Model for Question Generation

Changes

Replaced the T5 model with Qwen2.5 7B for question generation
Removed answer generation, leaving the "Answer" field empty
Updated the prompt template to include a Chain-of-Thought example
Adjusted the tokenization and generation process to work with the Qwen model
Removed 8-bit quantization option as it's not needed for this implementation

Rationale

The Qwen2.5 7B model provides more advanced question generation capabilities compared to the previous T5 model. By focusing solely on question generation without answers, we streamline the process for scenarios where RAG is not being performed end-to-end.

How to Run

Ensure you have the required dependencies installed:
```
pip install transformers datasets torch
```
Place your knowledge_dataset.csv file in the same directory as the script. There's a mock one, so don't worry.
Run the script with the following command:
```
python question_answer_generation.py \
   --dataset_path=knowledge_dataset.csv \
   --batch_size=8 \
   --sample_size=50 \
   --output_dir=out
```
Adjust the batch_size and sample_size as needed. The output_dir specifies where the generated questions will be saved.
The script will process the dataset, generate questions, and save the results in the specified output directory.

Notes

The script now uses the Qwen2.5 7B model, which requires more computational resources. Ensure your system has sufficient GPU memory or adjust the batch size accordingly.
The "Answer" field in the output will be empty, as we're only generating questions.
The script includes a filter to remove malformed questions (those not ending with a question mark).

Sriharsha-hatwar commented 2 weeks ago

But, there is a check that seems to be failing?

Jacobsolawetz commented 2 weeks ago

@Crystalcareai for reference this is qa gen for retrieval training on your 2.5 7B rec

arcee-ai / DALM

added Qwen2.5 to generate QA pairs. #96

Integrate Qwen2.5 7B Model for Question Generation

Changes

Rationale

How to Run

Notes