akiwarheit / mads-capstone-project


Data Preprocessing #12

Open akiwarheit opened 2 weeks ago

akiwarheit commented 2 weeks ago

T5 is a sequence-to-sequence (text-to-text) model, so each training example needs to be a pair of strings: an input text and a target text. For information retrieval, you might want to frame it as a question-answering task.

Input Format: Combine persona data and property data into a structured query format. Example Input: "Persona: family size 4, income $80K, prefers suburban area. Property: 3 bedroom house, 2 bathrooms, located in suburbs."

Output Format: The expected retrieval result, such as a list of recommended properties or specific property details. Example Output: "Recommended Property: 123 Maple St, 3 bedrooms, 2 bathrooms, $300K"

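def create_input_output(row):
    # Build one (input, target) text pair for T5 from a single dataset row.
    input_text = f"Persona: {row['persona_data']}. Property: {row['property_data']}."
    # No trailing period, so the target matches the example output format.
    output_text = f"Recommended Property: {row['recommended_property']}"
    return input_text, output_text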
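
If the persona and property information is stored as separate structured fields rather than ready-made strings, the `persona_data` and `property_data` values could be assembled first. This is only a rough sketch: the helper names (`format_persona`, `format_property`) and every field name are hypothetical, chosen to reproduce the example input, and are not taken from the project's actual schema.

```python
def format_persona(family_size, income_usd, preferred_area):
    """Serialize hypothetical persona fields into a short descriptive string."""
    income_k = round(income_usd / 1000)  # express income in thousands, e.g. 80000 -> 80
    return f"family size {family_size}, income ${income_k}K, prefers {preferred_area} area"


def format_property(bedrooms, bathrooms, property_type, location):
    """Serialize hypothetical property fields into a short descriptive string."""
    return f"{bedrooms} bedroom {property_type}, {bathrooms} bathrooms, located in {location}"


# Assumed sample values, mirroring the example input format
persona_data = format_persona(family_size=4, income_usd=80_000, preferred_area="suburban")
property_data = format_property(bedrooms=3, bathrooms=2, property_type="house", location="suburbs")

# Same template as the input_text line in create_input_output
input_text = f"Persona: {persona_data}. Property: {property_data}."
print(input_text)
```

Keeping the field order and wording fixed across all rows is probably worth it, since the model only ever sees the flattened text.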