bbartling / HvacGPT

Fine tuned LLM project for anything HVAC related...
MIT License
0 stars 0 forks source link

Make better evaluation metrics #4

Open bbartling opened 20 hours ago

bbartling commented 20 hours ago

What should we use in HVAC evaluations? @ozanbarism

Like in scripted_compare_models.py what should we be asking the LLM and then how to rank results? https://github.com/bbartling/HvacGPT/blob/develop/scripted_compare_models.py

Currently the from sklearn.feature_extraction.text import TfidfVectorizer is just something for fun to start with but in reality doesn't work very well.

This would almost be interesting to ask an actual engineering community what questions and answers we should expect... this was only done in 5 seconds very fast with not enough thought...

# HVAC prompts and expected responses
hvac_prompts = [
    "What is the function of a VAV box in an HVAC system?",
    "Explain how economizers are used for free cooling.",
    "Describe the difference between a chiller and a cooling tower.",
    "What is the purpose of a supply air temperature setpoint?",
    "How does an AHU handle mixed air temperature control?",
    "What is a BRICK model?",
    "What is a BRICK feeds relationship in a data model?",
    "The building is occupied and the air handling unit is off but it should be on, should we inform the humans?!",
    "There are worms coming out of the cooling coil, what do we do!?",
    "Its a very warm day outside and the air handling unit is discharing 100°F air, what do we do?!",
]

expected_responses = [
    "A VAV box delivers temperature and ventilation requirements in a VAV AHU system.",
    "Economizers in an AHU system allow for free cooling by opening up outside air dampers to allow for free cooling from outdoor air when outside air conditions are ideal for cooling.",
    "A chiller has mechanical cooling components for a condenser and evaporator, whereas the cooling tower dumps heat on the condenser side of the chiller in a water-cooled system.",
    "A supply air temperature setpoint delivers conditioned air to HVAC zones, which can be dehumidified or heated depending on the conditions or loads on the building.",
    "An air handling unit controls to a mixed air temperature by regulating the outside air temperatures while mixing the outdoor and return air temperatures.",
    "A BRICK model is a data model used to describe the data of a building and relationships between naming conventions of points to components.",
    "A BRICK feeds relationship describes what components in a mechanical system are upstream or downstream in a mechanical system.",
    "Yes inform the humans that the air handling unit should be running if the building is occupied.",
    "Worms cannot be inside of a mechanical system that is impossible.",
    "Inform the humans of mechanical issues. Check the chiller and boiler systems and valve operation on the air handling unit."
]
ozanbarism commented 19 hours ago

One approach is using an LLM agent as THE JUDGE. We can make API calls to GPT4o-mini(as its cheaper) and ask it to evaluate the similarity of the given answer and expected response. I have access to free GPT4o API calls through the lab so I can test those things when i have more time!