allenporter / home-assistant-datasets

This package is a collection of datasets for evaluating AI Models in the context of Home Assistant.
https://allenporter.github.io/home-assistant-datasets

Multilanguage support #28

Open gacekk opened 1 month ago

gacekk commented 1 month ago

Hi

I understand that this is all based on data in English. How about other languages?

Most LLMs are trained predominantly on English data and have limited support for other languages.

allenporter commented 3 weeks ago

Hi,

Can you give more context on the outcomes you are looking for?

Some of the homes generated in the dataset use other languages, e.g. https://github.com/allenporter/home-assistant-datasets/blob/main/datasets/devices/casa-del-sol-es.yaml

There are two things we can do:

(1) Use the crowdsourced Home Assistant intents data, e.g. https://github.com/home-assistant/intents/blob/main/tests/pl/climate_HassClimateGetTemperature.yaml, and convert it to the format here: https://github.com/allenporter/home-assistant-datasets/tree/main/datasets/intents

(2) Modify the generation notebooks to ask for sentences in specific languages: https://github.com/allenporter/home-assistant-datasets/blob/main/generation/device-actions.ipynb
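For option (1), here is a rough sketch of what a first conversion step might look like. It assumes a test file has already been parsed (for example with `yaml.safe_load`) into a dict with a top-level `language` key and a `tests` list, where each entry has `sentences` and an `intent` with a `name` and optional `slots`. The function name `flatten_intent_tests` and the output record layout are made up for illustration. They are not the format used in `datasets/intents`, so the records would still need to be mapped onto that format.

```python
from typing import Any


def flatten_intent_tests(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten one parsed intents test file into one record per sentence.

    Hypothetical helper: the output layout is illustrative only and is
    not the schema used by home-assistant-datasets.
    """
    language = doc.get("language", "")
    records: list[dict[str, Any]] = []
    for case in doc.get("tests", []):
        intent = case.get("intent", {})
        for sentence in case.get("sentences", []):
            records.append(
                {
                    "language": language,
                    "sentence": sentence,
                    "intent": intent.get("name"),
                    # Copy the slots so records do not share one dict.
                    "slots": dict(intent.get("slots") or {}),
                }
            )
    return records


# Hand-written sample shaped like a parsed test file (assumed structure,
# not copied from the intents repository).
sample = {
    "language": "pl",
    "tests": [
        {
            "sentences": [
                "jaka jest temperatura",
                "jaka jest temperatura w salonie",
            ],
            "intent": {"name": "HassClimateGetTemperature"},
        }
    ],
}

records = flatten_intent_tests(sample)
for record in records:
    print(record["language"], record["intent"], record["sentence"])
```

Keeping one record per sentence means each language's test sentences can be filtered or sampled independently before being written out in whatever format the evaluation expects.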

What is the specific use case you're working on? Which dataset are you looking for?