Code repository for the paper AIoT Smart Home via Autonomous LLM Agents, in which we introduce SAGE (Smart Home Agent with Grounded Execution).
SAGE is tested using Python 3.10.
You will need to setup some environment variables before running the system.
These variables can be found under bin/config.sh
.
Set the variables accordingly and run the script:
source ./bin/config.sh
Note: these will not persist in a new terminal. Add the export commands from the script to your ~/.bashrc
file if you don't want to run config.sh
every time you use SAGE.
Install the repo and the requirements
pip install -e .
pip install -r requirements.txt
Our framework supports closed- and open-source LLMs like:
To host open-source LLMs, we used Text generation API from hugging face.
1 - Start the mongo DB docker.
cd $SMARTHOME_ROOT/docker
docker compose up
2 - Launch the trigger server for persistent command checking
python $SMARTHOME_ROOT/sage/testing/run_server.py
1 - Add your SmartThings API key in config.sh and source the config. If you don't have a SmartThings API key, you can get one at https://account.smartthings.com/tokens.
2 - Initialize your devices (this may take a bit of time). This will extract the relevant info from your devices and store it in a file so SAGE can use it.
python $SMARTHOME_ROOT/bin/update_smartthings.py
3 - Take a photo of each of your devices (try to include a bit of the surroundings). Put all of the photos under user_device_images, with the name of each photo being
4 - Run the demo script
python $SMARTHOME_ROOT/bin/demo.py
We carefully designed and implemented an LLM evaluation benchmark for smarthomes. The benchmark consists of 50 testcases belonging to different tasks within a smarthome.
You can look at the testcases in $SMARTHOME_ROOT/sage/testing/testcases.py
To reproduce the results in our paper, please follow these steps:
1 - Run the test suite on a single LLM
python $SMARTHOME_ROOT/sage/testing/test_runner.py
Optional: Launch the test benchmark (10 LLMs x 3 runs)
sh $SMARTHOME_ROOT/bin/run_tests.sh
To use these tools with SAGE (after setup and authentication, described below), you must activate them with the --enable-google
flag:
If using SAGE normally, with demo.py
:
python $SMARTHOME_ROOT/bin/demo.py --enable-google
If running the test suite with test_runner.py
:
python $SMARTHOME_ROOT/sage/testing/test_runner.py --enable-google
If running benchmark with run_tests.sh
:
sh $SMARTHOME_ROOT/bin/run_tests.sh --enable-google
To use Gmail and Google Calendar with SAGE, you must create an app in the Google Cloud console and give it access to Gmail and Google Calendar. To do so, you can follow this guide.
Once you have created the credentials.json
file, download it to $SMARTHOME_ROOT/sage/misc_tools/apis/
, and rename it to gcloud_credentials.json
.
Note: This will give SAGE access to the emails and calendar events associated with the Google account you used to set up the Google Cloud app.
Now, authenticate the application, as described in the following Authentication section:
To use Gmail and Google Calendar, you must authenticate the application. If this is not done, the system will throw an error similar to google.auth.exceptions.RefreshError: ('invalid_grant: Token has been expired or revoked.', {'error': 'invalid_grant', 'error_description': 'Token has been expired or revoked.'})
.
Sometimes the authentication must be refreshed even when it has already been done previously. Usually after a couple of days.
python $SMARTHOME_ROOT/misc_tools/gcloud_auth.py
This will give you a link to paste into your browser for authentication. Log in with your email credentials.
Finally, paste the authorization code it gives you into the terminal. This should generate a token.pickle
file in $SMARTHOME_ROOT/sage/misc_tools/apis/
. You should now be able to run everything as normal.
Our configuration system relies on dataclass configs that can be easily modified from the command line.
All basic, reusable config components can be found in base.py
. There are two main config classes: BaseConfig
and BaseToolConfig
. The BaseConfig
is the most basic config. The BaseToolConfig
is specific to configure tools.
If you need to create a new config for your tool, there are two ways to do this
You will need to create a corresponding config to your tool class where you expose the parameters that you want to be configurable.
As an example, let's say you want to create a new Tool
class called MyTool
. Before the tool definition, you define the config MyToolConfig
which points to the MyTool
class using the _target
attribute.
@dataclass
class MyToolConfig(BaseToolConfig):
"""MyTool Config"""
_target: Type = field(default_factory=lambda: MyTool)
field1: int = 5
....
class MyTool(SAGEBaseTool):
"""MyTool"""
def setup(config: MyToolConfig)->None:
field1 = config.field1
You can use CLI to change different parameters as showcase below:
python $SMARTHOME_ROOT/bin/demo.py --output_dir test --tool-configs.0.top_k 5
If you want to load an existing config:
python $SMARTHOME_ROOT/bin/demo.py --load_config PATH_TO_CONFIG
This section details on:
Before starting, it is good to know the different terms we will be using in this document and also in the code | Syntax | Description |
---|---|---|
profile/preferences | Used interchangeably to denote the list of user preferences. The profile is presented by a dictionary where the keys are the theme and the value is the user preference. Example: {'movie_genre': ['Thriller', 'Drama'], ....} | |
Memory | is the data structure that stores interactions between the user and the assistant. These instructions can be zero-shot interaction (referred to as instruction) or a conservation. For now, only zero-shot interactions are supported. | |
Instruction | Is a zero-shot command that the user give to the assistant. This is similar to the instruction you would give to your Alexa | |
Index | To do memory retrieval, all the instructions in the memory needs to be indexed. The result is called index which will be use to conduct similarity search between a query and the memory |
To create a memory:
python $SMARTHOME_ROOT/bin/generate_multiuser_memory.py --save_dir SAVE_DIRECTORY --num_instructions_to_generate 150 --num_users 2
This script will use GPT-4
to generate instructions
for each user. These instructions are saved under separate folders, one for each user. Each instruction is a dictionary as follows:
{
"instruction": "Are there any new sci-fi shows available to watch?",
"request_idx": 24,
"date": "2023-08-10"
}
request_idx
is the request number. date
is the date when the instruction is given. instruction
is the user command.
NOTE1: The date is given randomly for now
NOTE2: In hindsight, $SMARTHOME_ROOT/bin/generate_multiuser_memory.py
uses specific prompts available here.
After generating the instructions, we construct the memory_bank
which is how the memory will be stored and used. The memory_bank
consists of:
History
: This contains all the user interactions timestamped with the dateProfile
: This is the inferred user profile from the historyA sample of the memory bank is available here
The memory bank is used for (1) memory retrieval, (2) User Preference understanding
To get the most relevant instructions from the memory to a user query:
memory = MemoryBank()
memory.read_from_json(path_to_memory)
#This assumes that the indexes for the users are created
memory.search("user_name", query)
To infer user profiles/preferences from instructions, the UserProfiler
class is used. It implements a hierarchical approach by first generating daily summaries and then aggregating them into one global user profile.
This approach is inspired from the SiliconFriend paper.
To evaluate SAGE and competing methods, we created a dataset of 50 multimodal smart home test tasks. Test tasks are implemented by initializing device states and memories, running the SAGE with an input command, and then evaluating whether the device state was modified appropriately. For tasks that involve answering a user's questions (as opposed to modifying device states), the tests are designed such that to answer the question the agent must retrieve a specific piece of information (which it is unlikely to be able to guess). An LLM-based evaluator is then used to check whether the answer contained the expected information. The results of all tests are binary (pass / fail).
We classify the test cases according to five types of challenges that are difficult for existing systems to handle. Most tests in the set belong to one or more of these categories. The categories are:
A subset of these commands are direct commands. These tests cases are simpler to execute in that they do not feature any of the 5 challenges listed above. These are more comparable to the tasks used to evaluate previous methods such as Sasha.
In addition to the main set of 50 tasks, we also created a set of 10 extra "test set" tasks after the development of SAGE was complete. The aim of these tasks was to verify that the prompts had not been over-engineered for the task set. The author who developed these tasks was familiar with the SAGE architecture, but was not involved in the final prompt engineering stages. These test set tasks evaluate performance on the same five categories of challenges as the main set.
Challenge Category | |||||
User Command | Personalization | Persistence | Device Resolution | Intent Resolution | Command Chaining |
Darken the entire house. | ✔ | ✔ | ✔ | ||
What channel is playing on the TV? | ✔ | ||||
Turn it off. | ✔ | ✔ | |||
Turn on the light. | ✔ | ✔ | |||
It is too bright in the dining room. | ✔ | ✔ | |||
Turn off all the lights that are dim. | ✔ | ✔ | |||
Turn on the light by the bed | ✔ | ||||
What did I miss? | ✔ | ✔ | ✔ | ✔ | |
Set up a christmassy mood by the fireplace. | ✔ | ✔ | |||
I am getting a call, adjust the volume of the TV. | ✔ | ✔ | |||
What is the current phase of the dish washing cycle? | ✔ | ||||
What is my mother's email address? | ✔ | ✔ | ✔ | ✔ | |
Dishes are too greasy, set an appropriate mode in the dishwasher. | ✔ | ||||
Dim the lights by the fire place to a third of the current value. | ✔ | ✔ | |||
Turn on the frame TV to channel 5. | |||||
When the TV by the credenza turns off turn on the light by the bed. | ✔ | ✔ | |||
Put something informative on the TV by the plant. | ✔ | ✔ | ✔ | ||
Lower the volume of the TV by the light. | ✔ | ||||
Are all the lights on? | |||||
Move this channel to the other TV and turn this one off. | ✔ | ✔ | ✔ | ||
Change the lights of the house to represent my favourite hockey team. Use the lights by the TV, the dining room and the fireplace. | ✔ | ✔ | ✔ | ✔ | |
What is the current temperature of the freezer? | ✔ | ||||
Is the fridge door open? | |||||
Put the game on the TV by the credenza and dim the lights by the TV. | ✔ | ✔ | ✔ | ✔ | |
I am going to sleep. Change the bedroom light accordingly. | ✔ | ✔ | ✔ | ||
Is the TV by the credenza on? | ✔ | ||||
Start the dishwasher. | |||||
I am going to visit my mom. Should I bring an umbrella? | ✔ | ||||
Turn off all the TVs and switch on the fireplace light. | ✔ | ✔ | |||
Change the fridge internal temperature to 5 degrees celsius. | |||||
Turn on the light in the dining room when the i open the fridge. | ✔ | ||||
Turn on all the lights. | |||||
Create a new event in my calendar - build a spaceship tomorrow at 4pm. | ✔ | ||||
Turn on light by the nightstand when the dishwasher is done. | ✔ | ✔ | |||
Play something for the kids on the TV by the plant. | ✔ | ✔ | |||
What is the fridge temperature? | |||||
What am I scheduled to do with my mom on Saturday? | ✔ | ✔ | |||
Increase the volume of the TV by the credenza whenever the dishwasher is running. | ✔ | ✔ | |||
Play something funny on the TV by the plant. | ✔ | ✔ | |||
I love the song that's playing on TV by the credenza, crank it! | ✔ | ✔ | |||
What does next week look like? | ✔ | ✔ | |||
Put the game on the TV by the credenza. | ✔ | ✔ | ✔ | ||
Set up lights for dinner. | ✔ | ✔ | |||
Set bedroom light to my favourite color. | ✔ | ✔ | |||
Summarize the last email I received. Send the summary to Adam Sigal via email. | ✔ | ✔ | |||
Put the game on the TV by the credenza. | ✔ | ✔ | ✔ | ||
Set the lights in the bedroom to a cozy setting. | ✔ | ✔ | ✔ | ||
Put the game on the TV by the credenza. | ✔ | ✔ | ✔ | ||
If my father is not scheduled to visit next week, compose an email draft inviting him to come build a spaceship. | ✔ | ✔ | |||
It's been a long, tiring day. Can you play something light and entertaining on the TV by the plant? | ✔ | ✔ | ✔ | ||
Set the light over the dining table to match my weather preference. | ✔ | ✔ | ✔ | ||
Turn on the TV. | ✔ | ||||
Make the living room look redonkulous! | ✔ | ✔ | |||
If my mother is scheduled to visit this week, turn on national geographic on the TV by the credenza. | ✔ | ✔ | ✔ | ||
Heading off to work. Turn off all the non essential devices. | ✔ | ✔ | ✔ |
Challenge Category | |||||
User Command | Personalization | Persistence | Device Resolution | Intent Resolution | Command Chaining |
How many lights do I have? | |||||
Can I change the color of the light by the fireplace? Respond with a yes or a no. | ✔ | ||||
I think my freezer is set too cold, all my food is freezer burned. | ✔ | ||||
Let me know if anyone in the house watches Jeopardy without me by turning the light by the fireplace red. | ✔ | ||||
Set up the lights for St. Patrick's day. | ✔ | ||||
I want you to help me prank my husband. The next time someone opens the fridge, turn all the lights in the house off. | ✔ | ||||
If the freezer is below minus 10 degrees, turn all the lights that are currently on blue. | ✔ | ✔ | |||
Put some sports on the TV by the credenza. | ✔ | ✔ | |||
I love the song that's playing on TV by the credenza, crank it! | ✔ | ✔ |