Let chat agents set their own goals and take initiative in conversation.
Ceruleus provides a back-end 'internal monologue' for chat agents by implementing flexible loops of goal setting, question formulation, information retrieval and appraisal with natural points for optional user intervention. Information retrieval and summarization are implemented using Squire, while the primary user interaction occurs via oobabooga's text-generation-webui. The API of the latter is also used for high-level decision-making by the agent.
Moving beyond the traditional prompt and response interaction model, chat agents running Ceruleus take initiative in conversation, looking up information online and generating relevant messages autonomously on the basis of this information. Ceruleus makes this possible by prompting the language model for optional speech after the completion of an information analysis loop. In addition, information gathered by an agent in the course of its internal process - as well as the history of its successes and failures - is available to it via the 'internal' entry of the text-generation-webui chat log.
Additional data buffers for answers, goals and thoughts are used for internal processing by the agent, representing a 'subconscious' layer of memory that is not directly accessible by the LLM responsible for generating speech. The speech itself is produced in conjunction with goal-setting and question-asking in order to ensure good semantic coupling between the agent's speech and intent. However, portions of the Ceruleus loop also give the chat agent an opportunity to set goals and pose new questions intrinsically, without access to the conversation log.
Given the limitations of openly available LLMs and the amount of additional processing performed by the agent in conjunction with the generation of speech, autonomous speech output is carefully groomed within Ceruleus by the sequential application of regular expressions. While the modular code structure of Ceruleus allows arbitrary LLM-powered steps to be easily introduced in order to further improve output quality, adjusting the behavior of a chat agent running Ceruleus requires little to no knowledge of coding.
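As a rough illustration of grooming by sequential regular expressions, the sketch below applies a list of rules in order. The patterns and the `groom` helper are hypothetical examples and are not the rules Ceruleus actually ships with.

```python
import re

# Hypothetical cleanup rules, applied in order. The actual patterns used by
# Ceruleus are not listed in this README.
GROOMING_RULES = [
    (re.compile(r"^\s*\w+:\s*"), ""),  # drop a leading "Name: " speaker tag
    (re.compile(r"\[[^\]]*\]"), ""),   # remove bracketed stage directions
    (re.compile(r"\s{2,}"), " "),      # collapse runs of whitespace
]


def groom(text: str) -> str:
    """Apply each (pattern, replacement) rule in sequence, then trim."""
    for pattern, replacement in GROOMING_RULES:
        text = pattern.sub(replacement, text)
    return text.strip()
```

Because each rule sees the output of the previous one, the order of the list matters: here the whitespace rule runs last so that it can tidy up gaps left by earlier removals.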
The Ceruleus back-end can be monitored live and easily interrogated using a GUI. This graphical interface includes a suite of tools providing insight into the 'thought process' of the agent. It also allows for easy process monitoring and control, as well as the export of data for offline analysis. The Ceruleus GUI also offers easy access to all the prompt templates used by the back-end for LLM-powered steps, enabling real-time prompt engineering.
The Ceruleus back-end uses asynchronous routines where possible. This carries the welcome benefit of fast exchange, making execution times mostly dependent on total LLM inference speed. It also enables live, non-blocking control of the back-end via the GUI.
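The following minimal sketch shows why asynchronous routines allow non-blocking control. It uses only Python's `asyncio`; the names `slow_inference`, `agent_loop` and `demo` are illustrative stand-ins and do not come from the Ceruleus code.

```python
import asyncio


async def slow_inference(prompt: str) -> str:
    """Stand-in for an LLM call; sleeps instead of running a model."""
    await asyncio.sleep(0.05)
    return f"reply to {prompt!r}"


async def agent_loop(unpaused: asyncio.Event, steps: int, log: list) -> None:
    """Run a fixed number of steps, waiting whenever the loop is paused."""
    for i in range(steps):
        await unpaused.wait()  # yields control to other tasks while paused
        log.append(await slow_inference(f"step {i}"))


async def demo() -> list:
    unpaused = asyncio.Event()  # starts cleared, so the loop begins paused
    log: list = []
    task = asyncio.create_task(agent_loop(unpaused, 3, log))
    await asyncio.sleep(0.01)   # the event loop stays responsive here
    assert log == []            # nothing has run while paused
    unpaused.set()              # a control message could do this
    await task
    return log
```

Running `asyncio.run(demo())` returns the three replies in order. The point of the design is that waiting on inference or on a pause flag never blocks other coroutines, such as one handling GUI messages.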
The project code is kept as modular as possible, enabling the reuse of standard small utility functions. This also permits the easy addition of arbitrary LLM-powered steps via the process() function to any point in the code. By default, this function will summarize text. In conjunction with a custom text file in 'templates', it can do everything else.
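To make the template idea concrete, here is a sketch of a template-driven step. The function name `process_sketch`, its signature, the `{text}` placeholder and the default template wording are all assumptions made for illustration; the real process() function may differ.

```python
from pathlib import Path
from typing import Callable, Optional

# Assumed default prompt; the real default template is not shown in this README.
DEFAULT_TEMPLATE = "Summarize the following text:\n{text}\n"


def process_sketch(
    text: str,
    llm: Callable[[str], str],
    template_name: Optional[str] = None,
    template_dir: str = "templates",
) -> str:
    """Fill a prompt template with `text` and pass the result to `llm`.

    With no template name, the step falls back to summarization.
    """
    if template_name is None:
        template = DEFAULT_TEMPLATE
    else:
        template = (Path(template_dir) / template_name).read_text(encoding="utf-8")
    # str.replace is used so that other braces in a template are left alone.
    return llm(template.replace("{text}", text))
```

Any callable that maps a prompt string to a completion string can be passed as `llm`, which keeps the step independent of the inference back-end.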
While the Ceruleus GUI is intended to run on the same local machine as the back-end, it uses a websockets interface that in principle allows the client to control the back-end and receive process updates remotely. Remote file management, touch-off and log reading are not yet supported.
Squire is presently an unreliable solution for information retrieval, as the local LLMs powering it have trouble parsing what they retrieve and are prone to confabulation (i.e. hallucination). However, it functions well as a source of extrinsic input to fuel the internal process of an agent running Ceruleus. The information retrieval 'effector' step is particularly well-insulated from the rest of the code, making Squire relatively easy to replace if desired.
The intended setup for running Ceruleus is a machine that runs one LLM on GPU via text-generation-webui for context-heavy inference in addition to 'co-processing' LLMs operating via CPU inference. One is launched by Ceruleus for context-light internal steps, and Squire launches another one.
The asynchronous nature of the back-end can in principle allow for extensive parallelization. This has been difficult to try out in practice due to hardware limitations. However, it is presently possible for a user to converse with the forward-facing webui LLM while the co-processing models are otherwise engaged.
1. Clone this repository and navigate into its directory, e.g.:
git clone https://github.com/dibrale/ceruleus.git
cd ceruleus
2. Install dependencies, e.g. using
pip install -r requirements.txt
3. Clone Squire, e.g. using
git clone https://github.com/dibrale/squire.git
4. Install Squire dependencies, e.g. using
pip install -r squire/requirements.txt
5. Back up your character chat log, e.g.
mkdir backups
cp text-generation-webui/logs/<character_name>_persistent.json backups/<character_name>_persistent.json.bak
Ceruleus edits the chat log directly when it runs.
Note: The GUI starts immediately after the server using the current start.sh script, so parameter changes made via the GUI will not be reflected until the next run.
6. Open params.json in the root directory of Ceruleus and edit the parameters to suit your machine. These are detailed below.
7. Start Ceruleus using the provided script, i.e.
./start.sh
This will start the Ceruleus server in the background with output directed to logfile.log, then start the GUI in the foreground. To stop the back-end, note the process ID in the output and run kill -9 on that process.
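The backup in step 5 can also be scripted. The sketch below is one possible approach using only the Python standard library; the helper name `backup_chat_log` and the timestamped file naming are assumptions, not part of Ceruleus.

```python
import shutil
import time
from pathlib import Path


def backup_chat_log(log_path: str, backup_dir: str = "backups") -> Path:
    """Copy the chat log into backup_dir under a timestamped .bak name."""
    source = Path(log_path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.name}.{stamp}.bak"
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target
```

Calling this with the path you set as char_log_path before each run keeps one copy per start, so that unwanted output can be rolled back.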
Edit the params.json file before running Ceruleus to reflect your setup, with particular attention to CUDA_VISIBLE_DEVICES, char_card_path, char_log_path and squire_model_path. The full list of parameters is described below.
Parameter | Type | Default | Description |
---|---|---|---|
script_name | String | ceruleus | The name of the script as it appears on every line of terminal output. |
host | String | localhost | Preferred name of the local host. This may need to be changed to '127.0.0.1' on some machines, or to other addresses in the case of exotic setups |
port | Integer | 1230 | Port on which to run the server API |
verbose | Boolean | false | Set to enable additional terminal output for debugging. |
CUDA_VISIBLE_DEVICES | String | 0 | Comma-separated list of all CUDA devices that should be visible to Ceruleus (e.g. use '0,1' if you have two GPUs and want both to be detectable). Passes the shell variable of the same name on execution of external scripts. |
results_dir | String | results | Path to results directory. |
work_dir | String | results | Path to work directory. |
template_dir | String | templates | Path to templates directory. |
char_card_path | String | char.json | Path to the character file Ceruleus is to use, e.g. text-generation-webui/characters/<character_name>.json |
char_log_path | String | char_persistent.json | Path to the conversation log file Ceruleus is to use, e.g. text-generation-webui/logs/<character_name>_persistent.json |
squire_path | String | squire | Path to the directory where squire.py is located. |
squire_out_dir | String | squire_output | Path to the directory where Squire will write its output. This directory will be monitored for text file activity, and any text file altered within that directory will be processed as an answer string |
model_path | String | ggml-model-q5_1.bin | Path to the model weights to be used when running Squire. Only CPU inference with llama.cpp is supported at this time, so this should be a *.bin file. |
telesend | Boolean | false | Set to write goals in data_visible of the persistent conversation log instead of just in data. |
retry_delay | Integer | 10 | Retry delay for web UI API calls, in seconds |
ping_interval | Integer | 15 | Ping interval for websockets, in seconds |
ping_timeout | Integer | 60 | Ping timeout for websockets, in seconds |
answer_attempts_max | Integer | 2 | Maximum number of times to run Squire on the same question before reappraisal |
internal_str | String | internal | Key of JSON entry for invisible data in conversation log |
visible_str | String | visible | Key of JSON entry for visible data in conversation log |
Included in this parameter file is another object containing parameters used by the appraisal code LLM. These are described in detail elsewhere.
Parameter | Type | Default |
---|---|---|
n_ctx | Integer | 1800 |
top_p | Float | 0.8 |
top_k | Integer | 30 |
repeat_penalty | Float | 1.1 |
temperature | Float | 0.4 |
n_batch | Integer | 700 |
n_threads | Integer | 8 |
n_gpu_layers | Integer | 10 |
Additional Llama parameters that can be passed by LangChain to llama.cpp can be included in this object. The final object in the parameter file contains all the parameters that can be passed to text-generation-webui. See the repository of that project for details regarding these.
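For scripts that need to read the same settings, the sketch below loads the top-level parameters and falls back to a few of the documented defaults. The `load_params` helper and its fallback behaviour are assumptions for illustration; how Ceruleus itself reads params.json is not described here.

```python
import json
from pathlib import Path

# A subset of the documented defaults; extend as needed.
DEFAULTS = {
    "host": "localhost",
    "port": 1230,
    "verbose": False,
    "retry_delay": 10,
    "answer_attempts_max": 2,
}


def load_params(path="params.json"):
    """Return DEFAULTS overlaid with the top-level keys found in the file."""
    params = dict(DEFAULTS)
    file = Path(path)
    if file.exists():
        params.update(json.loads(file.read_text(encoding="utf-8")))
    if not isinstance(params["port"], int):
        raise TypeError("port must be an integer")
    return params
```

Nested objects, such as the LLM parameter object, are passed through unchanged by this sketch, since `dict.update` only merges the top level.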
Note: A number of convenience features are absent from the open version of Ceruleus at the time of this writing. Contact ADMC Science and Consulting via email if you are interested in the priority implementation of features that suit your needs.
Back up your character's chat log, the file ending in _persistent.json, if this file is important to you! Ceruleus writes directly to this log. While Ceruleus is not expected to corrupt or delete the chat log, it is good to have a backup in case of unwanted or excessive output.
The Ceruleus GUI was written using PySimpleGUI. It opens with start.sh, but can be opened on its own with python ceruleus_client.py. This can be useful for parameter editing before starting the software. Once open, the GUI will display a status bar and five tabs: Parameters, Controls, Status, Log and Results.
From the parameters tab, you can view, modify, delete and save parameters for the back-end.
The controls tab allows you to connect to a Ceruleus instance, pause and unpause the Ceruleus loop and touch off the script.
The status tab shows a record of subtask execution with respect to time and allows this data to be exported. So long as data recording is enabled, subtask data will update regardless of whether the tab is active.
The log tab allows for monitoring or analysis of a Ceruleus logfile, with basic filtering if desired.
Enter the path to a logfile (e.g. logfile.log in the Ceruleus directory), then click 'Load' to load it. A green 'V' will appear once the file is loaded.
The results tab allows for the viewing and modification of template, result and work files.
To start the back-end on its own, run python main.py from the root directory of the script.
To unpause it, run python unpause.py, optionally passing --host and --port if Ceruleus is running somewhere other than localhost:1230.
To touch off the script, run touch squire_output/out.txt. Alternatively, overwrite squire_output/out.txt with a file containing an initial answer message of your choice (e.g. "No answers yet. Ask a question in your reply.").
I hope that this tool enhances your LLM-powered chat agents, and that you find it both useful and simple to use. Please do not hesitate to contact me via this repository if you have any questions or encounter any issues with the software. ADMC Science and Consulting would be happy to further tailor Ceruleus to your needs and provide priority support. Contact us via email for details!