andyk / ht

headless terminal - wrap any binary with a terminal interface for easy programmatic access.
Apache License 2.0
746 stars, 12 forks

Docs: explain why the project is useful, compared to expect #3

Closed pzmarzly closed 4 weeks ago

pzmarzly commented 4 weeks ago

One of the most frequent questions in https://news.ycombinator.com/item?id=40552257 was "but what would I need it for?". There were some good arguments and ideas there, and I think it would be nice to put them in the readme.

andyk commented 4 weeks ago

Will do!

I haven't used expect before so I'm not sure how ht is different. But I'll look into it.

Regarding the motivating use case for ht, I just replied in the Hacker News thread with the following context:

Hey, project lead here. I had a very specific use case in mind: I’m playing with using LLM agent frameworks for software engineering - like MemGPT, swe-agent, LangChain and my own hobby project called headlong (https://github.com/andyk/headlong). Headlong is focused on making it easy for a human to edit the thought history of an agent via a webapp. The longer-term goal of headlong is collecting large-ish human-curated datasets that intermix actions/observations/inner-thoughts and then using that data to fine-tune models to see if we can improve their reasoning.

While working on headlong I tried out and implemented a variety of ‘tools’ (i.e., functions) like editFile(), findFile(), sendText(), checkTime(), searchWeb(), etc., which the agents call using LLM function calling.
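To make the "tools called via function calling" idea concrete, here is a minimal sketch of a tool registry and dispatcher. It is not headlong's code: the tool bodies are placeholders, and the call shape (a JSON object with `name` and `arguments`) is an assumption modeled on common function-calling formats.

```python
import json
import time
from typing import Callable, Dict

# Registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


# Tool names echo the ones mentioned above; the bodies are illustrative stubs.
@tool
def checkTime() -> str:
    return time.strftime("%Y-%m-%d %H:%M:%S")


@tool
def sendText(text: str) -> str:
    return f"sent {len(text)} characters"


def dispatch(call_json: str) -> str:
    """Run one function call emitted by the model and return the observation."""
    call = json.loads(call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"error: unknown tool {call['name']}"
    return fn(**call.get("arguments", {}))


if __name__ == "__main__":
    print(dispatch('{"name": "sendText", "arguments": {"text": "hello"}}'))
```

Every new capability in this style needs another registered function, which is the overhead the next paragraphs try to avoid.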

A bunch of these ended up being functions that interacted with an underlying terminal. This is similar to how swe-agent works actually.

But I figured instead of writing a bunch of functions that sit between the LLM and the terminal, maybe let the LLM use a terminal more like a human does, i.e., by “typing” input into it and looking at snapshots of the current state of it. Needed a way to get those stateful text snapshots though.
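A rough sketch of that loop, assuming only two primitives (send keys, take a text snapshot). The `Terminal` protocol, `ToyTerminal`, and `run_agent` are hypothetical names for illustration, not ht's actual API; the toy terminal just records typed lines rather than running a real program, and `scripted_policy` stands in for the LLM.

```python
from typing import Callable, List, Optional, Protocol


class Terminal(Protocol):
    """The two primitives the agent needs (hypothetical interface)."""

    def send_keys(self, keys: List[str]) -> None: ...
    def snapshot(self) -> str: ...


class ToyTerminal:
    """In-memory stand-in: keeps typed lines and shows the last `rows` of them."""

    def __init__(self, rows: int = 4, cols: int = 40) -> None:
        self.rows, self.cols = rows, cols
        self.lines: List[str] = []
        self.current = ""

    def send_keys(self, keys: List[str]) -> None:
        for key in keys:
            if key == "Enter":
                self.lines.append(self.current)
                self.current = ""
            else:
                self.current += key

    def snapshot(self) -> str:
        visible = (self.lines + [self.current])[-self.rows:]
        return "\n".join(line[: self.cols] for line in visible)


Policy = Callable[[str], Optional[List[str]]]


def run_agent(term: Terminal, policy: Policy, max_steps: int = 10) -> str:
    """Look at the screen, decide what to type, repeat until the policy stops."""
    for _ in range(max_steps):
        keys = policy(term.snapshot())
        if keys is None:
            break
        term.send_keys(keys)
    return term.snapshot()


def scripted_policy(screen: str) -> Optional[List[str]]:
    # Stand-in for the LLM: type one command, then stop once it is on screen.
    return None if "echo hi" in screen else ["echo hi", "Enter"]


if __name__ == "__main__":
    print(run_agent(ToyTerminal(), scripted_policy))
```

The point of the design is that the agent-facing surface stays at two operations no matter which program runs inside the terminal.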

I first tried using tmux and also looked to see if any existing libs provide the same functionality. Couldn’t find anything so teamed up with Marcin to design and make ht.

Playing with the agent using the terminal directly has evolved into a hypothesis that I’ve been exploring: the terminal may be the “one tool to rule them all” - i.e., if an agent learns to use a terminal well it can do most of what humans do on our computers. Or maybe terminal + browser are the “two tools to rule them all”?

Not sure how useful ht will be for other use cases, but maybe!