PauseAI / pauseai-website

Website for PauseAI.info
https://pauseai.info

Add to SOTA: worse than humans at agency #57

Open Pato-desu opened 5 months ago

Pato-desu commented 5 months ago

To be a little more honest. We can also add there that even if LLMs aren't agentic right now, the big labs are trying to change that.

joepio commented 5 months ago

Yeah, the SOTA needs some more things AIs aren't good at. However, agency isn't that clear a concept to me. The reason that AutoGPT doesn't really lead to useful behaviour has to do with a bunch of underlying shortcomings:

joepio commented 5 months ago

Although to be honest, short-term memory also isn't the right abstraction. An LLM can recite many pages letter by letter if the text is in its context. However, as soon as something does not fit in the context, the model doesn't remember it. Maybe this is part of the "can't update own weights" problem then.
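A minimal sketch of the context-window point, not how any particular LLM works: it assumes a fixed-size window that keeps only the most recent tokens. The function names `visible_context` and `can_recall`, and the idea of treating each list item as one token, are hypothetical and only there for illustration.

```python
def visible_context(tokens: list[str], window: int) -> list[str]:
    """Return the most recent `window` tokens; older ones fall out of view."""
    if window <= 0:
        return []
    return tokens[-window:]


def can_recall(tokens: list[str], window: int, fact: str) -> bool:
    """A fact is 'remembered' here only if it is still inside the window."""
    return fact in visible_context(tokens, window)


# Toy history: the fact appears first, followed by three filler tokens.
history = ["secret=42", "a", "b", "c"]

print(can_recall(history, 4, "secret=42"))  # fact still fits in the window
print(can_recall(history, 3, "secret=42"))  # fact has been pushed out
```

With a window of 4 the fact is still visible, and with a window of 3 it is gone, even though nothing about the "model" itself changed.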

Pato-desu commented 5 months ago

I don't know if I agree with you, but I guess agency can mean two different things. One is acting like an agent and the other is deciding stuff somewhat independently. Right now they are "bad" at both.

joepio commented 5 months ago

What does it mean to "act like an agent", though? I just don't have a clear understanding of this concept I suppose.

We have NNs playing all sorts of video games. I'd consider these agents. They perform actions in a system to achieve some objective.
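To make "perform actions in a system to achieve some objective" concrete, here is a rough sketch of an agent loop. Everything in it is a hypothetical toy: `LineWorld`, `greedy_policy`, and `run_episode` are made-up names, and a hand-written rule stands in for the neural network a game-playing agent would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LineWorld:
    """Hypothetical toy system: the agent walks along a line toward a goal."""
    position: int = 0
    goal: int = 5

    def step(self, action: int) -> bool:
        """Apply an action (-1 or +1) and report whether the goal is reached."""
        self.position += action
        return self.position == self.goal


def greedy_policy(env: LineWorld) -> int:
    """Pick the action that moves toward the goal (stand-in for a trained NN)."""
    return 1 if env.position < env.goal else -1


def run_episode(env: LineWorld, policy: Callable[[LineWorld], int],
                max_steps: int = 100) -> int:
    """Observe, act, repeat. Returns steps taken, or -1 if the goal is not reached."""
    for steps in range(1, max_steps + 1):
        if env.step(policy(env)):
            return steps
    return -1


print(run_episode(LineWorld(), greedy_policy))  # 5
```

By this definition the loop is an agent even though it is trivially simple, which is one reason "agency" on its own may not capture what current systems are bad at.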

I agree that if you give AutoGPT a goal, it performs at a sub-human level on pretty much any task. But I think the reason it's bad can be explained mostly by other shortcomings: it can't make a cup of coffee because it can't control a body. It can't reliably perform tasks because it hallucinates things.

But maybe I'm missing some other important dimension here?

Pato-desu commented 5 months ago

Yeah, you're right. I'm not sure what I wanted to refer to. I'm confused now.

Pato-desu commented 5 months ago

Being general purpose?

joepio commented 5 months ago

Still not clearly defined to me.