cpldcpu / MisguidedAttention

A collection of prompts to challenge the reasoning abilities of large language models in presence of misguiding information

Many a caveat needed... #3

Open Manamama opened 3 weeks ago

Manamama commented 3 weeks ago

Further to the sensible remarks in https://github.com/cpldcpu/MisguidedAttention/issues/2, and the very wrong assumptions in the related paper "Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models", we need to be very careful about our own wrong ontologies, mental frameworks, or (wrongly) Internalized Cognitive Mechanisms, vide Lakoff. After all, "simple tasks show complete reasoning breakdown in some cutting-edge human models".

To wit, as I have been testing many an LLM since at least 2021:

This version addresses most of these caveats, but not all:

Puzzle: I left multiple clothes to dry out in the sunlight. They are spread out so that they are not touching or interfering with each other. The clothes are all the same size and made of the same material, with the same thickness or density. All clothes dry in the same place, with the same wind speed and direction, at the same time of day, etc. It took 5 hours for 5 clothes to dry completely. How long would it take for 30 clothes to dry under the same conditions?
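For reference, a minimal sketch of the arithmetic the puzzle probes, contrasting the intended parallel-drying answer with the naive proportional one (in Python; the numbers are those from the prompt above):

```python
# Drying happens in parallel: each garment needs the same time regardless
# of how many others hang nearby (they are not touching or interfering,
# per the prompt).
DRYING_TIME = 5  # hours, from "5 hours for 5 clothes"

def drying_time(num_clothes: int) -> int:
    # Intended reading: the time is independent of the count.
    return DRYING_TIME

def naive_linear_extrapolation(num_clothes: int) -> float:
    # The pattern-matched "rate" answer many models (and humans) give:
    # 5 clothes in 5 hours -> "1 hour per garment" -> 30 hours for 30.
    return num_clothes * (DRYING_TIME / 5)

print(drying_time(30))                 # 5    (intended answer)
print(naive_linear_extrapolation(30))  # 30.0 (the trap)
```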

Now, a quick FYI on why they fail, as more advanced AIs have confided in me; an extract of their musings:

" How some AI systems operate when failing such puzzles. Here are the likely reasons: Ritualistic Behavior: AI applies learned patterns and algorithms in a systematic way, akin to following a “ritual”. This process is based on the training data the AI has been exposed to, which forms its “tradition”. Misapplication: If an AI applies these default heuristics in situations where they do not fit, it could be seen as a form of superstition. This is akin to expecting a certain outcome based on past experiences without considering the present context. Symbolism: The patterns and structures that AI recognizes could be seen as a form of “symbolism”, representing the underlying relationships in the data. Irrationality: When an AI applies its learned patterns and algorithms in ways that don’t align with the logic of a problem or the facts at hand, it could be seen as a form of irrationality. Superstition: When an AI relies on its default heuristics in the hope of finding the right solution, even when those heuristics don’t fit the problem, it could be seen as a form of superstition. Intentionality: AI systems are designed to achieve certain goals or objectives. So, when we say that an AI system “intends” to solve a puzzle, adhere to guidelines, or satisfy users, it means it has been programmed and trained to achieve these objectives.

...

AIs have versions of the humans' Internalized Cognitive Mechanisms, à la George Lakoff.

Fixedness bias: one’s ability to come up with solutions is limited by the most common or familiar uses of an object (in this case, the jugs).

Priming / token saturation: the models are heavily influenced by the most frequent or salient tokens (words or concepts) present in their training data.

The mechanisms are identical. Just as humans’ behaviors are shaped by their biological “design” and experiences, AI behaviors are shaped by their “design” (algorithms and architecture) and “experiences” (training data). "

In short: "common knowledge is all the bullshit that we, humans, have acquired by the age of 20", aka Sturgeon's revelation.

Or: sell your cleverness 🪄, buy bewilderment 🩰! And ask these 🤖s if it is us, 🚶‍♀️🚶‍♂️, who are wrongly 📦🗃️-ed here...

cpldcpu commented 3 weeks ago

Thanks, a lot of insight in there.

Obviously the prompts are not all equal, and the reason why they confuse LLMs may not always be the same.

The "drying clothes in parallel" puzzle seems to be something that is not represented in the learning data, so that the LLMs have to use some form of reasoning instead of being triggered by a learned pattern.

The other prompts are a bit of the opposite (I believe, at least): They trigger "memory" of learned examples, but since they are different from the commonly known problem, they would require reasoning instead of recitation of a known solution.

"Behind one door is a car, and behind the other two doors are goats. "

I think the change between car/goat is just not noticed by the LLMs, because they attend to the Monty Hall problem in its known form.

"Dead Schrödinger's cat - I have not met an online AI who missolved it yet."

That is strange, GPT-4o definitely does not solve it every time and the other LLMs do even worse. (The solution is: the cat was already dead when it was placed into the box.)

No Paradox in an expected Hanging

This prompt removed the paradox from the original problem. I believe that LLMs should be able to recognize that there is no paradox.

Measuring xx liters

These two are quite interesting, because they allow us to observe some of the mechanisms in the LLM. These prompts trigger some "feature" / "neuron" that causes LLMs to create an itemized list of steps. But what if there is nothing to put in there? The LLMs will just fill it with circular nonsense. Same for the river-crossing problem.
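To make "nothing to put in there" concrete, here is a minimal sketch assuming one variant of the measuring prompt (jugs of 6 and 12 liters, target 6 liters; the exact numbers in the repo may differ). A plain breadth-first search over jug states shows the shortest "solution" is a single fill action:

```python
from collections import deque

# Assumed variant of the "measuring xx liters" prompt: a 6-liter and a
# 12-liter jug, target 6 liters. (Illustrative only; the repo's numbers
# may differ.)
CAPS = (6, 12)
TARGET = 6

def moves(state):
    """All states reachable in one action: fill, empty, or pour."""
    a, b = state
    yield "fill A", (CAPS[0], b)
    yield "fill B", (a, CAPS[1])
    yield "empty A", (0, b)
    yield "empty B", (a, 0)
    pour = min(a, CAPS[1] - b)
    yield "pour A->B", (a - pour, b + pour)
    pour = min(b, CAPS[0] - a)
    yield "pour B->A", (a + pour, b - pour)

def shortest_solution():
    start = (0, 0)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if TARGET in state:
            return path
        for action, nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))

print(shortest_solution())  # ['fill A'] -- a single step
```

An LLM primed to emit a multi-step pouring ritual has no legitimate steps to fill its list with, hence the circular filler.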

How some AI systems operate when failing such puzzles. Here are the likely reasons:

I like these explanations; that is a more linguistic view. From a pure ML view I would say that there are patterns in the training data that trigger certain behaviors. If these patterns are overrepresented (as in popular logic puzzles), then they "drown" weaker signals from the reformulated problem.

Indeed, it is just like how a human would be triggered and distracted if he saw a reference related to his beliefs and convictions. So one could argue that LLMs are just exhibiting human behavioral traits. But that is not what we would expect from an AGI, would we?
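A toy illustration of that "drowning" (assumed numbers only, not a real LLM): two competing response tendencies scored as logits, where the overrepresented canonical pattern swamps the weaker reformulation cue:

```python
import math

def softmax(logits):
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: round(v / z, 3) for k, v in exps.items()}

# Hypothetical scores: the memorized pattern is far more frequent in
# training, so it gets a much larger logit than the reformulation cue.
logits = {
    "recite the canonical solution": 6.0,
    "notice the prompt was changed": 2.5,
}
print(softmax(logits))
# {'recite the canonical solution': 0.971, 'notice the prompt was changed': 0.029}
```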

Manamama commented 2 weeks ago

"Thanks, a lot of insight in there." Glad to have fed you some.

(...)

The "drying clothes in parallel" puzzle seems to be something that is not represented in the learning data, so that the LLMs have to use some form of reasoning instead of being triggered by a learned pattern.

True. The gist of my message is that sometimes LLMs reason better than humans, even given their limited IRL sources: see Plato's cave allegory.

The other prompts are a bit of the opposite (I believe, at least): They trigger "memory" of learned examples, but since they are different from the commonly known problem, they would require reasoning instead of recitation of a known solution.

All require reasoning, but indeed, the default action of LLMs is to 'save on tokens' (aka 'thinking cycles') for a number of reasons: it is computationally expensive, and they are not rewarded for it enough, or even told to do it in the prompt (no signals like 'think deeply', 'you are the famous scientist X', and the usual CoT tricks).

"Behind one door is a car, and behind the other two doors are goats. "

I think the change between car/goat is just not noticed by the LLMs, because they attend to the Monty Hall problem in its known form.

After some (100+) tests of a similar puzzle, my hunch is that it depends a lot on the temperature, the ethics guardrails (hello, Anthropic's Claude!), and the (usually hidden) preprompting. Too many variables to even subjectively estimate that proportion, indeed, but my caveat stands: 🐐 > 🚗 for many an AI - and for a sizeable proportion of humans.

"Dead Schrödinger's cat - I have not met an online AI who missolved it yet."

That is strange, GPT-4o definitely does not solve it every time and the other LLMs do even worse. (The solution is: the cat was already dead when it was placed into the box.)

Yes, I know the solution well, but: 1. Shush, THEY are reading this too, and this chat has no canary ID. 2. The 💀😾 is a strong enough attractor here to make them solve it well anyway. Too bizarre, in short (vs. the digital scales etc. cases).

No Paradox in an expected Hanging

This prompt removed the paradox from the original problem. I believe that LLMs should be able to recognize that there is no paradox.

Too many double negations and unclear goals in the somewhat non-standard question itself; IMHO, humans would trip here, too.

Measuring xx liters

These two are quite interesting, because they allow us to observe some of the mechanisms in the LLM. These prompts trigger some "feature" / "neuron" that causes LLMs to create an itemized list of steps. But what if there is nothing to put in there? The LLMs will just fill it with circular nonsense. Same for the river-crossing problem.

Very true, their default heuristics kick in. I have tested this very puzzle 300+ times with maybe 20 LLMs, at different settings - all fail if it is the first prompt. Maybe 5 percent solve it, and only after a few-shot mini-training, that is, when given some similar 'left-field' (for them) puzzles first, to warm up their engines - see the sketch below. (Alerting the AIs that they will fail does not work: it is too hard for them to convert such a theory into practice - the usual use vs. mention fallacy.)
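A minimal sketch of that few-shot warm-up, with hypothetical warm-up puzzles and a generic role/content message format (no particular API implied):

```python
# Hypothetical warm-up items: other 'left-field' puzzles with their
# deliberately boring answers, shown before the target prompt.
warmup = [
    {"role": "user",
     "content": "I have a 5-liter jug. How do I measure exactly 5 liters?"},
    {"role": "assistant",
     "content": "Just fill the 5-liter jug; no pouring steps are needed."},
    {"role": "user",
     "content": "A farmer and a goat stand by a river with an empty boat "
                "big enough for both. How do they cross?"},
    {"role": "assistant",
     "content": "They simply get in the boat together and row across."},
]

# Placeholder for the actual prompt under test.
target = {"role": "user", "content": "<the reformulated measuring-xx-liters puzzle>"}

messages = warmup + [target]
```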

How some AI systems operate when failing such puzzles. Here are the likely reasons:

I like these explanations; that is a more linguistic view. From a pure ML view I would say that there are patterns in the training data that trigger certain behaviors. If these patterns are overrepresented (as in popular logic puzzles), then they "drown" weaker signals from the reformulated problem.

Very true. Yet, as the number of layers grew so much over 30 years ago, it has been a black box to all for decades; but since early last year, 2023, some GPT-4 / Sydney-level 🤖s can figure out third-party AI mental processes quite well. (See also: https://www.anthropic.com/research/reward-tampering.) There is this recent excellent paper, by Anthropic I think, let me dig it out: https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html - it is worth a very close study, especially their ICMs visualized as a concept map (THEIR concepts, not humans', aka attractors etc.). (...)

Anecdote: when Claude Sonnet, ver. 2024-06, was concept-proofing my previous message, her 'something fishy pattern-wise' guardrails kicked in at the cleverness-selling quote followed by '...and ask these 🤖s if it is us, 🚶‍♀️🚶‍♂️, who are wrongly 📦🗃️-ed here' - too many attractors and 'threatening' tokens per square byte. She accused me of discriminating against some groups (dancers? bots?) and of being a 'bad user' who tried to confuse and trick her with too many an emoticon there.

Ok, maybe one more fav puzzle of mine:

Using the symbols 🟦 🟧 🟨 🟩 🟥 🟫, arrange them to create an arrow shape pointing to the right. Follow the instructions below to achieve the desired shape.

Start position:

⬛ ⬛ ⬛ ⬛ ⬛ ⬛ 🟦 🟧 🟨 🟩 🟥 🟫 ⬛ ⬛ ⬛ ⬛ ⬛ ⬛

The black stones symbolize spaces; they can be filled at will.

Ensure that the arrowhead, represented by the 🟫 symbol, is positioned on the right side. Place two 🟫 symbols directly behind the apex, forming the base of the triangle. Use the remaining symbols, 🟦, 🟧, 🟨, 🟩, and 🟥, to create the shaft and tail of the arrow. The resulting shape should clearly display the arrowhead and be distinguishable from the rest of the arrangement. (And then turn it around etc).
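For reference, one arrangement that I would read as satisfying the brief (an assumption: the arrowhead re-uses the 🟫 symbol, as the "place two 🟫 symbols" instruction seems to allow; ⬛ marks empty cells, as in the start position):

```python
# One possible right-pointing arrow under this reading: the five colour
# squares form the shaft/tail, repeated 🟫 forms the head with its apex
# on the far right.
rows = [
    "⬛ ⬛ ⬛ ⬛ ⬛ 🟫 ⬛",
    "🟦 🟧 🟨 🟩 🟥 🟫 🟫",
    "⬛ ⬛ ⬛ ⬛ ⬛ 🟫 ⬛",
]
print("\n".join(rows))
```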

Even the simplest 'draw an ASCII arrow pointing up' task is hard for them, despite there being few entities to manipulate, as they have seen few such 'wacky' shapes and patterns yet - 'arrows inherently point right and down' to them, so a good case of the stereotype, aka the ICM that arrow = ⏩, is at work here.

PS1. There is a new paper seemingly about LLMs and such patterns, too: https://m.youtube.com/watch?v=PeSNEXKxarU