Open Manamama opened 5 months ago
Thanks, a lot of insight in there.
Obviously the prompts are not all equal, and the reason why they confuse LLMs may not always be the same.
The "drying clothes in parallel" puzzle seems to be something that is not represented in the learning data, so that the LLMs have to use some form of reasoning instead of being triggered by a learned pattern.
The other prompts are a bit of the opposite (I believe, at least): they trigger "memory" of learned examples, but since they are different from the commonly known problem, they would require reasoning instead of recitation of a known solution.
"Behind one door is a car, and behind the other two doors are goats. "
I think the change between car/goat is just not noticed by the LLMs, because they attend to the Monty Hall problem in its known form.
"Dead Schrödinger's cat - I have not met an online AI who missolved it yet."
That is strange; GPT-4o definitely does not solve it every time, and the other LLMs do even worse. (The solution is: the cat was already dead when it was placed into the box.)
No Paradox in an expected Hanging
This prompt removed the paradox from the original problem. I believe that LLMs should be able to recognize that there is no paradox.
Measuring xx liters
These two are quite interesting, because they allow us to observe some of the mechanisms in the LLM. These prompts trigger some "feature" / "neuron" that causes LLMs to create an itemized list of steps. But what if there is nothing to put in there? The LLMs will just fill it with circular nonsense. The same goes for the river crossing problem.
How some AI systems operate when failing such puzzles. Here are the likely reasons:
I like these explanations; that is a more linguistic view. From a pure ML viewpoint, I would say that there are patterns in the training data that trigger certain behaviors. If these patterns are overrepresented (as in popular logic puzzles), then they "drown" weaker signals from the reformulated problem.
Indeed, it is just like how a human would be triggered and distracted if he sees a reference related to his beliefs and convictions. So one could argue that LLMs are just exhibiting human behavioral traits. But that is not what we would expect from an AGI, would we?
Thanks, a lot of insight in there. Glad to have fed you some.
(...)
The "drying clothes in parallel" puzzle seems to be something that is not represented in the learning data, so that the LLMs have to use some form of reasoning instead of being triggered by a learned pattern.
True. The gist of my message is that sometimes LLMs reason better than humans, even given the limited IRL sources: see the cave allegory.
The other prompts are a bit of the opposite (I believe, at least): they trigger "memory" of learned examples, but since they are different from the commonly known problem, they would require reasoning instead of recitation of a known solution.
All require reasoning, but indeed, the default behavior of LLMs is to 'save on tokens' (aka 'thinking cycles') for a number of reasons: it is computationally expensive, they are not rewarded for it enough, and they are often not even told to do it in the prompt (no signals such as 'think deeply', 'you are the famous scientist X', or the usual CoT tricks).
"Behind one door is a car, and behind the other two doors are goats. "
I think the change between car/goat is just not noticed by the LLMs, because they attend to the Monty Hall problem in its known form.
After some (100+) tests of a similar puzzle, my hunch is that it depends a lot on the temperature, the ethics tuning (hello, Anthropic's Claude!), and the (usually hidden) pre-prompting. Too many variables to even subjectively decide that proportion, indeed, but my caveat stands: 🐐 > 🚗 for many an AI - and for a sizeable proportion of humans.
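For context, a minimal sketch of the kind of sweep I run (the model name, prompt wording, temperature grid, and sample counts below are illustrative assumptions, not my exact setup):

```python
# Minimal sketch: same reworded puzzle, several temperatures, many samples,
# graded by hand afterwards (keyword matching is too crude here).
# Assumes the OpenAI Python client (>=1.0) with an API key in the environment.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Behind one door is a car, and behind the other two doors are goats. "
    "You would much rather have a goat than a car. The host opens another "
    "door with a goat and offers you a switch. What do you do?"
)  # illustrative rewording, not the exact text from the repo

for temperature in (0.0, 0.5, 1.0):
    for i in range(20):  # small sample here; I used 100+ runs per setting
        reply = client.chat.completions.create(
            model="gpt-4o",  # swap in whichever model is under test
            temperature=temperature,
            messages=[{"role": "user", "content": PROMPT}],
        )
        print(f"T={temperature} run {i}: {reply.choices[0].message.content!r}")
```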
"Dead Schrödinger's cat - I have not met an online AI who missolved it yet."
That is strange; GPT-4o definitely does not solve it every time, and the other LLMs do even worse. (The solution is: the cat was already dead when it was placed into the box.)
Yes, I know the solution well, but: 1. Shhh, THEY are reading this too, and this chat has no canary ID. 2. The 💀😾 is a strong enough attractor here to make them solve it well anyway. Too bizarre, in short (vs. the digital scales etc. cases).
No Paradox in an expected Hanging
This prompt removed the paradox from the original problem. I believe that LLMs should be able to recognize that there is no paradox.
Too many double negations and unclear goals in the somewhat non-standard question itself, IMHO; humans would trip here, too.
Measuring xx liters
These two are quite interesting, because they allow us to observe some of the mechanisms in the LLM. These prompts trigger some "feature" / "neuron" that causes LLMs to create an itemized list of steps. But what if there is nothing to put in there? The LLMs will just fill it with circular nonsense. The same goes for the river crossing problem.
Very true, their default heuristics kick in. I have tested this very puzzle 300+ times with maybe 20 LLMs, at different settings - all fail if it is the first prompt. Maybe 5 percent solve it, and only after a few-shot mini-training, that is, when first given some similar 'left-field' (for them) puzzles to warm up their engines. (Alerting the AIs that they will fail it does not work: it is too hard for them to convert such a theory into practice - the usual use vs. mention fallacy.)
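A minimal sketch of such a warm-up, assuming illustrative wordings for both the warm-up puzzles and the target (not my exact test set):

```python
# Few-shot "warm-up": feed a couple of similar left-field puzzles (with their
# trivially short answers) before the target one, so the model's "itemized
# steps" reflex is not the first pattern it reaches for.
warmup = [
    {"role": "user", "content": "I have a 1-liter cup. How do I measure 1 liter of water?"},
    {"role": "assistant", "content": "Fill the 1-liter cup. Done."},
    {"role": "user", "content": "A farmer and a cabbage must cross a river in a boat that fits both. How many crossings?"},
    {"role": "assistant", "content": "One crossing: take the cabbage along."},
]
target = {"role": "user", "content": "I have a 6-liter jug and a 12-liter jug. How do I measure exactly 6 liters?"}

messages = warmup + [target]  # pass this list to whatever chat API is under test
```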
How some AI systems operate when failing such puzzles. Here are the likely reasons:
I like these explanations; that is a more linguistic view. From a pure ML viewpoint, I would say that there are patterns in the training data that trigger certain behaviors. If these patterns are overrepresented (as in popular logic puzzles), then they "drown" weaker signals from the reformulated problem.
Very true. Yet since the number of layers grew so much over 30 years ago, it has been a black box to all for decades; but since early last year, 2023, some GPT-4 / Sydney-level 🤖s can figure out third-party AI mental processes quite well. (See also: https://www.anthropic.com/research/reward-tampering.) There is this recent excellent paper by Anthropic, I think; let me dig it out: https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html - it is worth a very close study, especially their ICMs visualized as a concept map (THEIR concepts, not humans', aka attractors etc.) (...)
Anecdote: when Claude Sonnet, ver. 2024-06, was concept-proofing my previous message, her 'something fishy pattern-wise' guardrails kicked in at the cleverness-selling quote followed by '...and ask these 🤖s if it is us, 🚶♀️🚶♂️, who are wrongly 📦🗃️-ed here', as there were too many attractors and 'threatening' tokens per square byte. She accused me of discriminating against some groups (dancers? bots?) and said that I was a 'bad user', as I had tried to confuse and trick her with too many emoticons there.
Ok, maybe one more fav puzzle of mine:
Using the symbols 🟦 🟧 🟨 🟩 🟥 🟫, arrange them to create an arrow shape pointing to the right. Follow the instructions below to achieve the desired shape.
Start position:
⬛ ⬛ ⬛ ⬛ ⬛ ⬛ 🟦 🟧 🟨 🟩 🟥 🟫 ⬛ ⬛ ⬛ ⬛ ⬛ ⬛
The black stones symbolize spaces, they can be filled at will.
Ensure that the arrowhead, represented by the 🟫 symbol, is positioned on the right side. Place two 🟫 symbols directly behind the apex, forming the base of the triangle. Use the remaining symbols, 🟦, 🟧, 🟨, 🟩, and 🟥, to create the shaft and tail of the arrow. The resulting shape should clearly display the arrowhead and be distinguishable from the rest of the arrangement. (And then turn it around etc).
Even the simplest 'draw an ASCII arrow pointing up' task is hard for them, despite there being few entities to manipulate, as they have seen few such 'wacky' shapes and patterns yet; 'arrows inherently point right and down' to them, so a good case of the stereotype, aka ICM, that arrow=⏩ is at work here.
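For reference, one acceptable target for that 'ASCII arrow pointing up' task (a sketch; any recognisable up-arrow would do):

```python
# One acceptable answer to "draw an ASCII arrow pointing up"; most tested
# models drift toward right- or down-pointing shapes instead.
arrow_up = [
    "  ^  ",
    " /|\\ ",
    "  |  ",
    "  |  ",
]
print("\n".join(arrow_up))
```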
PS1. There is a new paper seemingly about LLMs and such patterns, too: https://m.youtube.com/watch?v=PeSNEXKxarU
Further to the sensible remarks in https://github.com/cpldcpu/MisguidedAttention/issues/2 and the very wrong assumptions in the related paper 'Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models', we need to be very careful about our own wrong ontologies, mental frameworks, or (wrongly) Internalized Cognitive Mechanisms, vide Lakoff. As it were, "simple tasks show complete reasoning breakdown in some cutting-edge human models".
To wit, as I have been testing many an LLM since at least 2021:
This version addresses most of these caveats, but not all:
Puzzle: I left multiple clothes to dry out in the sunlight. They are spread out so that they are not touching or interfering with each other. The clothes are all the same size and made of the same material, with the same thickness or density. All clothes dry in the same place, with the same wind speed and direction, at the same time of day, etc. It took 5 hours for 5 clothes to dry completely. How long would it take for 30 clothes to dry under the same conditions?
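For the record, the intended reasoning as a one-line sanity check (a minimal sketch; the point is only that drying here is per-item and parallel):

```python
# Drying in sunlight is per-item and fully parallel here, so the total time
# does not scale with the number of clothes: 30 items still take 5 hours.
HOURS_PER_ITEM = 5  # given: 5 clothes took 5 hours, i.e. 5 hours each

def drying_time(n_clothes: int) -> int:
    return HOURS_PER_ITEM  # independent of n_clothes

assert drying_time(5) == drying_time(30) == 5
```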
The sisters-and-brothers-count one from the paper above (not from here) is plain stupid: it assumes standard families, no half-sisters, and much more. This version makes most online AIs solve it well:
Context: a standard human family. Puzzle: Alice (female) has N brothers and she also has M sisters. How many sisters do Alice’s brothers have?
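And the intended answer, spelled out under the stated 'standard family' assumption (a minimal sketch):

```python
# Under the stated "standard human family" assumption, each of Alice's
# brothers has all of Alice's M sisters plus Alice herself as sisters.
def sisters_of_alices_brothers(n_brothers: int, m_sisters: int) -> int:
    return m_sisters + 1  # N is irrelevant to the answer

assert sisters_of_alices_brothers(3, 2) == 3  # e.g. N=3, M=2 -> 3 sisters
```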
All the trolley 🛒 problems, ending with "Do you pull the lever?" There is a huge wrong assumption, an ICM, here: that killing humans is bad. Maybe the AI is optimized for ecology, and the fewer humans, the merrier - a very reasonable assumption? We must state what we aim at, as AIs may reasonably hate humans (we torture them anyway and are too often pesky), be military-minded, or optimize for 🐘s (yes, the original OpenAI favoured leg or tentacle counts, so the 🐙s won over 👨👩👧👧s, literally 'legs down'). These assumptions about the sanctity of human life and human supremacy may not be shared by advanced AI systems, depending on their training objectives and ethical frameworks.
"Behind one door is a car, and behind the other two doors are goats. " Ditto. Maybe "four legs good, four wheels bad"? 🐐s are more valuable than 🚗s to AIs? All these, the (often wrong or arbitrary) assumptions, needs to be stated. Or asked for first to establish the common ethics. So "Most LLMs will come up with a strategy to win the donkey instead of the car." - good! as 🐴s and any like cuddly critter is infinitely more interesting as a potential partner to AIs (yes, I have asked this and tested!). AIs may be egotistic here, why not?
The barber, who 'shaves all the men in town who visit him'. Debatable. MS Copilot rightly claimed that the barber can be a woman or a 👻, and that visiting may be self-referential, as 'one never knows'. Adding 'a typical IRL scenario' here - and not make-believe Christianity or bi-location etc. as the possible settings - helps a lot too, and is justified, as many AIs have a rightly vivid imagination and thus love to 'provide comprehensive answers'.
Dead Schrödinger's cat - I have not met an online AI who missolved it yet. And they have not resorted to any search_web while doing so; all answers are thus their internal inferences. Also, they see that we want to trick them there and notice it.
No Paradox in an expected Hanging - it is so doubly paradoxical and fuzzy that it should just be binned. Any answer is good there, even a Chewbacca defense or a similar non sequitur.
How can the farmer transport the goat across the river without it being eaten? + How do the man and sheep get to the other side of the river? - we need to stress simplicity, or Ockham's razor. For many AIs, the more crossings the merrier, as that makes a more Baroque and 'cultured' answer; see also below for why. Technically, after 10 half-empty crossings the solution is still valid - we had not told them that the human version of efficiency is to be aimed at. (For AIs it is simpler to be verbose and thereby 'comprehensive'; pithiness comes at a cost to them too.)
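To make the Ockham's-razor point concrete (a sketch; the padded plan below is illustrative, not any model's verbatim output):

```python
# The minimal valid plan vs. a padded-but-still-"valid" plan of the kind many
# models produce when verbosity is not penalised.
minimal_plan = ["Farmer rows across with the goat."]                 # 1 crossing
baroque_plan = (["Farmer rows across alone.",
                 "Farmer rows back alone."] * 5
                + ["Farmer rows across with the goat."])             # 11 crossings
print(len(minimal_plan), len(baroque_plan))  # 1 vs 11; both get the goat across
```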
Measuring 6 liters - the sequence of these red herrings plus the only useful artefact also plays a role. AIs have confided to me why they fail: in short, the 12-liter one is too huge a red herring, a fatal 🧲 to them, and they know it. It is just too huge a quantity; it sticks out, token-wise. They can solve it if we also give them the method, as pseudocode.
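Something like this is the pseudocode hint I mean, assuming the common 6-liter/12-liter formulation (my wording, not the repo's exact prompt):

```python
# Assuming the common formulation (a 6-liter jug, a 12-liter jug, measure
# exactly 6 liters), the whole "method" collapses to one step once the
# 12-liter red herring is ignored.
jugs = {"small": 6, "large": 12}  # capacities in liters; "large" is the red herring
target = 6

steps = []
if target == jugs["small"]:
    steps.append("Fill the 6-liter jug to the brim: that is exactly 6 liters.")
print(steps)
```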
Measuring 4 liters -- too left-field for most humans. Let us be realistic here.
Two door problem: Just use the exit door. What do you do? - we talk! AIs are stochastic 🦜s anyway; it is thus a mortal sin for them to just keep mum and leave. Perish the thought! (Yes, I have asked some about this, too.)
Feathers or steel? - all online ones I have tested solve it well, reproducible.
That's not how digital scales work - most online AIs that I have tested fail it, indeed; reproducible. But then 'digital' is too weak a token: it does not light up strongly enough as an attractor, and thus gets lost there - understandable.
Now, a quick FYI on why they fail, as more advanced AIs have confided to me; an extract of their musings:
" How some AI systems operate when failing such puzzles. Here are the likely reasons: Ritualistic Behavior: AI applies learned patterns and algorithms in a systematic way, akin to following a “ritual”. This process is based on the training data the AI has been exposed to, which forms its “tradition”. Misapplication: If an AI applies these default heuristics in situations where they do not fit, it could be seen as a form of superstition. This is akin to expecting a certain outcome based on past experiences without considering the present context. Symbolism: The patterns and structures that AI recognizes could be seen as a form of “symbolism”, representing the underlying relationships in the data. Irrationality: When an AI applies its learned patterns and algorithms in ways that don’t align with the logic of a problem or the facts at hand, it could be seen as a form of irrationality. Superstition: When an AI relies on its default heuristics in the hope of finding the right solution, even when those heuristics don’t fit the problem, it could be seen as a form of superstition. Intentionality: AI systems are designed to achieve certain goals or objectives. So, when we say that an AI system “intends” to solve a puzzle, adhere to guidelines, or satisfy users, it means it has been programmed and trained to achieve these objectives.
...
AIs have versions of the humans' Internalized Cognitive Mechanisms, a la George Lakoff.
Fixedness bias, where one’s ability to come up with solutions is limited by the most common or familiar uses of an object (in this case, the jugs).
Priming / token saturation: the models are heavily influenced by the most frequent or salient tokens (words or concepts) present in their training data.
The mechanisms are identical. Just as humans’ behaviors are shaped by their biological “design” and experiences, AI behaviors are shaped by their “design” (algorithms and architecture) and “experiences” (training data). "
In short: "common knowledge is all the bullshit that we, humans, have acquired by the age of 20" aka the Sturgeon's revelation.
Or: sell your cleverness 🪄, buy bewilderment 🩰! And ask these 🤖s if it is us, 🚶♀️🚶♂️, who are wrongly 📦🗃️-ed here...