Closed gpiancastelli closed 2 years ago
Hey, thanks for the bug report!
I grepped for answer/guess words that match this format, to feed as examples. Found this:
uncut
(repeating u
)zebus
(only one u
, at second place)This seems to match your definition. I've created these as a test case in a branch of the repo (not using the fancy literate programming tools), but I can't seem to reproduce your report of score 🟨__⬜_
, instead I do get this: ⬜__🟩_
?
Curious to investigate this further!
Oh no, I misunderstood your question, you have the reverse of mine (I swapped answer and guess). I'll pick words again, sorry!
You are right indeed! See proper commit, using:
Guess: xenon
(two n
)
Answer: train
(one n
, at the end)
Score should be ⬜⬜⬜⬜🟩
, actually is ⬜⬜🟨⬜⬜
!
Bug confirmed! Thank you! I'll see what I can do (any ideas?)
Okay, I've got the draft of a solution which passes tests again in #2.
I don't like the look of the fix (changing response to a dict[int, str]
is ugly) but this is a start. I'll revisit later.
@gpiancastelli I'd be happy to hear your feedback on this fix.
Of course, once the fix is ready on the code side, I'll have to make an update to the actual "story" that explains our change, and I'll be sure to credit this finding to you.
I wrote a possible solution yesterday but I need to see if I can change one or two things that I don't like. It's both similar (two passes, still uses a Counter
) and different (the Counter
is not used for the entire answer
) from yours. Hope to post something in the next few hours.
So here's my take:
def score_guess(guess, answer):
result = [CharacterScore.NO] * len(answer)
rem_answer = Counter() # unmatched letters in answer
rem_guess = [] # indexes of unmatched letters in guess
# check for exact matches
for i, (a, g) in enumerate(zip(answer, guess)):
if a == g:
result[i] = CharacterScore.OK
else:
rem_answer[a] += 1
rem_guess.append(i)
# check for letters in other places
for i in rem_guess:
letter = guess[i]
if letter in rem_answer:
rem_answer[letter] -= 1
result[i] = CharacterScore.WRONG_PLACE
if rem_answer[letter] == 0:
del rem_answer[letter]
return ''.join(result)
Unfortunately, I haven't been able to imagine a different algorithm that would eliminate the need for two loops. But the result is a plain list. The Counter
is needed only for the letters remaining after the check for exact matches.
I have worked off-project (i.e. in a separate script file) so of course please double check that all your tests are passing. Sorry about that, but I believe I haven't touched Python seriously in the last year and a half.
Ooh, thanks.
As mentioned above, I wanted to make another pass to remove that dict[int, str]
for response, and was idly wondering how I could use a list[str]
, maybe even defaulting to CharacterScore.NO
, and you showcased that very elegantly!
I've run your code as-is in the project, and it does absolutely pass all tests! Separately, I'm curious if you want to try the project's structure.
Interesting approach, iterating on only the unmatched letters! It means another datastructure or two, but it gives a more specific iteration, that's cool.
I was trying to remove my uses of del
in existing code, as it is an unusual part of Python I don't want to encourage, and it felt less pretty than the rest. I also may not want to go for the different datastructure, keeping it simpler, at the cost of skipping over NO
case explicitly, which feels like acceptable tradeoff for less datastructures to juggle.
Thanks to your code as inspiration, I now have the following, which feels like a nice balance of short + elegant, while keeping in the spirit of the existing (if buggy) code:
def score_guess(guess: str, answer: str) -> str:
"""Score an individual guess with Counter"""
# Counter("abbey") = Counter({'b': 2, 'a': 1, 'e': 1, 'y': 1})
answer_chars = Counter(answer)
# NO is the default score, no need to detect it explicitly
response: list[str] = [CharacterScore.NO] * len(answer)
# First pass to detect perfect scores
for char_index, (answer_char, guess_char) in enumerate(zip(guess, answer)):
if answer_char == guess_char:
response[char_index] = CharacterScore.OK
answer_chars[guess_char] -= 1
# Second pass for the yellows
for char_num, (guess_char, existing_score) in enumerate(zip(guess, response)):
if existing_score == CharacterScore.OK:
continue # It's already green: skip
if guess_char in answer_chars and answer_chars[guess_char] > 0:
response[char_num] = CharacterScore.WRONG_PLACE
# Reduce occurence counter since we "used" this occurence
answer_chars[guess_char] -= 1
return "".join(response)
Passes all tests, and is much nicer than my previous code. Will let it simmer for the rest of the weekend, and then I'll look into writing an addendum to the narrative, explaining the bug, the fix, and reflecting on the reason the bug wasn't caught during original development.
Counter
is both a blessing and a curse. It does everything we need, but in a weird way which does not exactly correspond to an ideal bag/multiset. If it wasn't there, we might have defined a data structure with a "better" interface and implementation.
I like the idea of removing del
, but just because it really feels unnecessary. You could have switched to calling pop
, but decided to change the way you check the presence of guess_char
inside answer_chars
, which IMHO is more in line with how Counter
works.
I still like the additional data structures I introduced, both because they are specific to the second iteration (as you said) and because in the case of the winning guess they are empty and you never enter that iteration. But maybe your version is more streamlined, I can't really say. I think a middle-ground version could be written, where the Counter
is used like I did, but there is no additional list and the second loop runs on the (possibly partial) result as you did. But, oh well, to each their own I guess.
Thanks for the conversation. I will gladly read your reflections on the bug once you update the document.
Just merged PR #2 fixing this bug, and published the associated addendum on the narrative, which I released as v1.1.0 tag. I hope you enjoy the few extra words that your report allowed me to write.
Thanks again for taking the time to report such a flagrant error, and for discussing solutions, that was very helpful.
Have a nice day, Jb
Hi again, thanks for notifying me!
Just a further heads-up. You devoted a paragraph to the use of get
in the following line:
if answer_chars.get(guess_char, 0) > 0:
to catch both the “key is not set” as well as “key is set to zero” cases in the same way.
However, answer_chars
is not a simple dict
; it's a Counter
. And the documentation says that "Counter objects have a dictionary interface except that they return a zero count for missing items." So your line could just be written as:
if answer_chars[guess_char] > 0:
If I may, I would suggest to use Counter
by its proper protocol, make the reader notice it, and enrich the discussion by citing the existence of get
for "raw" dict
objects.
Just a further heads-up. [...]
Fantastic point, that makes my life easier (I was mildly annoyed at having to make the code a tiny bit less pretty for non-experts, this fixes it).
Created as https://github.com/OverkillGuy/literate-wordle/pull/3, will merge within the week after I let it cool down for an editing pass. Feel free to comment on that as we iterate.
I expect this to be a patch release, updating the changelog but not a whole new section.
Merged and deployed as v1.1.1!
Thanks again for your help.
There's a bug in your
score_guess
function. If the guess contains two copies of a letter, and that letter is present only once in the answer, and the second copy in the guess matches that letter in the answer, the first copy will be marked asWRONG_PLACE
, while the second copy will be marked asNO
.I'm not an English native speaker, so I can't come up with a real example just right now. But let's say we have
'A__A_'
as our guess and'___A_'
as the answer (let's just ignore letters marked as_
, which we presume are all different fromA
). Yourscore_guess
function will return'🟨__⬜_'
instead of'⬜__🟩_'
.