AllenDowney / ThinkPython2

LaTeX source and supporting code for Think Python, 2nd edition, by Allen Downey.
Other
2.49k stars 1.65k forks source link

Bug in has_duplicates.py #102

Open bigabdoul opened 1 year ago

bigabdoul commented 1 year ago

The two implementations of has_duplicates (https://github.com/AllenDowney/ThinkPython2/blob/master/code/has_duplicates.py) contain a bug when called with specific arguments. Since dictionary keys must be immutable (hashable types), if we pass a nested list as an argument, both functions will crash:

t = [1, 2, 3, [4, 5]]
print(has_duplicates(t))

The same is true for a set. To avoid this, you should convert the key to an immutable type (i.e., a string).

def has_duplicates(t):
    """Checks whether any element appears more than once in a sequence.
    Simple version using a for loop.
    t: sequence
    """
    d = {}
    for x in t:
        if str(x) in d:
            return True
        d[str(x)] = True
    return False

I don't even know if there is a fix for the second version has_duplicates2.

AllenDowney commented 1 year ago

Good point. As you said, this function will only work if the items in the sequence are hashable. Converting them to strings is not a bad idea, but it introduces the unexpected behavior that the integer 1 and the string '1' would be considered duplicates. So it might be best to change the specification of the function to indicate that it is only expected to work if the items are hashable.