[BUG] is_anagram does not ignore punctuation or special characters and is inefficient

The is_anagram snippet does not ignore punctuation or special characters. Furthermore, the running time could be improved using a collections.Counter.

Expected Snippet Behavior

is_anagram("#anagram", "Nag a ram!") should be True and run in linear time

Current Snippet Behavior

is_anagram("#anagram", "Nag a ram!") is False.

Even when it returns the correct result, it runs in linearithmic time, O(n log n), because of the sorting, instead of linear time, O(n).
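For illustration, a sort-based check along the following lines reproduces both symptoms. This is a hypothetical sketch, not a copy of the repository snippet: the name is_anagram_sorted and the assumption that only spaces are stripped are mine.

```python
def is_anagram_sorted(s1, s2):
    # Assumption: only spaces are removed, so '#' and '!' still take part
    # in the comparison.
    a = s1.replace(" ", "").lower()
    b = s2.replace(" ", "").lower()
    # sorted() costs O(n log n) for a string of length n.
    return sorted(a) == sorted(b)


print(is_anagram_sorted("#anagram", "Nag a ram!"))  # False: '#' != '!'
print(is_anagram_sorted("anagram", "Nag a ram"))    # True
```

The first call fails only because the two strings contain different punctuation; the letters themselves match.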

Possible Solution

One may improve the code as follows:

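from collections import Counter

def is_anagram(s1, s2):
  # Count only alphanumeric characters, case-insensitively, so that
  # punctuation, whitespace and letter case are ignored.
  return Counter(
    c.lower() for c in s1 if c.isalnum()
  ) == Counter(
    c.lower() for c in s2 if c.isalnum()
  )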

The code counts the lowercased alphanumeric characters of each string and checks whether the two counters match. No sorting is needed, and no intermediate strings are created.
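A quick way to check this behavior is to split out the counting step and inspect it directly. The helper name char_counts is hypothetical and only used here for illustration; the logic is the same as in the proposed fix.

```python
from collections import Counter


def char_counts(s):
    # Hypothetical helper: one pass over s, keeping only alphanumeric
    # characters and folding them to lowercase.
    return Counter(c.lower() for c in s if c.isalnum())


def is_anagram(s1, s2):
    # Two Counters are equal when every character has the same count.
    return char_counts(s1) == char_counts(s2)


print(char_counts("Nag a ram!")["a"])        # 3
print(is_anagram("#anagram", "Nag a ram!"))  # True
print(is_anagram("anagram", "anagrams"))     # False
```

Each string is traversed once and the counter comparison is proportional to the number of distinct characters, so the whole check is linear in the input length.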

I intend to open a PR to fix this, but I am unsure what counts as a special character. In the suggested fix, str.isalnum treats characters such as U+2460 '①' as alphanumeric, and therefore not as special characters. Is that the intended interpretation of "special character"?
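To make the question concrete, the sketch below contrasts str.isalnum with a stricter, ASCII-only filter. The ASCII-only variant is just one possible interpretation, and the names ASCII_ALNUM and is_anagram_ascii are hypothetical.

```python
import string
from collections import Counter

# Assumption: "not special" means ASCII letters and digits only.
ASCII_ALNUM = frozenset(string.ascii_letters + string.digits)


def is_anagram_ascii(s1, s2):
    def counts(s):
        # Drop everything outside ASCII letters and digits, then lowercase.
        return Counter(c.lower() for c in s if c in ASCII_ALNUM)
    return counts(s1) == counts(s2)


print("\u2460".isalnum())           # True: '①' counts as alphanumeric
print("\u2460" in ASCII_ALNUM)      # False: treated as special here
print(is_anagram_ascii("a\u2460", "a"))  # True: '①' is ignored
```

With the str.isalnum filter, "a①" and "a" would not be anagrams, because '①' is kept and counted. The ASCII-only filter also drops accented letters such as 'é', which may or may not be desirable.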
