buckket / twtxt

Decentralised, minimalist microblogging service for hackers.
http://twtxt.readthedocs.org/en/stable/
MIT License

Should ‘\S+’ be ‘\w+’? #77

Closed: kseistrup closed this issue 8 years ago

kseistrup commented 8 years ago

In https://github.com/buckket/twtxt/blob/master/twtxt/mentions.py#L16 we have

short_mention_re = re.compile(r'@(?P<name>\S+)')

Shouldn't the ‘\S+’ be a ‘\w+’? It seems that if people mention several users separated by a comma, the ‘\S+’ form will swallow the comma:

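import re

test = '@alpha, @beta, . . . @omega'

# \S+ matches any run of non-whitespace, so it takes the trailing comma too
rx_s = re.compile(r'@(\S+)')
print(rx_s.sub('CENSORED', test))

# \w+ stops at the first character that is not a letter, digit or underscore
rx_w = re.compile(r'@(\w+)')
print(rx_w.sub('CENSORED', test))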

When run, the above code will show

CENSORED CENSORED . . . CENSORED
CENSORED, CENSORED, . . . CENSORED

Or will ‘\w+’ just cause another problem?
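One way to probe that question: `\w` matches only letters, digits and the underscore, so a name containing a hyphen or a dot would be cut short. The sketch below compares the two patterns on such an input. The nick `@foo-bar` is a hypothetical example; whether twtxt nicks may actually contain such characters is not established in this thread.

```python
import re

rx_s = re.compile(r'@(\S+)')
rx_w = re.compile(r'@(\w+)')

# Hypothetical input: the hyphenated nick is an assumption for illustration.
test = 'ping @alpha, @foo-bar!'

# findall() returns the contents of the single capture group for each match.
names_s = rx_s.findall(test)
names_w = rx_w.findall(test)

print(names_s)  # ['alpha,', 'foo-bar!']
print(names_w)  # ['alpha', 'foo']
```

So `\w+` drops the trailing punctuation, but it would also truncate any name that uses characters outside `[A-Za-z0-9_]` (plus Unicode word characters in Python 3).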

buckket commented 8 years ago

Yes, that seems reasonable, though I'm not sure whether it breaks other things. I guess I'll have to write some additional tests, just to be sure.
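Such tests might look roughly like the following. This is only a sketch: the pattern is the one quoted from mentions.py with `\S+` swapped for `\w+`, while the helper `mentioned_names` and the test cases are hypothetical and not part of twtxt.

```python
import re

# The quoted pattern with \S+ replaced by \w+ (the change proposed above).
short_mention_re = re.compile(r'@(?P<name>\w+)')


def mentioned_names(text):
    """Hypothetical helper: return every name the pattern captures."""
    return [m.group('name') for m in short_mention_re.finditer(text)]


# (input, expected names); illustrative cases, not twtxt's real test suite.
cases = [
    ('@alpha, @beta, . . . @omega', ['alpha', 'beta', 'omega']),
    ('thanks @alpha.', ['alpha']),
    ('(cc @alpha)', ['alpha']),
    ('no mentions here', []),
]

for text, expected in cases:
    assert mentioned_names(text) == expected, text

print('all cases pass')
```

A table of cases like this makes it cheap to add further inputs (for example, names with characters that `\w` does not cover) once it is clear which nicks should be considered valid.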