kseistrup closed 8 years ago
A quick and dirty solution could be to ignore tweets that don't pass the .isprintable() test. TABs and the like are not deemed printable, but we could easily convert all whitespace to proper space chars (and collapse multiple whitespaces at the same time), e.g.:
def collapse(text):
    """Collapse multiple whitespaces and test for printability"""
    collapsed = ' '.join(text.split())
    if not collapsed.isprintable():
        return None
    return collapsed
The caller should, of course, check whether the return value is None.
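For illustration, here is a rough sketch of how a caller might use collapse() and skip anything that comes back as None. The printable_tweets() helper and the sample strings are made up for this example and are not part of twtxt.

```python
def collapse(text):
    """Collapse multiple whitespaces and test for printability"""
    collapsed = ' '.join(text.split())
    if not collapsed.isprintable():
        return None
    return collapsed


def printable_tweets(lines):
    """Yield only the tweets that survive collapse() (hypothetical helper)."""
    for line in lines:
        cleaned = collapse(line)
        if cleaned is None:
            # Still contains something non-printable, e.g. an ESC character
            continue
        yield cleaned


# Made-up sample input: a TAB, an ANSI colour sequence, and plain Unicode
samples = [
    "hello\tworld  again",
    "evil \x1b[31mred\x1b[0m text",
    "caf\u00e9  ol\u00e9",
]

for tweet in printable_tweets(samples):
    print(tweet)
```

The TAB and the double spaces are normalised, the Unicode text passes through untouched, and the line carrying the escape sequence is dropped entirely.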
My hacky solution was to call click.unstyle() when parsing new tweets, which then removes all escape sequences. Not sure if this is sufficient, though.
@buckket thanks for reminding me of click.unstyle().
I have added an @evil stream at /evil.txt on the server where my default stream can be found. The first five lines read:
2016-02-11T13:33:59+0000 This is a TWTXT file, please see <https://github.com/buckket/twtxt> for details.
2016-02-11T13:36:48+0000 WARNING: This stream may contain overly long lines, evil escape sequences, binary fluff, and other non-standard content.
2016-02-11T13:39:49+0000 The file is NOT intended to do harm, nor intended for public consumption.
2016-02-11T13:40:51+0000 Rather, it could be used by developers to test their TWTXT clients against a potentially malformed file.
2016-02-11T13:43:02+0000 *** PLEASE PROCEED AT YOUR OWN PERIL *** YOU HAVE BEEN WARNED ***
I do not want people to stumble over this file accidentally and think I'm doing this with malicious intent, so I'm not posting the direct link. However, you should be able to find it with minimal effort, especially if you are already following me.
@kseistrup Thanks, will test against it later. :)
@kseistrup I could use that file too. Where is it?
@erlehmann same server as @kas' twtxt stream, with filename /evil.txt instead of /twtxt.txt.
Guess this still needs some work, as click.unstyle() does not solve all of it.
Will try the isprintable-approach.
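A small sketch of why stripping styling alone may not be enough, and how the isprintable check could back it up. Note the assumptions: CSI_RE and strip_csi() are a hypothetical stdlib stand-in for an ANSI-style stripper (not click's actual implementation), and clean() is just an illustrative name.

```python
import re

# Hypothetical stand-in for a style stripper: matches CSI sequences such as
# "\x1b[31m". This is an illustration only, NOT click's implementation.
CSI_RE = re.compile(r'\x1b\[[0-9;?]*[A-Za-z]')


def strip_csi(text):
    """Remove CSI escape sequences (colours, cursor movement, ...)."""
    return CSI_RE.sub('', text)


def clean(text):
    """Strip CSI sequences, collapse whitespace, reject what is still unprintable."""
    collapsed = ' '.join(strip_csi(text).split())
    return collapsed if collapsed.isprintable() else None


styled = 'plain \x1b[1mbold\x1b[0m'
# An OSC sequence (sets the window title in many terminals); it is not CSI,
# so the regex above leaves it alone.
title_hack = 'hi \x1b]0;new title\x07 there'

print(clean(styled))
print(strip_csi(title_hack) == title_hack)
print(clean(title_hack))
```

The styled tweet comes out clean, while the OSC sequence slips past the stripper and is only caught by the printability test, so the tweet is rejected.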
Just a warning for those of you who are writing twtxt terminal clients: Please visit https://mosh.mit.edu/ and search for “Careful terminal emulation”. While we still ought to allow Unicode, we should probably think about sanitizing each tweet before displaying it. ~@kas
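One possible way to sanitize without discarding the whole tweet would be to replace every non-printable character with a placeholder while leaving ordinary Unicode alone. This is only a sketch under that assumption; sanitize() and the choice of U+FFFD as placeholder are hypothetical, not something twtxt does.

```python
def sanitize(text, placeholder='\ufffd'):
    """Collapse whitespace and replace non-printable characters (hypothetical helper)."""
    collapsed = ' '.join(text.split())
    # str.isprintable() is False for control and format characters,
    # so ESC, BEL and bidi overrides are all swapped for the placeholder.
    return ''.join(ch if ch.isprintable() else placeholder for ch in collapsed)


print(sanitize('snow \u2603\t\x1b[2Jman'))   # ESC is neutralised, the snowman stays
print(sanitize('a\u202eb'))                  # right-to-left override is replaced too
```

With the ESC character gone, the leftover "[2J" is just harmless text, so the terminal never sees a complete escape sequence.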