Closed tels7ar closed 5 years ago
Hey, can you give me some info about how/when it crashes?
Thanks!
I've seen it happen with multiple emails, and it's 100% reproducible - if it crashes on a certain email it always crashes on that email.
Unfortunately the one email I have right now exhibiting the behavior is company confidential.
I'm not doing anything with language or encoding and my terminal is en_US.UTF-8. I set my terminal to screen-256color because I'm using screen. This happens in both iterm2 and terminal on my mac.
Can you give me any debugging tips or is there anything I can turn on to get more info? I'm unfortunately not much of a python expert.
Unfortunately, I've had situations like this before where it's a particular character in certain emails that triggers the crash. If there's any way you can find one of the other emails that trigger it, or strip all the confidential information out and email me a tarred version (it has to be tarred up, or the process of forwarding the email usually 'fixes' the problem on my end so I can't see it), that would be great, as that's the only way I'll be able to troubleshoot.
Urlscan is still a bit fragile when handling foreign characters/encodings.
Thanks, Scott
I can confirm. I also got this error today. It often happens when an email is sent in HTML format and my pager converts it (I am using lynx or elinks to view it; the error occurs with both).
I don't think there are any foreign characters in the email.
Again, if someone can tar up the offending email and send it to me, that's the only way I can troubleshoot this. You can copy the actual email file (if you're using maildir) and edit out any private information first. Just make sure before you tar it up that running urlscan on that file directly from the command line (urlscan <path/to/email>) still causes the error.
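For anyone unsure how to tar a message without altering it, here is a minimal sketch using Python's standard `tarfile` module. The helper names `pack_email` and `unpack_bytes` are made up for this example and are not part of urlscan; the point is only that the archive stores the file byte for byte, so an odd character survives the upload.

```python
import tarfile
from pathlib import Path


def pack_email(email_path, archive_path):
    """Store one message file in a gzip-compressed tar archive, byte for byte.

    Hypothetical helper for illustration; not part of urlscan.
    """
    email_path = Path(email_path)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps only the file name, so no local directory layout leaks.
        tar.add(str(email_path), arcname=email_path.name)
    return archive_path


def unpack_bytes(archive_path, member_name):
    """Read a member back out of the archive without writing it to disk."""
    with tarfile.open(str(archive_path), "r:gz") as tar:
        return tar.extractfile(member_name).read()
```

Comparing `unpack_bytes(...)` with the original file's bytes is a quick way to confirm nothing was changed before sending the archive.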
Anyone able to send me a tarred up email that reproduces this error? I use urlscan daily with mutt and haven't seen this at all. Thanks!
Closing. If you see this again, please try to tar a sanitized email and send it to me for troubleshooting. I haven't seen the error and I can't reproduce it. Thanks!
@firecat53 Just came across this error. Attaching a somewhat sanitized email and the error itself.
Traceback (most recent call last):
  File "/usr/bin/urlscan", line 134, in <module>
    compact_mode=args.compact)
  File "/usr/lib/python3/dist-packages/urlscan/urlchoose.py", line 133, in __init__
    compact_mode)
  File "/usr/lib/python3/dist-packages/urlscan/urlchoose.py", line 58, in process_urls
    for group, usedfirst, usedlast in extractedurls:
  File "/usr/lib/python3/dist-packages/urlscan/urlscan.py", line 424, in msgurls
    for chunk in extracthtmlurls(msg):
  File "/usr/lib/python3/dist-packages/urlscan/urlscan.py", line 392, in extracthtmlurls
    c.feed(s)
  File "/usr/lib/python3.7/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python3.7/html/parser.py", line 173, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python3.7/html/parser.py", line 421, in parse_endtag
    self.handle_endtag(elem)
  File "/usr/lib/python3/dist-packages/urlscan/urlscan.py", line 186, in handle_endtag
    del self.list_stack[-1]
IndexError: list assignment index out of range
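The last frame suggests that `del self.list_stack[-1]` ran while the stack was empty, which is what you would expect if the HTML contains a closing list tag with no matching opener. Below is a minimal sketch of that failure mode and one way to guard against it. `ListDepthParser` and its `unmatched` counter are hypothetical names for this example; this is not urlscan's actual parser and not necessarily how later releases handle it.

```python
from html.parser import HTMLParser


class ListDepthParser(HTMLParser):
    """Hypothetical parser that tracks nesting of <ul>/<ol> lists."""

    def __init__(self):
        super().__init__()
        self.list_stack = []  # one entry per currently open list
        self.unmatched = 0    # closing list tags seen with no opener

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.list_stack.append(tag)

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            if self.list_stack:
                del self.list_stack[-1]
            else:
                # Stray </ul> or </ol>: count it instead of raising IndexError.
                self.unmatched += 1


parser = ListDepthParser()
# The first </ul> has no opener; the second closes the <ul> that follows it.
parser.feed("<p>text</p></ul><ul><li>item</li></ul>")
parser.close()
```

Without the `if self.list_stack:` check, the stray `</ul>` in this input would raise the same `IndexError` shown in the traceback.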
Thanks! I'll reopen this. Would it be possible to edit the original email to sanitize it, make sure the sanitized copy still reproduces the error, and then put that into a tar archive and send it or post it? I can't reproduce the error with the file you attached, but I'm not surprised because just the act of uploading it typically cleans the offending character. Putting it into a tar archive should preserve the error.
Thanks!
Thanks for reopening :-)
I've had a deeper look into this. The problem is not with the e-mail (I was sure of that). Ubuntu (and Debian) ship an outdated package of urlscan (0.8.2). That's the issue and that's why you can't reproduce it.
Updating to urlscan 0.9.3 (with pip) and manually installing the corresponding 0.9.3 bin from git almost works. I do get this error message:
ImportError: cannot import name DEVNULL
Which python package should be installed to fix this? P.S.: My workaround currently is modifying urlscan.py to remove that included module.
Edit: this error only applies when python2 is being used. When forcing bin/urlscan to use python3 (instead of using env python), this is not an issue. Might be useful to force python3 usage?
I'm going to deprecate python 2 here shortly. I missed that DEVNULL isn't available with 2.7. Glad you figured it out though!
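For reference, `subprocess.DEVNULL` was added in Python 3.3, which is why the import fails on 2.7. A common compatibility pattern is sketched below; it is an illustration of the general technique, not the change made in urlscan, and the child command is just a placeholder.

```python
import os
import subprocess
import sys

try:
    # Available on Python 3.3 and later.
    from subprocess import DEVNULL
except ImportError:
    # Python 2.7 fallback: open the null device ourselves.
    DEVNULL = open(os.devnull, "wb")

# Placeholder child process whose output is discarded on either interpreter.
status = subprocess.call(
    [sys.executable, "-c", "print('hidden')"],
    stdout=DEVNULL,
    stderr=DEVNULL,
)
```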
Not fixed yet.
Please provide information about urlscan version, python version, actual error and a sanitized and tarred copy of the email that caused the error.
@firecat53: Can you recommend how best to use python 3? I installed it via Homebrew, and the shebang specifically calls python 2.7, which crashes urlscan for me every time.
@pheuberger For consistency with my other projects, I'm going to change the script shebang to specifically call python 3. I'll put out a new release soon.
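As a rough illustration of what pinning the interpreter can look like, the sketch below pairs a `python3` shebang with an explicit version check. `require_python3` is a hypothetical helper written for this example; the actual release may only change the shebang line.

```python
#!/usr/bin/env python3
import sys


def require_python3(version_info=sys.version_info):
    """Exit with a readable message when started under Python 2.

    Hypothetical guard for illustration; not taken from urlscan.
    """
    if version_info[0] < 3:
        raise SystemExit(
            "urlscan requires Python 3; found %d.%d" % tuple(version_info[:2])
        )
    return True


require_python3()
```

The check gives a clear message if the script is launched with an explicit `python2 bin/urlscan`, which bypasses the shebang.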
Also, please open a new issue when you have questions instead of adding to an existing one :smile:
I get this fairly frequently when running urlscan from mutt: