firecat53 / urlscan

Mutt and terminal url selector (similar to urlview)
GNU General Public License v2.0

html email parser #64

Closed: oimaasi closed this issue 6 years ago

oimaasi commented 6 years ago

I am aware of the current limitation with HTML mails, as stated in "Known bugs and limitations". My question is: are there any existing, better HTML email parsers that can be integrated into the mutt + urlscan workflow?

I like the idea of urlscan very much, but it doesn't work at all for the mails I want to work with. It extracts every possible URL except the ones in the main text ...

However, I can read the mail nicely in mutt, and if I copy the text I see in mutt and pipe it to urlscan, it does actually work. So apparently mutt can parse the mail correctly, and I only need to pipe the output of mutt's parser to urlscan instead of the original HTML. But how? Do you have any suggestions?

firecat53 commented 6 years ago

I have looked for other parsers, but I was never able to find one that fit this use case particularly well. I haven't looked recently, however, so if you have any suggestions, I'm definitely open to looking at options.

If you have a particular email example of what you describe as not working, the best way for me to troubleshoot is if you can tar the original email into an archive and send me the archive. Just forwarding an example email typically loses information and I'm unable to troubleshoot it effectively. Hopefully you have an available example you can share! Feel free to send it to me privately...I won't post it here.

Thanks, Scott

oimaasi commented 6 years ago

I can't speak for other cases, since this is an inherently open-ended problem. But at least the w3m HTML parser works very well in my case.

I have the following lines in ~/.muttrc:

# urlscan
macro index,pager \cb "<pipe-message> w3m -I utf-8 -T text/html | urlscan<Enter>"
macro attach,compose \cb "<pipe-entry> w3m -I utf-8 -T text/html | urlscan<Enter>"
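
For readers without w3m, the same "render the HTML first, then scan the text" idea can be sketched in Python using only the standard library. This is a rough illustration, not what urlscan or w3m do internally: the names `LinkTextExtractor` and `html_to_text` are made up for this example, and it assumes the input is already a decoded HTML string (no MIME or header handling).

```python
from html.parser import HTMLParser


class LinkTextExtractor(HTMLParser):
    """Collect visible text and append each link target after its anchor text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []        # pieces of output text, in document order
        self._href = None      # target of the <a> element currently open
        self._skip_depth = 0   # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._href:
            # Emit the URL as plain text so a URL scanner can pick it up.
            self.parts.append(" <%s> " % self._href)
            self._href = None

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of `html` with link targets inlined."""
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the markup.
    return " ".join("".join(parser.parts).split())


# Hypothetical sample input, standing in for the HTML part of a mail.
sample = '<p>See <a href="https://example.com/a">the report</a>.</p>'
print(html_to_text(sample))
```

If the function were fed from standard input instead of `sample`, a script like this could in principle take the place of w3m in the macros above, though a real mail would first need its HTML part extracted and decoded.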