Open NikitaMartynov opened 8 years ago
Beautiful Soup seems to fit only cases where one has valid HTML pages. Its success depends heavily on HTML tags, which are not present in eml files, so locating URLs in eml files with the soup framework does not seem to be a good approach. If the current eml_parser is not good enough, we might need to try https://github.com/imranghory/urlextractor as an alternative.
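For illustration, here is a minimal sketch of a tag-independent approach: read the text parts of an eml message with Python's standard `email` package and scan them with a regular expression. This is not how eml_parser or urlextractor work; the pattern `URL_RE`, the function `extract_urls` and the sample message are all assumptions made up for this example, and the pattern is deliberately simplistic.

```python
import re
from email import message_from_string
from email.policy import default
from typing import List

# Hypothetical, deliberately simple pattern: a scheme followed by anything
# up to whitespace or a common delimiter. It is not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s<>\"'\[\]()]+")


def extract_urls(raw_eml: str) -> List[str]:
    """Return the URLs found in the text parts of an eml message, in order."""
    msg = message_from_string(raw_eml, policy=default)
    urls: List[str] = []
    for part in msg.walk():
        # Skip multipart containers and non-text attachments.
        if part.get_content_maintype() != "text":
            continue
        text = part.get_content()
        for url in URL_RE.findall(text):
            if url not in urls:  # keep first occurrence only
                urls.append(url)
    return urls


# Made-up sample message, used only to exercise the sketch.
SAMPLE_EML = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: test\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "See http://example.com/a and [https://example.org/b] for details.\n"
)

print(extract_urls(SAMPLE_EML))
```

Because the pattern stops at brackets, a URL that legitimately contains `[` or `]` would be cut short, so this is only a starting point for comparison.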
The bug is in the eml_parser module and has not been fixed here so far.
If a URL was composed in a very complex way, eml_parser gets lost in the brackets; so far this has been observed with [ ]. The observed result is that it produces a redundant URL.
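Until the bug is fixed upstream, one possible workaround is to post-process the extracted list. The sketch below assumes that the redundant entry differs from the real URL only by stray leading or trailing brackets; that assumption has not been verified against eml_parser's actual output, and `dedupe_urls` is a hypothetical helper, not part of any library.

```python
from typing import Iterable, List


def dedupe_urls(candidates: Iterable[str]) -> List[str]:
    """Strip stray surrounding brackets and drop duplicates, keeping order."""
    seen = set()
    result: List[str] = []
    for url in candidates:
        # Assumption: stray '[' or ']' only appear at the ends of the string.
        cleaned = url.strip("[]")
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Made-up input resembling a list with a bracket-induced duplicate.
print(dedupe_urls([
    "http://example.com/a",
    "http://example.com/a]",
    "http://example.com/b",
]))
```

This leaves brackets inside a URL untouched, so it would not help if the parser splits a URL at an inner bracket.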