Product descriptions on eBay often consist of large blocks of HTML. The text recognition algorithms, however, understand only plain text.
Currently the descriptions are stored as plain text with only minimal formatting preserved. This is good enough for the recognition algorithms, but it looks ugly to human readers.
It would be better to store the descriptions in a format that is compact, pleasant for humans to read, and still usable by the algorithms. reStructuredText and Markdown are such formats.
-
Library for converting HTML to text in Markdown format:
https://github.com/aaronsw/html2text/blob/master/html2text.py
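
To give a rough idea of what such a conversion involves, here is a minimal sketch that uses only Python's standard library. It is not how html2text is implemented; the class `TinyMarkdownConverter` and the function `html_to_markdown` are hypothetical names made up for this illustration, and only a handful of tags are handled.

```python
import re
from html.parser import HTMLParser

# Heading tags h1..h6, mapped to "#" prefixes below.
HEADINGS = {f"h{n}" for n in range(1, 7)}


class TinyMarkdownConverter(HTMLParser):
    """Hypothetical, deliberately small HTML-to-Markdown converter.

    Handles only <b>/<strong>, <i>/<em>, <br>, <p>, <li> and <h1>..<h6>.
    Every other tag is dropped and only its text content is kept.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag == "br":
            self.parts.append("\n")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag in HEADINGS:
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")

    def handle_endtag(self, tag):
        if tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag in ("p", "ul", "ol") or tag in HEADINGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        # Collapse runs of whitespace into one space, as a browser would.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html):
    """Convert a small subset of HTML to Markdown-style text."""
    parser = TinyMarkdownConverter()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)  # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)        # at most one blank line in a row
    return text.strip()


if __name__ == "__main__":
    sample = (
        "<h2>Vintage Camera</h2>"
        "<p>Works <b>perfectly</b>.</p>"
        "<ul><li>Lens included</li><li>No box</li></ul>"
    )
    print(html_to_markdown(sample))
```

The result keeps headings, emphasis and list items in a form that is still plain text for the recognition algorithms but readable for humans. A real description would also need links, images, nested lists, and the removal of script and style content, which is why an existing library such as the one linked above is the more sensible choice.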