eike-welk / clair

Collect prices on E-Commerce sites, and display them in graphical form.
GNU General Public License v3.0

Convert HTML description to Markdown or reStructuredText #11

Open eike-welk opened 11 years ago

eike-welk commented 11 years ago

Product descriptions on eBay often consist of huge chunks of HTML. The text recognition algorithms, however, understand only plain text.

Currently the descriptions are stored as text, with only a minimum of preserved formatting. This is good enough for the recognition algorithms, but it looks ugly to humans.

It would be nice to store the descriptions in a format that is compact, readable for humans, and useful for the algorithms, all at once. reStructuredText or Markdown would be such a format.

-

Library for converting HTML to text in Markdown format:

https://github.com/aaronsw/html2text/blob/master/html2text.py
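To illustrate the idea, here is a minimal sketch of an HTML to Markdown conversion that uses only Python's standard library. It is not the algorithm of the html2text library linked above, and it is not code from this project. The class `MiniMarkdownConverter` and the function `html_to_markdown` are hypothetical names invented for this example, and only a handful of tags (headings, bold, italic, lists, paragraphs, line breaks) are handled. A real product description would need the full library.

```python
import re
from html.parser import HTMLParser


class MiniMarkdownConverter(HTMLParser):
    """Hypothetical, minimal HTML-to-Markdown converter (illustration only)."""

    HEADINGS = ("h1", "h2", "h3")
    BLOCK_TAGS = {"p", "div", "ul", "ol", "h1", "h2", "h3"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # Output fragments, joined at the end.
        self.skip_depth = 0  # > 0 while inside <script> or <style>.

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            # <h2> becomes "## ", and so on.
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "br":
            self.parts.append("\n")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Collapse runs of whitespace, as a browser would.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html):
    """Convert a small subset of HTML to Markdown-like text."""
    converter = MiniMarkdownConverter()
    converter.feed(html)
    converter.close()
    text = "".join(converter.parts)
    # Trim each line, then reduce three or more newlines to one blank line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    sample = (
        "<h1>Nikon D90</h1>"
        "<p>Camera in <b>good</b> condition.</p>"
        "<ul><li>Body</li><li>Charger</li></ul>"
    )
    print(html_to_markdown(sample))
```

The result stays compact and readable, and the recognition algorithms can still treat it as plain text, because the Markdown markers add only a few extra characters.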

eike-welk commented 7 years ago

Updated description to reflect new architecture.

eike-welk commented 7 years ago

Completely rephrased the description to reflect the current algorithm.