Alir3z4 / html2text

Convert HTML to Markdown-formatted text.
alir3z4.github.io/html2text/
GNU General Public License v3.0
1.76k stars 270 forks

Non-int margin value in html style attributes #330

Open keshavbohra opened 4 years ago

keshavbohra commented 4 years ago

Hi,

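I got a ValueError (invalid literal for int() with base 10: 'au') while html2text was reading the style string of an HTML tag. The attribute was margin-left and its value was auto rather than a pixel length. As the last frame of the traceback shows, google_nest_count drops the final two characters of the value and passes the rest to int(), so auto becomes 'au'.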

Traceback (most recent call last):
  File "C:\Users\keshav.bohra\PycharmProjects\git\content-comparison-advisor\minton\dashboard\views.py", line 98, in post
    pages_count, html_files = comparePDFHTMLV2(INPUT_FILES)
  File "C:\Users\keshav.bohra\PycharmProjects\git\content-comparison-advisor\minton\dashboard\file_compare\pdfhtmlcompare.py", line 306, in comparePDFHTMLV2    
    html_parser.parseV2(htmlPath)
  File "C:\Users\keshav.bohra\PycharmProjects\git\content-comparison-advisor\minton\dashboard\file_compare\parser.py", line 78, in parseV2
    htmlModel = structure_model(htmlPath)
  File "C:\Users\keshav.bohra\PycharmProjects\git\content-comparison-advisor\minton\dashboard\file_compare\html_model_creation.py", line 342, in structure_model
    text_tagged = tagContent(content, Flag=True)
  File "C:\Users\keshav.bohra\PycharmProjects\git\content-comparison-advisor\minton\dashboard\file_compare\html_model_creation.py", line 100, in tagContent     
    return H2T.handle(content)
  File "c:\Users\keshav.bohra\PycharmProjects\git\content-comparison-advisor\virtual-env\lib\site-packages\html2text\__init__.py", line 142, in handle
    self.feed(data)
  File "c:\Users\keshav.bohra\PycharmProjects\git\content-comparison-advisor\virtual-env\lib\site-packages\html2text\__init__.py", line 139, in feed
    super().feed(data)
  File "C:\Users\keshav.bohra\AppData\Local\Programs\Python\Python37\lib\html\parser.py", line 111, in feed
    self.goahead(0)
  File "C:\Users\keshav.bohra\AppData\Local\Programs\Python\Python37\lib\html\parser.py", line 171, in goahead
    k = self.parse_starttag(i)
  File "C:\Users\keshav.bohra\AppData\Local\Programs\Python\Python37\lib\html\parser.py", line 345, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "c:\Users\keshav.bohra\PycharmProjects\git\content-comparison-advisor\virtual-env\lib\site-packages\html2text\__init__.py", line 191, in handle_starttag 
    self.handle_tag(tag, dict(attrs), start=True)
  File "c:\Users\keshav.bohra\PycharmProjects\git\content-comparison-advisor\virtual-env\lib\site-packages\html2text\__init__.py", line 598, in handle_tag      
    nest_count = self.google_nest_count(tag_style)
  File "c:\Users\keshav.bohra\PycharmProjects\git\content-comparison-advisor\virtual-env\lib\site-packages\html2text\__init__.py", line 877, in google_nest_count
    nest_count = int(style["margin-left"][:-2]) // self.google_list_indent
ValueError: invalid literal for int() with base 10: 'au'
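
For illustration, here is a minimal sketch of how the margin-left value could be parsed more defensively. It is not the library's actual code or an agreed fix: safe_nest_count is a hypothetical helper, the list_indent default of 36 is an assumption standing in for self.google_list_indent, and returning 0 for non-pixel values such as auto is just one possible choice.

```python
import re

# Hypothetical helper, not part of html2text.
# Matches a non-negative pixel length such as "72px" or "36.5px" and
# captures only the integer part; anything else (auto, 2em, 10%) is rejected.
_PX_LENGTH = re.compile(r"^\s*(\d+)(?:\.\d+)?\s*px\s*$", re.IGNORECASE)


def safe_nest_count(style, list_indent=36):
    """Return the list nesting depth implied by margin-left.

    `style` is a dict of CSS properties, as in the traceback above.
    `list_indent` is an assumed stand-in for self.google_list_indent.
    Returns 0 when margin-left is missing or is not a pixel length.
    """
    match = _PX_LENGTH.match(style.get("margin-left", ""))
    if match is None:
        return 0
    return int(match.group(1)) // list_indent


print(safe_nest_count({"margin-left": "72px"}))
print(safe_nest_count({"margin-left": "auto"}))
```

With this approach a value of auto no longer raises; it is simply treated as no extra nesting. Whether 0 is the right fallback for the converter is a design decision for the maintainers.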