RonenNess / html_validator

Offline HTML validator for Python, based on the standard v.Nu
MIT License

"invalid literal for int() with base 10" on Python 2.7.12 #5

Open matkoniecz opened 6 years ago

matkoniecz commented 6 years ago

I tried

from html_validator import validate

errors = validate("index.html")
for err in errors:
    print("Type: %s, File: %s, Line: %d, Description: %s" % (err.type, err.file, err.line, err.description)) 

on

<html>
<head>
</head>
<body></body>
</html>

and it failed with

Traceback (most recent call last):
  File "validator.python2.py", line 5, in <module>
    errors = validate("index.html")
  File "/usr/local/lib/python2.7/dist-packages/html_validator/html_validator.py", line 160, in validate
    ret.append(ValidationError(i))
  File "/usr/local/lib/python2.7/dist-packages/html_validator/html_validator.py", line 62, in __init__
    self.line = int(err_wo_filename.split('.')[0].strip())
ValueError: invalid literal for int() with base 10: ''

I also tried it on a larger HTML file, and it failed with the same error.
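The traceback shows int() being called on an empty string, so the line being parsed was not in the shape the parser expected. A minimal sketch of a more defensive parse follows. parse_line_number is a hypothetical helper, not part of html_validator, and the sample message is made up for illustration:

```python
def parse_line_number(text):
    """Return the leading line number in text, or None if there is none.

    Hypothetical helper. It mirrors the expression from the traceback,
    int(err_wo_filename.split('.')[0].strip()), but does not raise.
    """
    candidate = text.split('.')[0].strip()
    try:
        return int(candidate)
    except ValueError:
        # Reached for '' and for any line that is not a validator message,
        # such as a usage banner printed by java itself.
        return None


# Illustrative inputs only; the real validator output format is not shown here.
print(parse_line_number(" 12.5-12.9: error: something"))
print(parse_line_number(""))
```

Returning None lets the caller skip or report unparseable lines instead of crashing, although it would hide the underlying problem if the command itself failed to run.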

matkoniecz commented 6 years ago

uname -a
Linux grisznak 4.15.0-24-generic #26~16.04.1-Ubuntu SMP Fri Jun 15 14:35:08 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

RonenNess commented 6 years ago

It's possible it's not working on Linux. Please try running with verbose=True (a parameter of validate), which prints the command itself. Then execute that command directly and see what output you get.

Thanks!

AndrewNelis commented 5 years ago

Hi,

I saw this error also. Here is the output with verbose=True:

validate /tmp/tmpKY7lKb
Execute command: java -Xss512k -jar /home/user/.virtualenvs/weewx/lib/python2.7/site-packages/html_validator/vnu.jar /tmp/tmpKY7lKb
Command output: ('', 'Usage: java [-options] class [args...]\n           (to execute a class)\n   or  java [-options] -jar jarfile [args...]\n           (to execute a jar file)\nwhere options include:\n    -d32\t  use a 32-bit data model if available\n    -d64\t  use a 64...
Parse output line: Usage: java [-options] class [args...]
Traceback (most recent call last):
  File "./test.py", line 47, in <module>
    test_template_generation()
  File "./test.py", line 40, in test_template_generation
    errors = validate(html_output.name, verbose=True)
  File "/home/user/.virtualenvs/weewx/lib/python2.7/site-packages/html_validator/html_validator.py", line 160, in validate
    ret.append(ValidationError(i))
  File "/home/user/.virtualenvs/weewx/lib/python2.7/site-packages/html_validator/html_validator.py", line 62, in __init__
    self.line = int(err_wo_filename.split('.')[0].strip())
ValueError: invalid literal for int() with base 10: ''

I changed the call to subprocess.Popen() in html_validator.py to use shell=False and it worked.

I'm not sure why this works - the command line seemed correct.
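One plausible explanation, assuming the command is passed to subprocess.Popen() as a list (the thread does not show that code): on POSIX, with shell=True only the first list item is used as the command string, and the remaining items become arguments to the shell itself. java would then start with no arguments, which is consistent with the usage text in the output above. A minimal sketch of that behaviour, using echo as a stand-in for java and a hypothetical run() helper:

```python
import subprocess


def run(args, shell):
    """Run args and return (stdout, stderr) as stripped text. Hypothetical helper."""
    proc = subprocess.Popen(args, shell=shell,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return out.decode().strip(), err.decode().strip()


# POSIX only. echo stands in for java; "hello" stands in for its arguments.
# shell=False: echo receives "hello" as its argument.
print(repr(run(["echo", "hello"], shell=False)[0]))

# shell=True: this becomes `/bin/sh -c echo hello`, so echo runs with no
# arguments and "hello" is handed to the shell instead.
print(repr(run(["echo", "hello"], shell=True)[0]))
```

If this is the cause, either shell=False with a list or shell=True with a single command string would deliver the arguments to java.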

Unrelated: I also removed the -Xss512k option:

The checker requires a java thread stack size of at least 512k.
Consider invoking java with the -Xss option. For example:

  java -Xss512k -jar ~/vnu.jar FILE.html

I'm not sure what the default was, but it worked without specifying -Xss.

java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)

chaintuts commented 5 years ago

Hey Ronen,

I had the same issue and the changes suggested by @AndrewNelis worked.

For what it's worth, you most likely want shell=False anyway. Setting shell to True is a known security hazard. From the Python docs:

Warning: Executing shell commands that incorporate unsanitized input from an untrusted source makes a program vulnerable to shell injection, a serious security flaw which can result in arbitrary command execution. For this reason, the use of shell=True is strongly discouraged in cases where the command string is constructed from external input.

https://docs.python.org/2/library/subprocess.html#frequently-used-arguments
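As a rough illustration of the point above, the sketch below passes a file name containing shell metacharacters as one element of an argument list with shell=False. The child process is a small Python one-liner rather than the validator, and arg_seen_by_child is a hypothetical helper:

```python
import subprocess
import sys


def arg_seen_by_child(filename):
    """Return the first argument the child process actually receives."""
    # The child just echoes back its first argument.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", filename]
    proc = subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out.decode().strip()


# With shell=False no shell parses the string, so the "; echo INJECTED"
# part stays inside the single file name argument and is never executed.
hostile_name = "index.html; echo INJECTED"
print(arg_seen_by_child(hostile_name))
```

Building the same command by string concatenation and running it with shell=True would let the shell interpret the semicolon, which is the injection risk the warning describes.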

Hope that helps. Regards, Josh