Letractively / golismero

Automatically exported from code.google.com/p/golismero

Golismero doesn't work with badly closed HTML tags. #2

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1.  Unclosed tags outside HTML comments.
2.  Inside HTML comments, unclosed tags do not cause an error.
3.  After the closing HTML tag </html>, unclosed tags do NOT cause an error.
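
The three cases above can be sketched as a small check. This is an illustrative sketch only, not code from GoLISMERO: the page strings and the names `CASES`, `parse_outcome`, and `run_cases` are hypothetical, and it targets Python 3's standard `html.parser` rather than the Python 2.6 `HTMLParser` module used in this report. `HTMLParseError` was removed from the standard library in Python 3.5, so on a current interpreter all three cases are expected to parse; the crash described here is specific to the older parser.

```python
from html.parser import HTMLParser

# Hypothetical test pages, one per case listed in the report.
CASES = {
    "outside_comment": "<html><body><a\n<b<>\n</body></html>",
    "inside_comment": "<html><body><!-- <a <b<> --></body></html>",
    "after_html_close": "<html><body></body></html><a\n<b<>",
}


def parse_outcome(markup):
    """Return 'ok' if the parser accepts markup, else the exception class name."""
    parser = HTMLParser()
    try:
        parser.feed(markup)
        parser.close()
    except Exception as exc:
        return type(exc).__name__
    return "ok"


def run_cases():
    """Parse every hypothetical page and map its name to the outcome."""
    return {name: parse_outcome(markup) for name, markup in CASES.items()}
```

Calling `run_cases()` returns one entry per case, which makes it easy to compare parsers or interpreter versions side by side.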

What is the expected output? What do you see instead?
The program does not work and shows the typical Python traceback.

What version of the product are you using? On what operating system?

Golismero 0.6.3

Ubuntu Linux Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
[GCC 4.4.3] on linux2

Please provide any additional information below.

Some examples of malformed tags (real or invented) that cause the problem:

<a
<b<>

Output:

Traceback (most recent call last):
  File "GoLismero.py", line 143, in <module>
    GoLISMERO_Main(PARAMETERS)
  File "/home/silvermond/Descargas/Golismero/api.py", line 296, in GoLISMERO_Main
    spider(PARAMETERS)
  File "/home/silvermond/Descargas/Golismero/libs/spider.py", line 42, in spider
    _spider(Parameters.TARGET, Parameters.RECURSIVITY)
  File "/home/silvermond/Descargas/Golismero/libs/spider.py", line 82, in _spider
    R_T.Forms.extend(getFormInfo(pageraw))
  File "/home/silvermond/Descargas/Golismero/libs/forms.py", line 89, in getFormInfo
    m_Results = ExtractFormsInfo(Text)
  File "/home/silvermond/Descargas/Golismero/libs/forms.py", line 30, in ExtractFormsInfo
    bs=BeautifulSoup(text)
  File "/home/silvermond/Descargas/Golismero/bs4/__init__.py", line 100, in __init__
    self._feed()
  File "/home/silvermond/Descargas/Golismero/bs4/__init__.py", line 113, in _feed
    self.builder.feed(self.markup)
  File "/home/silvermond/Descargas/Golismero/bs4/builder/_htmlparser.py", line 46, in feed
    super(HTMLParserTreeBuilder, self).feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 87, column 3

Original issue reported on code.google.com by busile...@gmail.com on 14 Nov 2011 at 12:52

GoogleCodeExporter commented 8 years ago
This is a bug in the library used to parse the HTML code. I will try to fix it 
for the next release, either by switching to a new framework or by patching 
the existing one.

Thank you very much for the feedback.

Original comment by garcia.g...@gmail.com on 14 Nov 2011 at 5:23
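
One possible shape for the patch mentioned in the comment above is to keep a single unparseable page from aborting the whole crawl. The following is a minimal sketch under stated assumptions, not the fix that was actually applied: `FormCollector` and `safe_extract_forms` are hypothetical names, the code uses Python 3's standard `html.parser` instead of the bundled bs4, and it assumes that returning no forms for a broken page is acceptable.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collects the attributes of every <form> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs supplied by HTMLParser.
        if tag == "form":
            self.forms.append(dict(attrs))


def safe_extract_forms(text):
    """Return the form attributes found in text, or [] if the parser raises."""
    collector = FormCollector()
    try:
        collector.feed(text)
        collector.close()
    except Exception:
        # Assumption: skipping one bad page is preferable to stopping the spider.
        return []
    return collector.forms
```

A caller in the position of `getFormInfo` could then extend its results with `safe_extract_forms(pageraw)` and continue with the next page even when the parser rejects the markup.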

GoogleCodeExporter commented 8 years ago
Please translate this issue and ask people to send bug reports in English.

Original comment by he...@nerv.fi on 14 Nov 2011 at 8:20

GoogleCodeExporter commented 8 years ago
Henri, you're right.

Please, everyone, file your issues in English so that they can be understood around the world.

I will translate the above comments. Sorry for the inconvenience.

Original comment by garcia.g...@gmail.com on 14 Nov 2011 at 8:29

GoogleCodeExporter commented 8 years ago
Translated English version:

What steps will reproduce the problem?
1.  Unclosed tags outside HTML comments.
2.  Inside HTML comments, unclosed tags do not cause an error.
3.  After the closing HTML tag </html>, unclosed tags do not cause an error.

What is the expected output? What do you see instead?

The program doesn't work and shows a Python traceback.

What version of the product are you using? On what operating system?

Golismero 0.6.3

Ubuntu Linux Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
[GCC 4.4.3] on linux2

Please provide any additional information below.

Some examples of malformed tags (real or invented) that cause the problem:

<a
<b<>

Output:

Traceback (most recent call last):
  File "GoLismero.py", line 143, in <module>
    GoLISMERO_Main(PARAMETERS)
  File "/home/silvermond/Descargas/Golismero/api.py", line 296, in GoLISMERO_Main
    spider(PARAMETERS)
  File "/home/silvermond/Descargas/Golismero/libs/spider.py", line 42, in spider
    _spider(Parameters.TARGET, Parameters.RECURSIVITY)
  File "/home/silvermond/Descargas/Golismero/libs/spider.py", line 82, in _spider
    R_T.Forms.extend(getFormInfo(pageraw))
  File "/home/silvermond/Descargas/Golismero/libs/forms.py", line 89, in getFormInfo
    m_Results = ExtractFormsInfo(Text)
  File "/home/silvermond/Descargas/Golismero/libs/forms.py", line 30, in ExtractFormsInfo
    bs=BeautifulSoup(text)
  File "/home/silvermond/Descargas/Golismero/bs4/__init__.py", line 100, in __init__
    self._feed()
  File "/home/silvermond/Descargas/Golismero/bs4/__init__.py", line 113, in _feed
    self.builder.feed(self.markup)
  File "/home/silvermond/Descargas/Golismero/bs4/builder/_htmlparser.py", line 46, in feed
    super(HTMLParserTreeBuilder, self).feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 87, column 3

Original comment by garcia.g...@gmail.com on 14 Nov 2011 at 8:43

GoogleCodeExporter commented 8 years ago
The same thing happened to me. I downloaded the GoLISMERO_last.zip file, but 
when I check the version it reports 0.2.

I'm not sure if I should use 0.6.3 instead :P

Original comment by kit...@gmail.com on 15 Nov 2011 at 11:23

GoogleCodeExporter commented 8 years ago
You're using version 0.6.3, but I forgot to update the version shown in the output :)

I'll open a new issue for this.

Thanks

Original comment by garcia.g...@gmail.com on 15 Nov 2011 at 11:26