Open GoogleCodeExporter opened 8 years ago
This is a bug in the library used to parse the HTML code. I will try to fix it
for the next release, either by switching to a new framework or by patching
the existing one.
Thank you very much for the feedback.
Original comment by garcia.g...@gmail.com
on 14 Nov 2011 at 5:23
Please translate this issue and ask people to send bug reports in English.
Original comment by he...@nerv.fi
on 14 Nov 2011 at 8:20
Henri, you're right.
Please, everyone, file your issues in English, since it is understood around the world.
I will translate the comments above. Sorry for the inconvenience.
Original comment by garcia.g...@gmail.com
on 14 Nov 2011 at 8:29
Translated English version:
What steps will reproduce the problem?
1. A tag left unclosed outside of HTML comments.
2. Inside HTML comments, unclosed tags can also cause errors.
3. After the closing </html> tag, unclosed tags do not cause errors.
What is the expected output? What do you see instead?
The program does not work and shows a Python traceback.
What version of the product are you using? On what operating system?
Golismero 0.6.3
Ubuntu Linux Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Please provide any additional information below.
Some examples of malformed tags (real or invented) that cause the problem.
Output:
Traceback (most recent call last):
  File "GoLismero.py", line 143, in <module>
    GoLISMERO_Main(PARAMETERS)
  File "/home/silvermond/Descargas/Golismero/api.py", line 296, in GoLISMERO_Main
    spider(PARAMETERS)
  File "/home/silvermond/Descargas/Golismero/libs/spider.py", line 42, in spider
    _spider(Parameters.TARGET, Parameters.RECURSIVITY)
  File "/home/silvermond/Descargas/Golismero/libs/spider.py", line 82, in _spider
    R_T.Forms.extend(getFormInfo(pageraw))
  File "/home/silvermond/Descargas/Golismero/libs/forms.py", line 89, in getFormInfo
    m_Results = ExtractFormsInfo(Text)
  File "/home/silvermond/Descargas/Golismero/libs/forms.py", line 30, in ExtractFormsInfo
    bs=BeautifulSoup(text)
  File "/home/silvermond/Descargas/Golismero/bs4/__init__.py", line 100, in __init__
    self._feed()
  File "/home/silvermond/Descargas/Golismero/bs4/__init__.py", line 113, in _feed
    self.builder.feed(self.markup)
  File "/home/silvermond/Descargas/Golismero/bs4/builder/_htmlparser.py", line 46, in feed
    super(HTMLParserTreeBuilder, self).feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 87, column 3
Original comment by garcia.g...@gmail.com
on 14 Nov 2011 at 8:43
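The traceback above shows one page with a malformed tag aborting the whole spider run. As a possible stopgap (a minimal sketch, not the fix GoLISMERO actually adopted), the form-extraction call could be wrapped so that a parser error skips the page instead of crashing. The names safe_extract_forms and strict_extract are hypothetical; strict_extract only stands in for the project's ExtractFormsInfo and fakes the failure.

```python
def safe_extract_forms(text, extract):
    """Run extract(text); return an empty list if the parser rejects the markup."""
    try:
        return list(extract(text))
    except Exception as exc:
        # Broad on purpose for this sketch: on Python 2.6 the error seen in the
        # traceback is HTMLParser.HTMLParseError; narrow this in real code.
        print("skipping page, HTML could not be parsed: %s" % exc)
        return []


def strict_extract(text):
    # Hypothetical stand-in for ExtractFormsInfo: it fails on a doubled "<"
    # to imitate the "malformed start tag" error, otherwise reports forms.
    if "<<" in text:
        raise ValueError("malformed start tag")
    return ["form"] if "<form" in text else []


good = safe_extract_forms("<form action='/login'></form>", strict_extract)
bad = safe_extract_forms("<html><<a href=x></html>", strict_extract)
```

With a guard like this, the spider loses the forms on the broken page but can keep crawling the rest of the target. Whether silently skipping pages is acceptable depends on how the results are used.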
The same thing happened to me. I downloaded the GoLISMERO_last.zip file, but when I
check the version it says 0.2.
I'm not sure if I should use 0.6.3 instead :P
Original comment by kit...@gmail.com
on 15 Nov 2011 at 11:23
You're using version 0.6.3, but I forgot to update the version shown in the output :)
I will open a new issue for this.
Thanks
Original comment by garcia.g...@gmail.com
on 15 Nov 2011 at 11:26
Original issue reported on code.google.com by
busile...@gmail.com
on 14 Nov 2011 at 12:52