Otto404 / fizzler

Automatically exported from code.google.com/p/fizzler
GNU General Public License v3.0

Consider SgmlReader as alternative default to HtmlAgilityPack #25

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What new or enhanced feature are you proposing?

Fizzler tools like Visual Fizzler rely on HtmlAgilityPack as the default
implementation. However, the HtmlAgilityPack project seems to have gone
stale at the moment and has a few bugs pending that are also affecting CSS
selection via Fizzler. Consider a more robust default alternative.

What goal would this enhancement help you achieve?

It would make Fizzler look less buggy. :)

Original issue reported on code.google.com by azizatif on 6 May 2009 at 3:22

GoogleCodeExporter commented 9 years ago
See issue #24 for one HtmlAgilityPack bug biting Fizzler.

Original comment by azizatif on 6 May 2009 at 3:22

GoogleCodeExporter commented 9 years ago
Two possibilities:

http://developer.mindtouch.com/SgmlReader
http://code.google.com/p/twintsam/

Original comment by info%colinramsay.co.uk@gtempaccount.com on 6 May 2009 at 3:27

GoogleCodeExporter commented 9 years ago
> twintsam

The project home page says, "The code is not usable yet." That leaves just
SgmlReader for now.

Original comment by azizatif on 6 May 2009 at 3:34

GoogleCodeExporter commented 9 years ago
I think dropping HtmlAgilityPack (at least as the default) is a 
good idea. It isn't actively maintained and its developers don't 
seem too eager to fix bugs in it either. It is an excellent library 
for simple HTML parsing, and is afaik the only one exposing a full 
DOM (which is very convenient), but because of its bugs and 
inactivity, I think it's a wise plan to move away from it.

SgmlReader and Twintsam are both alternatives worth looking into. I know 
Thomas Broyer, the project owner of Twintsam, and it is a very promising 
project with the goal of being the reference implementation of the HTML5 
parsing algorithm in C#. That's a noble goal, imho.

SgmlReader, on the other hand, is a nice but somewhat dated 
implementation. I believe it is the best of the three at the 
moment, but in my humble opinion the code quality of the project 
is not great, which is why I'm not considering contributing to it. 
I also don't think there's much testing to speak of in the 
SgmlReader project, although it is actively maintained and bugs 
do get fixed.

I would love to cooperate in implementing either of these alternatives (or 
others, if there are any). For the long term, I think Twintsam might be 
the best project to bet on, but it does indeed need some work before it's 
production-ready, so it might be something worth investigating for version 
2.0 of Fizzler.

Original comment by asbjornu on 6 May 2009 at 7:11

GoogleCodeExporter commented 9 years ago
@asbjornu: Thanks for your feedback on the various alternatives.

> It isn't actively maintained and its developers don't 
> seem too eager to fix bugs in it either.

Wonder if it's time to fork?

> would love to cooperate in implementing either of these

Great! I've changed the summary of this issue so that it now points
specifically to SgmlReader, and you can initially submit your contribution
as a patch. If you need assistance with understanding any bits of Fizzler,
let us know!

We can open another issue for Twintsam when it makes sense.

Original comment by azizatif on 8 May 2009 at 12:13

GoogleCodeExporter commented 9 years ago
The new Fizzler.Systems.XmlNodeQuery in r193 will support use of SgmlReader.
All tests pass, including an extra one for the "form input" CSS selector,
which was the root reason for starting this issue.

Original comment by info%colinramsay.co.uk@gtempaccount.com on 11 May 2009 at 11:46
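
For readers unfamiliar with the selector mentioned in the last comment: "form input" is a CSS descendant selector, which matches every input element nested at any depth inside a form. The following is only a rough sketch of that matching over an XML tree. It is written in Python rather than Fizzler's C#, it assumes the markup is already well-formed XML (the kind of output SgmlReader is meant to produce from real-world HTML), and the function name select_descendants is hypothetical. It is not the Fizzler.Systems.XmlNodeQuery API.

```python
import xml.etree.ElementTree as ET


def select_descendants(root, ancestor_tag, descendant_tag):
    """Emulate the CSS descendant selector `ancestor_tag descendant_tag`.

    Hypothetical helper for illustration only; not part of Fizzler.
    """
    matches = []
    seen = set()
    # Visit every element named ancestor_tag (the root itself included).
    for ancestor in root.iter(ancestor_tag):
        # iter() walks the subtree in document order, at any depth.
        for node in ancestor.iter(descendant_tag):
            if node is ancestor:
                continue  # an element is not its own descendant
            if id(node) not in seen:  # avoid duplicates under nested ancestors
                seen.add(id(node))
                matches.append(node)
    return matches


# Well-formed sample markup: two inputs inside the form, one outside it.
markup = (
    "<html><body>"
    "<form><p><input name='q'/></p><input name='go'/></form>"
    "<input name='outside'/>"
    "</body></html>"
)
root = ET.fromstring(markup)
names = [node.get("name") for node in select_descendants(root, "form", "input")]
print(names)  # ['q', 'go']
```

The input named "outside" is not selected because it has no form ancestor, while the input wrapped in a paragraph is selected even though it is not a direct child of the form.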